Correlation Ventures

SRE Tooling & Observability Platform Expert



Barcelona, Spain
Posted on Tuesday, July 9, 2024

Job title: SRE Tooling & Observability Platform Expert

  • Spain / Barcelona

About the job

At Sanofi CHC, we’re committed to providing the next-gen healthcare that patients and customers need. It’s about harnessing data insights and leveraging AI responsibly to search deeper and solve sooner than ever before. Join our team as an SRE Tooling & Observability Platform Expert and you can help make it happen.

Your job? The SRE Tooling & Observability Platform Expert at CHC is a specialized role focused on improving the reliability, scalability, and efficiency of our platforms through the expert implementation and management of SRE tooling. The role covers log collection and analysis, alerting, monitoring, configuration management and Infrastructure as Code (IaC), and incident management, with the goal of ensuring high availability and performance across all systems. You will work closely with the platform engineering and site reliability teams to develop and maintain a robust tooling ecosystem that supports CHC's operational and business goals.

Our team designs and delivers Digital Technology Services and Platforms for CHC worldwide. We deliver our solutions in a highly competitive market, answering the needs of fast-moving consumer healthcare (FMCH), and sometimes support highly regulated environments (GxP, SOX, and other regional and local regulations).

Our team operates in an international context, serving markets in most countries around the world.

At Sanofi Consumer Healthcare, we build trusted and loved brands that connect with hundreds of millions of consumers worldwide. Our mission is to enable better self-care for individuals and communities, while also contributing to a healthier planet. We strive to act as a force for good by integrating sustainability into our business and our employees’ mission, and we operate responsibly from both a social and an environmental point of view. To achieve this, we need people who can shape the future of our business and help us on our journey to becoming the best fast-moving consumer healthcare company in and for the world.

Main responsibilities:

  • Develop and maintain a comprehensive suite of SRE tools for logging, log analysis, alerting, and monitoring to ensure system reliability and performance.

  • Implement and manage configuration management and Infrastructure as Code (IaC) solutions to automate and streamline infrastructure provisioning and management processes.

  • Design and implement effective incident management strategies and tools to quickly identify, respond to, and resolve system issues.

  • Collaborate with engineering teams to integrate SRE tooling into the development and operational lifecycle, enhancing system observability and reliability.

  • Continuously evaluate and introduce improvements to the SRE tooling ecosystem, staying ahead of industry trends and best practices.

  • Provide expertise and guidance on the use of SRE tools, facilitating knowledge sharing among team members.

  • Participate in the planning and execution of system scalability and reliability initiatives, ensuring the infrastructure can support growing workloads and traffic.

  • Partner with the Operations team, ensuring it has all the tools it needs to be best in class.

About you


The ideal candidate for this position combines deep technical proficiency across a broad range of SRE tooling with a proven track record of applying those skills in a dynamic environment. The candidate will have extensive experience in logging, log analysis, alerting, monitoring, and incident management, and a demonstrated ability to ensure system reliability and performance. With expertise in configuration management and Infrastructure as Code (IaC) practices, the candidate will be adept at using tools such as Terraform, Ansible, or CloudFormation to automate infrastructure provisioning and management and to keep operations scalable and efficient.

Soft skills:

  • Strong analytical and critical thinking skills, with the ability to develop creative solutions to complex problems.

  • Strategic thinking: the ability to evaluate relevant areas of operation, formulate objectives, set priorities in context, and develop plans consistent with long-term organizational interests.

  • Excellent communication skills, ensuring clear and effective technical information exchange among various stakeholders.

Technical skills:

  • Proficiency with logging tools such as ELK (Elasticsearch, Logstash, Kibana), Splunk, Dynatrace, or Datadog. Ability to set up comprehensive logging mechanisms, analyze logs for insights, and troubleshoot issues based on log data.

  • Experience with alerting tools like Prometheus Alertmanager, Grafana, or PagerDuty. Skilled in configuring alerts based on specific metrics that indicate system health, performance issues, or failures.

  • Expertise in implementing monitoring solutions using tools such as Prometheus, Grafana, Nagios, or Zabbix. Ability to monitor system performance, resource usage, and operational health in real time to ensure reliability and availability.

  • Strong background in using IaC tools like Terraform, Ansible, or CloudFormation for automating the provisioning and management of infrastructure. Understanding of configuration management principles and practices to maintain consistency and reliability across environments.

  • Knowledge of incident management processes and tools (such as JIRA Service Desk, ServiceNow, or Opsgenie) to efficiently respond to and resolve incidents. Experience in setting up incident response protocols and conducting post-mortem analyses to prevent future occurrences.

  • Familiarity with cloud services (AWS, Azure, Google Cloud Platform) and their respective management and monitoring tools. Understanding of how to leverage cloud capabilities for scalable and resilient infrastructure.

  • Proficiency in scripting languages (such as Python, Bash, or PowerShell) for automating routine tasks, integrating systems, and enhancing the capabilities of SRE tooling.

  • Awareness of security best practices and tools for monitoring security events, conducting vulnerability assessments, and ensuring data protection in line with compliance requirements.


  • A relevant degree in Computer Science, Information Technology, or related fields.

  • Certifications in relevant SRE tooling and methodologies are highly desirable.


  • Fluency in written and spoken English.

Pursue progress, discover extraordinary

Better is out there. Better medications, better outcomes, better science. But progress doesn’t happen without people – people from different backgrounds, in different locations, doing different roles, all united by one thing: a desire to make miracles happen. So, let’s be those people.

At Sanofi, we provide equal opportunities to all regardless of race, colour, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, ability or gender identity.

Watch our ALL IN video and check out our Diversity, Equity and Inclusion actions!