My job alerts

Senior Site Reliability Engineer

Optimizely

Software Engineering

United States

Posted 6+ months ago

Optimizely is focused on unlocking digital potential and we are the recognized category leader in Digital Experience Platform (DXP) and created the category for A/B Testing and experimentation software. We have incredible customers – isn’t that one of the most important aspects of looking for your next job? Optimizely has over 9,000 brands from global organizations such as Visa, Sky, Yamaha, Wall Street Journal to tech innovators like Atlassian, DocuSign, FitBit and Zillow.

Not only are we financially sound and growing but we have unicorn status: Exceeded $300M in revenue in 2020, is profitable already, and has all strategic options ahead of itself. Optimizely continues to invest and addresses a market opportunity north of $30 billion, providing significant personal career growth opportunities.

We are an inclusive culture with a global team of 1200+ people across the US, Europe, Australia, and Vietnam. We blend European and American business culture with emphasis on teamwork, inclusion, and moving fast. People make the difference!

If you are looking to work on the next generation of digital technologies in a fast-paced, hyper-growth environment, apply! We’re just getting started...

Introduction

SREs at Optimizely are focused on making us the most reliable, performant, and trustworthy Digital Experience Optimization platform ever. Our engineering teams have built data pipelines that process 10 billion events daily and applications that support powerful experimentation and collaboration workflows at scale. Our platforms are built on AWS and GCP. We use technologies such as Kafka, Samza, HBase, MySQL, and Postgres. We build and manage our systems using TravisCI, Jenkins, Docker, Kubernetes, Terraform, and Chef. We use a combination of managed and self-hosted approaches. This is a unique opportunity to lead the engineering organization in areas of standardized automated infrastructure and service provisioning and orchestration, service-oriented architectural excellence, and forward-looking planning and execution of large technical projects.

Job Responsibilities

Define a roadmap for all engineering teams to utilize fully automated, self-service, highly scalable, cost-efficient, observable, auditable and reliable infrastructure services as standard practice
Drive the execution of this roadmap across the engineering organization, collaborating with SREs and senior engineers across engineering while also performing hands-on work on the most critical challenges
Provide expert technical guidance and ongoing engineering design review to teams planning and implementing large migrations, service-oriented architecture, broad architectural shifts, and capacity growth
Build a metrics-driven operational culture standardizing our practices for SLO definition and review as well as for logging, monitoring, alerting, and on-call practices
Make iterative improvements to blameless incident management processes, root cause analyses, outage prevention, and service recovery strategies across the engineering organization
Partner closely with Security, Quality, and Product teams to achieve high priority security, privacy, compliance, reliability, and business-continuity objectives on our overall roadmap
Propose and drive large improvements to production systems to achieve a significant impact to our business and engineering teams
Mentor and coach engineers to be curious and effective at discovering and solving technical challenges

Knowledge and Experience

You have proven experience (6+ years) demonstrating hands-on technical leadership and business impact in combining software engineering skills with systems engineering skills to solve complex automation and reliability challenges
You have deep technical experience with various cloud providers, containerization technologies, automated deployment frameworks, orchestration frameworks, monitoring, logging, alerting, system internals, networking, databases, distributed systems, and service-oriented architecture
You have the skills to implement load, stress, performance, and reliability testing standards at scale to improve service, platform, and infrastructure resiliency
You promote openness, diversity of opinions, and inclusive discussions at all times to evaluate a wide variety of ideas and perspectives in solving challenging problems
You demonstrate clear decision making and good trade-offs in complex situations comprising multiple opinions, needs, teams, technologies, cloud providers, and architectural settings
You communicate effectively with stakeholders ranging from executives to junior engineers across the breadth and depth of the engineering organization
You exemplify high accountability, integrity, and resilience to maintain focus on both big-picture goals and milestones to get there
You enable the engineering organization to innovate and deliver with greater speed and safety

Education

BS CS or equivalent industry experience

Optimizely is committed to a diverse and inclusive workplace. Optimizely is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status.

#LI-BR1

#LI-Remote

See more open positions at Optimizely

Companies you’ll love to work for

Senior Site Reliability Engineer

Introduction

Job Responsibilities

Knowledge and Experience

Education