Senior Site Reliability Engineer

Okta Inc

San Francisco, CA

JOB DETAILS
LOCATION
San Francisco, CA
POSTED
2 days ago

Secure Every Identity, from AI to Human

Identity is the key to unlocking the potential of AI. Okta secures AI by building the trusted, neutral infrastructure that enables organizations to safely embrace this new era. This work requires a relentless drive to solve complex challenges with real-world stakes. We are looking for builders and owners who operate with speed and urgency and execute with excellence.

This is an opportunity to do career-defining work. We're all in on this mission. If you are too, let's talk.

Get to know Okta

Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box; we're looking for lifelong learners and people who can make us better with their unique experiences.

Join our team! We're building a world where Identity belongs to you.

The Engineering Opportunity

We are looking for an experienced Senior Site Reliability Engineer to join Okta's Emerging Products Group (EPG). Our mission is to build highly reliable, scalable, and secure cloud services that our customers can trust. We embrace an automation-first mindset and continuously invest in platform engineering, observability, and operational excellence to enable our engineering teams to move quickly and safely.

This role is ideal for an experienced Site Reliability Engineer who enjoys solving complex technical challenges at scale, building automation, and improving the reliability of production systems. You will serve as a key contributor within the EPG SRE organization, partnering closely with software engineers, architects, and product teams to design, build, and operate world-class cloud services. The ideal candidate exemplifies the philosophy of "if you have to do it more than once, automate it" and possesses a strong passion for continuous improvement, operational excellence, and software engineering.

What You'll Be Doing

Reliability & Operations
Design, build, and operate large-scale cloud infrastructure and production services.
Participate in an on-call rotation supporting highly available customer-facing systems.
Lead incident response efforts and drive post-incident reviews focused on systemic improvements.
Define, measure, and improve Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets.
Partner with engineering teams to improve service availability, scalability, performance, and resilience.
Continuously improve observability through metrics, logging, tracing, dashboards, and alerting.

Engineering & Automation
Develop software, automation, and infrastructure using Go, Python, Terraform, and related technologies.
Eliminate operational toil through automation, tooling, and platform engineering.
Improve deployment safety and operational workflows through CI/CD and GitOps practices.
Collaborate on modernizing existing workloads and aligning them with evolving platform capabilities.
Build self-service platforms, operational guardrails, and automation that improve developer velocity while maintaining reliability and security.

Technical Leadership
Contribute to and drive reliability initiatives within the product group.
Guide engineers in adopting operational best practices and reliability engineering principles.
Mentor engineers through technical collaboration, design reviews, incident analysis, and knowledge sharing.
Support architecture and operational decisions through data-driven recommendations and engineering expertise.
Execute projects from conception through production rollout and long-term operational ownership.

Innovation
Explore and apply AI-assisted engineering techniques to improve operational efficiency, incident response, troubleshooting, and automation.
Identify opportunities to leverage emerging technologies to reduce toil and improve engineering productivity.

Our Tech Stack
Infrastructure/Orchestration: Kubernetes (EKS/GKE), Terraform, Helm, Git, ArgoCD, GitOps
Programming: Golang, Python
Observability: Datadog, Splunk
Data Stores: PostgreSQL, Redis, OpenSearch

What We Are Looking For

Technical Excellence
Strong experience operating large-scale production services in AWS and/or GCP.
Deep expertise with Kubernetes in production environments.
Experience troubleshooting Kubernetes networking, storage, scheduling, scaling, and workload lifecycle issues.
Extensive experience with Infrastructure as Code technologies such as Terraform and Helm.
Strong software engineering skills in Golang and/or Python.
Experience building automation and internal engineering platforms.
Experience operating and troubleshooting distributed data platforms such as PostgreSQL, Redis, OpenSearch, MySQL, Cassandra, or similar technologies.
Strong understanding of cloud networking fundamentals including DNS, load balancing, ingress, TLS, service networking, and traffic management.
Experience with observability platforms, monitoring strategies, and production telemetry.
Experience with or strong interest in AI-assisted engineering and operational automation.

Operational Excellence
Strong expertise operating customer-facing production systems.
Experience leading incident response and driving operational improvements.
Deep understanding of reliability engineering concepts including SLIs, SLOs, error budgets, and capacity planning.
Strong understanding of CI/CD pipelines, deployment strategies, and automation-first operational practices.
Proven ability to balance reliability, scalability, security, and engineering velocity.

Security & Compliance
Understanding of cloud security fundamentals, IAM, secrets management, and secure infrastructure design.
Experience implementing operational controls and best practices in regulated or security-sensitive environments is a plus.

Leadership
Demonstrated experience contributing to complex engineering initiatives.
Strong collaboration and communication skills.
Experience working effectively within globally distributed engineering organizations spanning multiple time zones and cultures.
Experience mentoring engineers and elevating technical capabilities within an organization.
Ability to collaborate on technical direction through expertise, partnership, and execution.

Preferred Qualifications
Experience operating SaaS platforms serving large-scale customer workloads.
Experience working within Kubernetes-based microservices environments.
Experience supporting globally distributed production environments.
Experience with GitOps and ArgoCD.
Experience implementing AI-assisted operational tooling or automation workflows.

#LI-Hybrid #P22403

The Okta Experience
Supporting Your Well-Being
Driving Social Impact
Developing Talent and Fostering Connection + Community

We are intentional about connection. Our global community, spanning over 20 offices worldwide, is united by a drive to innovate. Your journey begins with an immersive, in-person onboarding experience designed to accelerate your impact and connect you to our mission and team from day one.

Okta is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, marital status, age, physical or mental disability, or status as a protected veteran. We also consider for employment qualified applicants with arrest and conviction records, consistent with applicable laws.

If reasonable accommodation is needed to complete any part of the job application, interview process, or onboarding, please use this Form to request an accommodation.

Notice for New York City Applicants & Employees: Okta may use Automated Employment Decision Tools (AEDT), as defined by New York City Local Law 144, that use artificial intelligence, machine learning, or other automated processes to assist in our recruitment and hiring process. In accordance with NYC Local Law 144, if you are an applicant or employee residing in New York City, please click here to view our full NYC AEDT Notice.

U.S. Equal Opportunity Employment Information

Individuals seeking employment at this company are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation. When submitting your application above, you are being given the opportunity to provide information about your race/ethnicity, gender, and veteran status.

Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans' Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed Forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.

Pay Transparency

Okta complies with all applicable federal, state, and local pay transparency rules. For additional information about the federal requirements, click here.

Voluntary Self-Identification of Disability
Form CC-305
OMB Control Number 1250-0005
Expires 04/30/2026

Why are you being asked to complete this form?

We are a federal contractor or subcontractor. The law requires us to provide equal employment opportunity to qualified people with disabilities. We have a goal of having at least 7% of our workers as people with disabilities. The law says we must measure our progress towards this goal. To do this, we must ask applicants and employees if they have a disability or have ever had one. People can become disabled, so we need to ask this question at least every five years. Completing this form is voluntary, and we hope that you will choose to do so. Your answer is confidential. No one who makes hiring decisions will see it. Your decision to complete the form and your answer will not harm you in any way. If you want to learn more about the law or this form, visit the U.S. Department of Labor's Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

A disability is a condition that substantially limits one or more of your "major life activities." If you have or have ever had such a condition, you are a person with a disability. Disabilities include, but are not limited to:

Alcohol or other substance use disorder (not currently using drugs illegally)
Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, HIV/AIDS
Blind or low vision
Cancer (past or present)
Cardiovascular or heart disease
Celiac disease
Cerebral palsy
Deaf or serious difficulty hearing
Diabetes
Disfigurement, for example, disfigurement caused by burns, wounds, accidents, or congenital disorders
Epilepsy or other seizure disorder
Gastrointestinal disorders, for example, Crohn's Disease, irritable bowel syndrome
Intellectual or developmental disability
Mental health conditions, for example, depression, bipolar disorder, anxiety disorder, schizophrenia, PTSD
Missing limbs or partially missing limbs
Mobility impairment, benefiting from the use of a wheelchair, scooter, walker, leg brace(s) and/or other supports
Nervous system condition, for example, migraine headaches, Parkinson's disease, multiple sclerosis (MS)
Neurodivergence, for example, attention-deficit/hyperactivity disorder (ADHD), autism spectrum disorder, dyslexia, dyspraxia, other learning disabilities
Partial or complete paralysis (any cause)
Pulmonary or respiratory conditions, for example, tuberculosis, asthma, emphysema
Short stature (dwarfism)
Traumatic brain injury

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.

Okta

The foundation for secure connections between people and technology

Okta is the leading independent provider of identity for the enterprise. The Okta Identity Cloud enables organizations to securely connect the right people to the right technologies at the right time. With over 7,000 pre-built integrations to applications and infrastructure providers, Okta customers can easily and securely use the best technologies for their business. More than 19,300 organizations, including JetBlue, Nordstrom, Slack, T-Mobile, Takeda, Teach for America, and Twilio, trust Okta to help protect the identities of their workforces and customers.
