Team Introduction
The Site Reliability Engineering (SRE) team at TikTok combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. Our team is dedicated to ensuring that TikTok's core services remain stable, efficient, and resilient at a global scale. We focus on enhancing the observability and operability of our infrastructure, using data-driven insights to safeguard business stability 24/7.
Responsibilities
As a Site Reliability Engineer, you will be responsible for the end-to-end reliability of our production ecosystem. You will balance traditional SRE functions, such as automation and performance tuning, with a specialized focus on disaster recovery and rapid incident response.
Preferred Qualifications: