Senior DevOps Engineer / Site Reliability Engineer

Thomas Talent Network

Raleigh, North Carolina

JOB DETAILS

SALARY

$140,000–$160,000 Per Year

SKILLS

Amazon Elastic Compute Cloud (EC2), Amazon Web Services (AWS), Analysis Skills, Ansible, Automation, B2B eCommerce, Best Practices, Capacity Management, Cloud Computing, Computer Science, Configuration Management, Continuous Deployment/Delivery, Continuous Integration, DevOps, Evangelism, GCP (Google Cloud Platform), Identify Issues, Incident Response, Industry/Trade Analysis, Jenkins, Linux Operating System, Load Balancing, Mandarin Chinese Language, Microsoft Azure, Operations Management, Problem Solving Skills, Product Engineering, Product Support, Python Programming/Scripting Language, Reliability Engineering, Research & Development (R&D), Research Skills, Resource Management, Scripting (Scripting Languages), Software as a Service (SaaS), Systems Administration/Management, Systems Analysis, Systems Reliability, Team Player, Technical Research, Technical Support, United States Citizen, Unix Shell Programming, Willing to Travel, eCommerce

LOCATION

Raleigh, North Carolina

POSTED

7 days ago

A leading B2B SaaS platform in the cross-border e-commerce sector is expanding its North America operations. We're seeking a Senior DevOps Engineer / Site Reliability Engineer (SRE) to architect and maintain our unified global O&M (operations and maintenance) platform.

This is a newly created role supporting the work of our North America team. You'll work directly with our Middle Platform Director, Technical Experts, and CEO in a collaborative, remote-first environment. Candidates can be located anywhere in the US.

KEY RESPONSIBILITIES:

• Design, develop, and maintain unified operation and platform management systems covering resource management, monitoring & alerting, configuration management, and automated operation & maintenance

• Build and operate observability platforms and CI/CD pipelines; develop self-healing systems and automated incident response processes to enable intelligent O&M

• Establish DevOps standards and best practices; promote standardization of DevOps toolchains (technology selection, version management)

• Provide platform-level technical support for product and engineering teams; resolve complex system issues, reduce technical debt, and lead infrastructure and architecture upgrades

• Promote SRE concepts and engineering practices; organize technical sharing and training; build a reliability engineering system

• Conduct technical research and innovation; track cloud-native/DevOps industry trends; evaluate new technologies and drive continuous modernization of O&M platforms

REQUIRED QUALIFICATIONS:

• Currently residing in California or North Carolina, USA

• US Green Card or US Citizenship (work authorization; no sponsorship available)

• Fluent in Mandarin Chinese (working language; close collaboration with domestic R&D required)

• Bachelor's degree or above in Computer Science or related field

• 4-6 years of hands-on experience in DevOps/SRE/Platform Engineering

• Proficient in at least one major cloud platform (AWS/Azure/GCP), with a deep understanding of VPC, EC2, EKS/K8s, RDS, and IAM

• Proficient in Linux, networking, containers (Docker/Kubernetes), load balancing, and service governance

• Skilled in IaC (Infrastructure as Code) tools: Terraform, Ansible, Helm

• Experience building CI/CD pipelines: Jenkins, Argo CD, CodeBuild, etc.

• Familiar with monitoring/logging/tracing: Prometheus, Grafana, ELK, OpenTelemetry

• Proficient in at least one development/scripting language: Python, Shell, Go

• Excellent system design, analysis, and troubleshooting skills

• Strong cross-team communication and collaboration abilities

PREFERRED QUALIFICATIONS:

• Master's degree in Computer Science or related field

• Experience with global platforms, cross-border SRE, multi-cloud O&M

• Led platform reconstruction, self-healing systems, or observability initiatives

• Go development, service mesh, chaos engineering, capacity planning experience

• Demonstrated success improving system availability, reducing incident rates, increasing automation

• Global technical vision and cross-cultural collaboration experience

• Result-oriented, self-driven, experienced in technical evangelism/sharing

COMPENSATION:

• Base Salary: $140,000 - $160,000 annually (top candidates may receive 5-10% upward adjustment)

• 401(k): Dollar-for-dollar match, up to 4% of salary

• Medical Insurance

• PTO: 12 days annually

• Social Security & Housing Fund: Contributed per US legal requirements

WORK ENVIRONMENT:

• Location: Silicon Valley, CA OR Raleigh, NC (homebase available)

• Department: Tech O&M Department

• Working Style: Remote-first

• Hours: 8 hours per day, weekends off

• Travel: No business travel required

• Expected Start: ASAP

INTERVIEW PROCESS:

• Round 1 (Online): Middle Platform Director + Technical Expert

• Round 2 (Online): Head of HR

• Round 3 (Online): CEO/Founder
