Cloud Site Reliability Engineer

Stefanini International Holdings Ltd

Dallas, TX

JOB DETAILS
SKILLS
Agile Programming Methodologies, Amazon Web Services (AWS), Automation, Best Practices, Cloud Computing, Code Reviews, Continuous Deployment/Delivery, Continuous Integration, Cross-Functional, Documentation, Identify Issues, Incident Management, On Call, Pattern Analysis, Reliability Engineering, Resource Management, Scalable System Development, Software Engineering, Source Code/Configuration Management (SCM), Standards Development, Technical Support, Test Automation, Test Driven Development (TDD)
POSTED
24 days ago

Details:

Stefanini Group is hiring!

Stefanini is looking for a Cloud Site Reliability Engineer in Dallas, TX (Hybrid).

To apply quickly, please contact Sudhanshu Shrivastava; Ph: 248 582 6510

Sudhanshu.shrivastava@stefanini.com

W2 Candidates only!

About the Opportunity:

As a Senior Cloud Engineer in the Cloud SRE team, you will be responsible for designing and developing cloud solutions and engineering reliability tools for the Cloud Foundation Services (CFS) platform in the Infrastructure, Platforms & Operations organization. You will apply software engineering practices to build scalable, reusable solutions and utilities that enhance platform reliability.

Responsibilities:

What Will Be Expected of You:

  • Design, develop, and maintain reliability solutions and SRE utilities to reduce toil, improve cloud platform reliability, and industrialize SRE practices across the system
  • Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles
  • Develop CI/CD pipelines and automated testing to ensure code quality, reliability, and rapid delivery of solutions
  • Define SRE standards, best practices, and guidelines for adoption across teams; establish SRE metrics such as SLIs and SLOs
  • Apply software engineering best practices, including version control, code reviews, test-driven development, and documentation, to all development work
  • Participate in incident management and on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis
  • Stay current with emerging AWS services, SRE methodologies, and cloud-native development technologies, and drive adoption of innovative solutions
  • Collaborate within Agile and Scaled Agile frameworks with cross-functional teams to deliver integrated cloud automation solutions
  • Produce clear, blameless postmortems with actionable items and documented failure scenarios

About the Company

Stefanini International Holdings Ltd