Senior Site Reliability Engineer - USA Remote
Dynasticx LLC
New York City, NY (remote)
JOB DETAILS
SALARY
$50–$55 Per Year
JOB TYPE
Temporary, Contractor, Full-time
SKILLS
AWS Lambda, Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Web Services (AWS), Analysis Skills, Ansible, Automation, Bash Scripting, Budget Management, Budgeting, Communication Skills, Configuration Management, Consulting, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Cross-Functional, DNS (Domain Name System), DevOps, Documentation, GitHub, Go Programming Language (Golang), HIPAA (Health Insurance Portability and Accountability Act), ISO (International Organization for Standardization), Identify Issues, Incident Management, Incident Response, Jenkins, Load Balancing, Machine Tool, Maintain Compliance, Network Configuration Management, On Call, Operations Planning, Performance Analysis, Python Programming/Scripting Language, Regulatory Compliance, Reliability Engineering, Root Cause Analysis, Scripting (Scripting Languages), Software as a Service (SaaS), System Operations, Team Player, Test Tools, Time Management, Windows PowerShell
LOCATION
New York City, NY
POSTED
10 days ago
Position: Senior Site Reliability Engineer
Experience: 10-12+ years
Role: Contract to Hire
Location: Remote
JOB DESCRIPTION
Reliability & Performance
• Design and implement monitoring, alerting, and reliability tooling using CloudWatch, Grafana, Prometheus, Datadog, or ELK.
• Analyze production performance, capacity, and error budgets to maintain agreed SLIs/SLOs.
• Implement automated health checks, scaling rules, and self-recovery mechanisms to minimize manual intervention.
• Drive root cause analysis (RCA) and post-incident reviews, ensuring permanent fixes and documentation.
Automation & Operations
• Build automation for deployment, configuration, and infrastructure management using Terraform, Ansible, or CloudFormation.
• Develop and maintain CI/CD pipelines with GitHub Actions, GitLab CI, or Jenkins.
• Manage and optimize containerized and serverless workloads (Kubernetes, ECS, EKS, Lambda).
• Implement automated rollbacks, blue/green deployments, and canary releases.
Incident Response & On-Call
• Participate in a 24/7 on-call rotation for critical systems and lead incident management for your domain.
• Reduce mean time to detection (MTTD) and mean time to recovery (MTTR) through proactive automation and observability.
• Develop runbooks and operational playbooks for global SRE teams.
Security & Compliance
• Embed security practices into automation and deployment processes.
• Ensure systems adhere to ISO 27001 and SOC 2 requirements through continuous compliance monitoring.
• Manage IAM policies, secrets, and network configurations securely and efficiently.
Collaboration & Continuous Improvement
• Partner with developers to design for operability, scalability, and resilience from day one.
• Contribute to cross-team reliability reviews and platform improvement initiatives.
• Champion DevOps and reliability culture across Amtech’s engineering organization.
QUALIFICATIONS
• 6+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles.
• Strong background in AWS (EC2, ECS/EKS, RDS, Lambda, S3, IAM, VPC).
• Proficiency with Infrastructure-as-Code and automation (Terraform, Ansible, CloudFormation).
• Experience with observability tools (Prometheus, Grafana, CloudWatch, ELK, or Datadog).
• Scripting and automation skills (Python, Bash, Go, or PowerShell).
• Solid understanding of networking, DNS, and load balancing.
• Strong troubleshooting, incident management, and root cause analysis skills.
• Excellent communication and collaboration abilities in a cross-functional, distributed environment.
PREFERRED QUALIFICATIONS
• Certifications such as AWS Certified SysOps Administrator, SRE Foundation, or CKA.
• Experience with chaos engineering or resilience testing tools.
• Familiarity with SLIs/SLOs and error budget management.
• Exposure to multi-region, multi-account, or hybrid architectures.
• Background supporting SaaS platforms or regulated environments (SOC 2, HIPAA, GDPR).
About the Company
D