We are looking for an experienced, self-motivated, highly productive Site Reliability Engineer (SRE) to build and scale services in a cloud environment within our Infrastructure team.
- Building, deploying, improving, and maintaining infrastructure (on-premises and in Azure)
- Managing operations and tooling around compute infrastructure
- Building and optimizing monitoring and alerting (to maintain customer SLAs and meet RTO/RPO targets)
- Managing operations on additional infrastructure components such as monitoring, alerting and databases
- Building tools and automation
- Being on call, responding to incidents, and conducting root-cause analysis on customer-impacting issues
- Defining and managing SLOs, SLIs, and error budgets
- Leading new projects and initiatives around site reliability, developing or finding net-new solutions, evaluating products, and leading discussions on technology topics
- Mentoring team members, showing thought-leadership, and helping educate the team on best practices
- Remote desktop/laptop assistance may be required from time to time
- Customer interaction may be required from time to time
- Other duties as assigned
Candidate Criteria
Required Experience
- Either a B.S. degree (or equivalent) in Computer Science or a minimum of 7 years' experience with infrastructure as code and deployment systems, along with experience writing automation in a modern programming language
- Experience with monitoring, metrics, logs
- Cloud computing (Azure)
- Understanding of distributed systems and their commonly associated problems
- Experience with CI/CD systems (Preferred)
- Experience writing infrastructure as code (Terraform, Ansible, Puppet, etc.) (Preferred)
- Experience working with containers and Kubernetes (Preferred)
- Experience using enterprise system monitoring tools (e.g., PRTG, Elasticsearch) (Preferred)
Critical Skills & Qualifications
- Strong networking fundamentals
- Belief in solving problems through automation
- Strong communication and analytical skills
- Curiosity, adaptability, and a willingness to learn
- Experience with managing measurable goals and metrics
- Previous experience in remote work settings preferred
Paid days off: 10 per year
A 50% overlap with Eastern Time (ET) working hours is mandatory