Site Reliability Engineer

Hunter Technical Resources Atlanta Full-Time

  • Engage in and improve the whole lifecycle of software services, from inception and design through deployment, operation, and refinement. 
  • Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews. 
  • Work closely with development and operations teams to build highly available, cost-effective systems with extremely high uptime metrics. 
  • Work with teams across the organization to ensure the reliability of core services, and keep an eye on capacity and performance. 
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health in a 24x7 environment. 
  • Participate in 24x7x365 on-call support for multiple core platforms globally. Under a “Follow the Sun” model, working patterns are expected to include on-call duty and weekend and holiday-season coverage. 
  • Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity. 
  • Practice sustainable incident response and blameless postmortems. 
  • Influence and create new designs, architecture, standards, and methods for large-scale systems. 
  • Bind and orchestrate the system infrastructure with the application layer to enable high availability, clustering, load balancing, and integration. 
  • Provide technical guidance or support for the development or troubleshooting of systems. 
  • Establish end-to-end monitoring and alerting on all critical aspects of every system to track SLIs, meet SLOs and SLAs, and receive proactive notification of possible issues. 
  • Develop automated solutions that address potential problems before they cause a service interruption, and demonstrate a passion for automation, including CI/CD automation. 
  • Establish performance baselines and capacity thresholds, correlate events, and define monitoring and alerting criteria. 

  • Bachelor of Science degree in Computer Science or Engineering, or equivalent relevant experience. 
  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems. 
  • Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive. 
  • Ability to debug and optimize code and automate routine tasks. 
  • Overall 6+ years of experience in one or more of the following:
    • Experience building Java EE applications using build tools such as Maven/Ant, along with Subversion, JIRA, Jenkins, Bitbucket, and Chef; 
    • Experience with continuous integration tools (Jenkins, SonarQube, JIRA, Nexus, Confluence, Git/Bitbucket, Maven, Gradle, Rundeck) is a plus; 
    • You've created automation using Chef, Puppet, or another SCM tool; experience with Docker and container scheduler services such as ECS or Kubernetes is desirable; 
    • You've worked with Nginx, Tomcat, HAProxy, Redis, Elasticsearch, MongoDB, RabbitMQ, Kafka, and ZooKeeper; 
    • Experience as an SCM/release engineer, or in a position with similar skill sets and responsibilities (Software Engineer, Systems Engineer, Systems Administrator); 
    • Experience with source code management in Subversion/Git, including branching, merging, and tagging; 
    • Experience configuring and administering Java EE application servers (Tomcat, WebSphere, WebLogic, etc.); 
    • Experience with scripting languages such as Unix shells (bash, ksh), Python, and Perl; 
    • Experience configuring, building, and supporting applications and operations in a public cloud environment (AWS, Azure, GCP); 
    • Experience with monitoring and logging tools (Elasticsearch, ELK, AppDynamics, Splunk, etc.); 
    • Collaborate well with team members, developers, QA, and ownership teams to resolve issues; 
    • Knowledge of Agile / Scrum methodologies and principles; 
    • Possess excellent written and verbal communication skills with the ability to communicate with team members at various levels, including business leaders; 
    • A real passion for and the ability to learn new technologies.

Job ID: 4738641