Site Reliability Software Engineer job in Alpharetta at Experis

Experis | Alpharetta, GA | Contractor

Responsibilities:

  • Build holistic visibility into SLIs, SLOs, and SLAs, dependency graphs, and the past performance of software, network, and systems to ensure that we can continue to scale without increasing operational burden or toil.
  • Assess the current state of the environment and drive "SWAT" initiatives in collaboration with the rest of the organization to ensure transparency, resiliency, stability, and reliability across both the application and infrastructure stacks. Future-state SWAT initiatives can range from incident analysis leveraging ML and AI, to assisting with datacenter stability and consolidation efforts, to application transformation (monolith to microservices, PaaS, etc.).
  • Enable the adoption and implementation of cloud-based application reliability, resiliency, observability, and deployment best practices for production and non-production environments, including public cloud migration of our mission-critical applications from on-premises data centers.
  • Build infrastructure and drive projects that break things with the aim to improve the robustness of production systems.
  • Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform.
  • Step back to observe patterns and develop innovative tools and automation to minimize toil. Use those learnings to drive the best operational practices.
  • Monitor and report on service level objectives for a given application's services. Work with business and product owners to establish key performance indicators.
  • Partner with security engineers to develop plans and automation to aggressively and safely respond to new risks and vulnerabilities.
  • Partner with the broader Fiserv organization to build a culture of rigorously learning from incidents.
  • Share your knowledge by giving brown bags, tech talks, and evangelizing appropriate tech and engineering best practices.
  • Unblock, support, and effectively communicate across teams to achieve results.
  • Define roadmap and architecture based on technology and business outcomes.

Experience:

  • 4+ years of software engineering experience, including development best practices and code management
  • Experience with Infrastructure as Code tools (e.g. Terraform, CloudFormation)
  • Experience with high level programming languages (Python, Go, Java, etc.)
  • Experience with designing solutions for Canary and/or Blue/Green deployments
  • Experience designing, debugging, and running fault-tolerant, large-scale distributed systems
  • Experience working with public cloud platforms (e.g., AWS, Google Cloud Platform, Microsoft Azure)
  • Experience with creating and improving documented procedures and/or playbooks.
  • Knowledge of open-source configuration, orchestration, and CI/CD tools.
  • Knowledge of Kubernetes, PCF and/or Docker.
  • Deep understanding of Cloud Architecture and Operations
  • Strong troubleshooting and debugging skills
  • Experience with tools & technologies such as Prometheus, Grafana, AppDynamics, Dynatrace, Splunk and Moogsoft is a plus.
  • Experience handling large numbers of diverse systems with configuration management systems such as Puppet, Chef, Ansible, or Salt.
  • Understanding of standard networking protocols and components such as HTTP, DNS, ECMP, TCP/IP, ICMP, the OSI model, subnetting, and load balancing strategies.


Anu Insan

Sr. Technical Recruiter

Experis IT

100 Manpower Place | Milwaukee, WI 53212

Recommended Skills

Kubernetes
Splunk
Grafana
Docker
Terraform
Ansible

Job ID: BBBH21826
