This role is ideal for someone who enjoys working directly in Azure, improving production systems, troubleshooting issues across infrastructure and application layers, and building practical monitoring and alerting solutions that help teams respond faster and operate more confidently. Wellfit is the dental industry's fintech solution, breaking down financial barriers so patients, providers, employers, and payors can all access better care.
Mansfield, TX, 30+ days ago
A key aspect of this position is to establish a real-time, 360-degree view of the customer's experience in order to proactively monitor and support our key customer accounts, resolve quality problems in a timely manner, drive continual improvement, improve quality scorecards, and minimize cost of poor quality by fulfilling customer requirements, being responsive, and building customer relationships. TE Connectivity's Customer Quality Engineer (CQE) manages quality for assigned strategic customer accounts and is responsible for ensuring that TE provides an exceptional customer experience for the Data and Devices (DND) business unit within TE.
Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong. You must have the right to work in the United States and not require Toyota support or sponsorship for immigration-related employment (e.g., H-1B, O-1, E-3, H-1B1, TN, F-1 OPT, F-1 STEM OPT, F-1 CPT, 'job flexibility benefits' (also known as I-140 or Adjustment of Status portability), etc.).
As part of our journey from traditional operations toward a mature SRE model, the Senior SRE will partner with product engineering, platform teams, and the Command Center, including Service Desk and Major Incident Command (MIC), to deliver measurable improvements in service reliability.
Deep knowledge of:
Azure: AKS, App Services, Functions, VMSS, Storage, Front Door, API Management, Load Balancers, Monitor, Log Analytics, App Insights, Key Vault, Policy, Defender.
As the Reliability Engineer you will be responsible for leading technical reliability initiatives, defining statistical controls, and delivering actionable improvements to product and process reliability. A detail-oriented engineer with strong analytical skills, expertise in reliability methodology, and the ability to collaborate across disciplines to drive measurable improvements in system availability and performance.
Write and maintain scripts and automation workflows to reduce manual toil and streamline operational tasks (e.g., provisioning, configuration management, log rotation, disk cleanup, service restarts).
You'll combine hands-on Azure experience with code-level debugging, observability best practices, and automation to prevent issues before they occur, drive down MTTD/MTTR, and deliver an exceptional experience for patients and providers.
- Make an Impact: Your work will directly shape the financial backbone of one of the most innovative healthcare fintech companies in the U.S.
- Work Flexibly: Hybrid model based in Dallas with 3 days/week in-office.
As a Lead Site Reliability Engineer at JPMorgan Chase within the Corporate sector, Enterprise technology team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take lead and conduct resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.
As a Site Reliability Engineer III at JPMorgan Chase within Consumer and Community Banking, you will serve as an experienced member of an agile team, focusing on designing and delivering trusted, market-leading technology products that are secure, stable, and scalable. Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others.
As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking team, you will set clear quality gates across requirements, design, secure coding, testing, releases, and post-production monitoring to ensure reliability, performance, security, and observability. Lead and participate in major incident response (including outside business hours when needed), run post-incident reviews, and drive improvements against KPIs like availability, MTTR, and change failure rate.
Building, using, and automating monitoring systems such as NewRelic, DataDog, SignalFX, and Kibana. Hands-on experience deploying, operating, and monitoring production-grade AI/ML microservices (e.g., RAG pipelines, agentic systems) on cloud platforms like AWS Fargate/ECS. Hands-on experience building and operating distributed systems in a public cloud environment (preferably AWS), using CI/CD to deploy, manage and operate production systems, focusing on tooling and automation using tools such as Maven and Jenkins.
The Technology & Operational Risk department within the Multifamily (MF) division is seeking a Site Reliability Engineer (SRE) who will blend software engineering with IT operations to ensure the reliability, availability, scalability, and performance of key systems, services, and environments.
Qualifications:
Proven expertise in designing, developing, and maintaining automation frameworks for application operations, including infrastructure provisioning, deployment pipelines, monitoring, and incident response, using tools such as Ansible, Terraform, Jenkins, and related technologies.
NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com, @nttdatafed.com and @talent.nttdataservices.com email addresses. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us.
GEICO's Cyber Security Engineering & Analytics, Automation (SEA) team is seeking a Staff Cyber Site Reliability Engineer (SRE) - a hands-on, engineering-minded practitioner who is passionate about building reliable, observable, and scalable systems at the intersection of security and infrastructure. Partner with Developers and Infrastructure Engineers: Work closely with software engineers and infrastructure teams to review system designs for reliability, provide feedback on deployability and operability, and ensure that what gets built can be confidently operated and maintained in production.
Support reliability test methods including thermal cycling, thermal shock, high-temperature exposure, humidity, corrosion, pressure cycling, leak testing, coolant compatibility, mechanical fatigue, and bond/interface reliability. Lead and manage reliability test planning and execution for semiconductor packaging, liquid cold plates, T800 Thermadite, CVD diamond, thermal spreaders, embedded cooling structures, and related thermal assemblies.
The ideal candidate has deep experience operating production systems at scale, an automation-first mindset, and the ability to improve reliability through engineering practices such as SLOs/SLIs, production readiness, incident management, observability, resilience testing, and toil reduction. As part of a new SRE team supporting Autodesk GovCloud, you will have a unique opportunity to help shape how Autodesk deploys, runs, and improves production services in restricted cloud environments.
Our holistic approach to decisioning is powered by our industry-leading platform and team of experts, who help leaders make better decisions, faster, unlocking business growth and creating powerful customer connections. With clients in 50+ countries and global offices across New York City, Miami, Dallas, Dublin, London, Paris, Singapore, Shanghai, Munich, Poznan, Sydney, Melbourne, Charlottesville and Denver, we're growing fast.
An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America.
Required Qualifications:
- 5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles
- Strong production experience in AWS
- Required: Significant hands-on experience with Terraform in real-world environments
- Experience operating monitoring and uptime platforms such as Grafana, Pingdom, and Uptrends
- Strong Linux systems, networking, and troubleshooting skills
- Experience supporting production systems through incident response and on-call rotations
- Proficiency with GitHub and modern Git workflows
- Experience building or maintaining CI/CD pipelines with Azure DevOps
- Familiarity with ITSM and incident workflows using ServiceNow
- Strong written communication skills with experience documenting systems and processes in Confluence
- Ability to work independently in a remote or hybrid environment
Preferred Qualifications:
- Experience defining and operating against SLOs and error budgets
- Infrastructure-as-Code best practices beyond Terraform (modules, testing, CI integration)
- Experience with containers and orchestration (Docker, Kubernetes)
- Experience supporting large-scale, high-availability production systems
- Prior experience mentoring engineers or serving as a technical lead
With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative and collaborative candidates who want to impact the world in a positive way. Experience with established methodologies (e.g., 5 Whys, Fishbone diagrams, Fault Tree Analysis, or FMEA) to support the responsibility of conducting thorough root cause analyses on recurring issues.
We are hiring a Site Reliability Engineer (SRE) with Oracle DBA expertise to join our client's Data Center Engineering team. This role focuses on ensuring uptime, scalability, and resilience of mission-critical Oracle database infrastructure.
Design, develop, and execute performance testing strategies for distributed systems and microservices, including load testing, stress testing, soak testing, and capacity planning.
The ideal candidate will leverage Azure-native AI services and agentic systems to reduce toil, improve incident response, and enable intelligent operations, while also driving performance testing practices to validate system resilience under load.
Arlington, TX, 30+ days ago
Work closely with data scientists, data architects, data engineers, ETL developers, cybersecurity, network, Linux, other IT counterparts, and business partners to design and set up the environments to manage the ingested and processed datasets from the external sources, internal systems, and the data warehouse to extract features of interest. Solid experience in High Availability and distributed systems, Linux, Data and SAN Storage Networks, NAS and Networking, leveraging tools to instrument and automate proactive, and eventually predictive, availability solutions.
Protocol Expertise: Mastery of DNS-specific protocols including DNSSEC, DoT, and DoH, with a firm grasp of underlying transport layers (UDP/TCP) and dual-stack (IPv4/IPv6) networking. You will combine deep IP networking and DNS expertise with modern security protocols to ensure our platforms remain resilient against evolving threats and perform at the highest level for millions of users.
Founded in 2009, we continue to be recognized for our intentional culture and tremendous growth (Best Place to Work in Fintech; Best & Brightest to Work For Nationally; and Comparably's Best Company Culture, Best Career Growth, Best Engineering Team, and Best Places to Work in Dallas, among others). The Sr Site Reliability Engineer, Release will prototype, write, maintain, and test code in multiple stages of the release process and in multiple environments in order to rapidly deliver automated solutions to our application releases.