106 Results for

Reliability Engineer Jobs in Irving, TX

Plano, TX30+ days ago

$70–$73.68 Per Hour

In terms of professional development, Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Experience: A minimum of 5 years of experience is required in supporting enterprise solutions, including enterprise security, orchestration, workflow automation, CI/CD pipelines, and cloud platforms.

Plano, TX30+ days ago

$65–$70 Per Hour

In terms of professional development, Everforth Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Everforth Apex also offers an HSA (Health Savings Account on the HDHP plan), a SupportLinc Employee Assistance Program (EAP) with up to 8 free counseling sessions, a corporate discount savings program and other discounts.

Plano, TX30+ days ago

$68–$73.68 Per Hour

TX30+ days ago

HF Sinclair owns and operates refineries located in Kansas, Oklahoma, New Mexico, Wyoming, Washington and Utah and markets its refined products principally in the Southwest U.S., the Rocky Mountains extending into the Pacific Northwest and in other neighboring Plains states.

HF Sinclair Corporation, headquartered in Dallas, Texas, is an independent energy company that produces and markets high-value light products such as gasoline, diesel fuel, jet fuel, renewable diesel and other specialty products.

Irving, TX26 days ago

$60–$65 Per Hour

In terms of professional development, Everforth Apex hosts an on-demand training program, provides access to certification prep and a library of technical and leadership courses/books/seminars once you have 6+ months of tenure, and certification discounts and other perks to associations that include CompTIA and IIBA. Review and analyze complex multi-faceted, larger scale or longer-term Systems Operations Engineering challenges that require in-depth evaluation of multiple factors including intangibles or unprecedented factors.

Westlake, TX19 days ago

$65–$71 Per Hour

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Terraform, Cloud Infrastructure, DevOps, Automation and Load Balancing. The ideal candidate will be responsible for ensuring the reliability, scalability, performance, and availability of critical enterprise applications across hybrid and multi-cloud environments.

Required Skills & Qualifications

5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines (understanding reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence).

Arlington, TX30+ days ago

The role will also be supportive of overall Cloud Transformation initiatives designed to meet key goals in creating a service-driven culture through performance and delivery of SaaS, PaaS, and IaaS solutions by public cloud vendors such as Azure and AWS.

Assist with defining and implementing basic monitoring coverage aligned to Golden Signals (e.g., latency, traffic, errors, saturation/capacity) and validate telemetry appears correctly in monitoring platforms.

Dallas, TX30+ days ago

The platforms we offer include central logging, monitoring, agents and alerting and we provide tools to drive adoption and improvements to capacity planning, operational readiness assessments, production incident postmortems, SLIs / SLOs, and deployment automation including canary releases.

Experience: Minimum of 6+ years of hands-on experience in Site Reliability Engineering, with a proven track record in architecting, designing, building, and maintaining highly available, scalable, and fault-tolerant systems at an enterprise level.

Richardson, TX30+ days ago

Experience: Minimum of 6+ years of hands-on experience in Site Reliability Engineering, with a proven track record in architecting, designing, building, and maintaining highly available, scalable, and fault-tolerant systems at an enterprise level.

New!

Dallas, TX1 day ago

The Senior Principal Reliability Engineer will be expected to have deep knowledge and experience with a variety of reliability engineering sub-disciplines including Failure Modes Effects Criticality Analysis (FMECA), Environmental Stress Screening (ESS) process optimization, Reliability Predictions, Derating Analysis and overseeing Failure Reporting and Corrective Action (FRACAS) practitioners. More information about Security Clearances can be found on the US Department of State government website here:

Tucson, AZ:

As part of our commitment to maintaining a secure hiring process, candidates may be asked to attend select steps of the interview process in-person at one of our office locations, regardless of whether the role is designated as on-site, hybrid or remote. The salary range for this role is 132,400 USD - 251,600 USD.

TX30+ days ago

$104,000–$166,000 Per Year

The AWS Site Reliability Engineer (SRE) will collaborate closely with cross-functional teams, including development, quality assurance, and operations, to ensure seamless software releases and continuous improvement of our release processes.

What you will do:

Infrastructure Automation: Design, implement, and manage infrastructure as code (IaC) solutions using tools like AWS CloudFormation, Terraform or Helm Charts to automate continuous database deployment and scaling processes.

Dallas, TX30+ days ago

$81,456–$137,490 Per Year

In this role, you will contribute to environmental and stress testing efforts, support failure analysis investigations, and help analyze test and field data to identify potential reliability risks. You'll work closely with design, manufacturing, and supplier teams to help implement design-for-reliability best practices and assist with reliability verification activities from concept through production.

TX21 days ago

We are seeking a highly skilled Site Reliability Engineer (SRE) with strong expertise in Terraform, Cloud Infrastructure, DevOps, Automation and Load Balancing. The ideal candidate will be responsible for ensuring the reliability, scalability, performance, and availability of critical enterprise applications across hybrid and multi-cloud environments.

5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines (understanding reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence).

Plano, TX30+ days ago

Full-time

As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure & Production Management sector of Consumer & Community Banking, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take lead and conduct resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.

Dallas, MN30+ days ago

We are seeking a highly skilled Site Reliability Engineer (SRE) to support and enhance the reliability, scalability, and performance of enterprise applications and infrastructure. The ideal candidate will have strong experience in cloud environments, automation, monitoring, and production support.

Southlake, TX11 days ago

5+ years of experience leading the implementation and scaling of reliability engineering practices such as service level objectives, monitoring strategies, incident reviews, and automation-driven improvements. You will lead efforts to elevate production operations through modern Site Reliability Engineering practices, shaping how engineering teams design, build, and operate resilient systems at scale.

Dallas, TX26 days ago

This is also a hands-on technologist role requiring exposure to SRE and DevOps technology stacks and strong understanding of application support processes, including monitoring and addressing incidents/alerts across engineering applications and ensuring effective coordination and handoffs with vendors, partners and internal Synchrony teams. Role Summary/Purpose: The Reliability Engineer - OnePay plays a pivotal technical role within Synchrony Financial to ensure high availability of our applications to enhance and maintain customer experiences for OnePay integrations while providing operational excellence and adherence to program SLAs.

New!

Plano, TX4 days ago

Write and maintain scripts and automation workflows to reduce manual toil and streamline operational tasks (e.g., provisioning, configuration management, log rotation, disk cleanup, service restarts).

New!

Plano, TX4 days ago

As a Lead Site Reliability Engineer at JPMorgan Chase within the Infrastructure Platforms, Web Hosting team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them.

Leads reuse-first adoption of AI-assisted reliability workflows across SDLC/toolchain practices (e.g., CI/CD quality checks, test/validation automation, and operational readiness), ensuring traceability/auditability, resiliency, and security controls.

Dallas, TX17 days ago

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Chief Data & Analytics Office (CDAO) AI/ML & Data Platforms team, you work with your fellow stakeholders to define non-functional requirements (NFRs) and availability targets for services supporting large-scale data platforms and data lake ecosystems. You will ensure those NFRs are embedded into product design and testing phases, that service level indicators effectively measure customer and data platform performance, and that service level objectives are defined with stakeholders and implemented in production to support secure, scalable, and high-performing analytics and AI/ML workloads.

New!

Dallas, TX6 days ago

$60–$65

Determining compensation for this role (and others) at Vaco by Highspring depends upon a wide array of factors including but not limited to:

the individual’s skill sets, experience and training;
licensure and certification requirements;
office location and other geographic considerations;
other business and organizational needs.

TX21 days ago

5+ years of experience in Site Reliability Engineering, DevOps Engineering, Platform Engineering, or related disciplines (understanding reliability engineering principles, SLIs, SLOs, error budgets, and operational excellence).

Dallas, TX24 days ago

Participate in incident management and on-call rotation, providing technical support for SRE tools, troubleshooting production issues, and collaborating with teams to reduce incident recurrence through proactive detection and pattern analysis. Build and optimize Infrastructure as Code (IaC) using Terraform to manage AWS resources related to SRE solutions, incorporating cost-efficient design principles.

Irving, Texas30+ days ago

This role is ideal for someone who enjoys working directly in Azure, improving production systems, troubleshooting issues across infrastructure and application layers, and building practical monitoring and alerting solutions that help teams respond faster and operate more confidently.

Wellfit is the dental industry’s fintech solution, breaking down financial barriers so patients, providers, employers, and payors can all access better care.

Mansfield, TX30+ days ago

A key aspect of this position is to establish a real time 360-degree view of the customer's experience in order to proactively monitor and support our key customer accounts, resolve quality problems in a timely manner, drive continual improvement, improve quality scorecards, and minimize cost of poor quality by fulfilling customer requirements, being responsive, and building customer relationships.

TE Connectivity's Customer Quality Engineer (CQE) manages assigned strategic customer accounts quality and is responsible for ensuring that TE provides an exceptional customer experience for the Data and Devices (DND) business unit within TE.

New!

Plano, TX4 days ago

Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members' efforts to dream, do and grow without questioning that they belong. You must have the right to work in the United States and not require Toyota support or sponsorship for immigration-related employment (e.g., H-1B, O-1, E-3, H-1B1, TN, F-1 OPT, F-1 STEM OPT, F-1 CPT, 'job flexibility benefits' (also known as I-140 or Adjustment of Status portability), etc.).

Dallas, Texas15 days ago

As part of our journey from traditional operations toward a mature SRE model, the Senior SRE will partner with product engineering, platform teams, and the Command Center including Service Desk and Major Incident Command (MIC) to deliver measurable improvements in service reliability.

Deep knowledge of:

Azure: AKS, App Services, Functions, VMSS, Storage, Front Door, API Management, Load Balancers, Monitor, Log Analytics, App Insights, Key Vault, Policy, Defender.

New!

Fort Worth, TX4 days ago

As the Reliability Engineer, you will be responsible for leading technical reliability initiatives, defining statistical controls, and delivering actionable improvements to product and process reliability.

A detail‑oriented engineer with strong analytical skills, expertise in reliability methodology, and the ability to collaborate across disciplines to drive measurable improvements in system availability and performance.

New!

Plano, Texas5 days ago

Toyota is proud to have 10+ different Business Partnering Groups across 100 different North American chapter locations that support team members’ efforts to dream, do and grow without questioning that they belong.

Irving, TX30+ days ago

$130,000–$150,000 Per Year

You'll combine hands-on Azure experience with code-level debugging, observability best practices, and automation to prevent issues before they occur, drive down MTTD/MTTR, and deliver an exceptional experience for patients and providers.

Make an Impact: Your work will directly shape the financial backbone of one of the most innovative healthcare fintech companies in the U.S.
Work Flexibly: Hybrid model based in Dallas with 3 days/week in-office.

Plano, TX30+ days ago

As a Lead Site Reliability Engineer at JPMorgan Chase within the Corporate sector, Enterprise technology team, you hold a leadership role in your team, demonstrate strong knowledge across multiple technical domains, and advise others on the technical and business issues facing them. Take lead and conduct resiliency design reviews, break up complex problems into digestible work for other engineers, act as a technical lead for medium to large-sized products, and provide advice and mentoring to other engineers.

Plano, TX30+ days ago

Full-time

As a Site Reliability Engineer III at JPMorgan Chase within the Consumer and Community Banking, you will serve as an experienced member of an agile team, focusing on designing and delivering trusted, market-leading technology products that are secure, stable, and scalable. Experience in observability such as white and black box monitoring, service level objective alerting, and telemetry collection using tools such as Grafana, Dynatrace, Prometheus, Datadog, Splunk, and others.

Plano, TX30+ days ago

Full-time

As a Senior Lead Site Reliability Engineer at JPMorgan Chase within the Consumer and Community Banking team, you will set clear quality gates across requirements, design, secure coding, testing, releases, and post-production monitoring to ensure reliability, performance, security, and observability. Lead and participate in major incident response (including outside business hours when needed), run post-incident reviews, and drive improvements against KPIs like availability, MTTR, and change failure rate.

Dallas, TX30+ days ago

Building, using, and automating monitoring systems such as NewRelic, DataDog, SignalFX, Kibana,

Hands-on experience deploying, operating, and monitoring production-grade AI/ML microservices (e.g., RAG pipelines, agentic systems) on cloud platforms like AWS Fargate/ECS.

Hands-on experience building and operating distributed systems in a public cloud environment (preferably AWS), using CI/CD to deploy, manage and operate production systems, focusing on tooling and automation using tools such as maven and Jenkins.

Dallas, TX13 days ago

$145,000–$217,000 Per Year

The Technology & Operational Risk department within the Multifamily (MF) division is seeking a Site Reliability Engineer (SRE) who will blend software engineering with IT operations to ensure the reliability, availability, scalability, and performance of key systems, services, and environments.

Qualifications:

Proven expertise in designing, developing, and maintaining automation frameworks for application operations, including infrastructure provisioning, deployment pipelines, monitoring, and incident response, using tools such as Ansible, Terraform, Jenkins, and related technologies.

New!

Plano, TX4 days ago

$96,800–$145,200 Per Year

NTT DATA recruiters will never ask for payment or banking information and will only use @nttdata.com, @nttdatafed.com and @talent.nttdataservices.com email addresses. If you would like to contact us regarding the accessibility of our website or need assistance completing the application process, please contact us at https://us.nttdata.com/en/contact-us.

Dallas, TX27 days ago

$110,000–$230,000 Per Year

GEICO's Cyber Security Engineering & Analytics, Automation (SEA) team is seeking a Staff Cyber Site Reliability Engineer (SRE) - a hands-on, engineering-minded practitioner who is passionate about building reliable, observable, and scalable systems at the intersection of security and infrastructure. Partner with Developers and Infrastructure Engineers: Work closely with software engineers and infrastructure teams to review system designs for reliability, provide feedback on deployability and operability, and ensure that what gets built can be confidently operated and maintained in production.

TX26 days ago

Support reliability test methods including thermal cycling, thermal shock, high-temperature exposure, humidity, corrosion, pressure cycling, leak testing, coolant compatibility, mechanical fatigue, and bond/interface reliability.

Lead and manage reliability test planning and execution for semiconductor packaging, liquid cold plates, T800 Thermadite, CVD diamond, thermal spreaders, embedded cooling structures, and related thermal assemblies.

Plano, TX11 days ago

$117,000–$209,330 Per Year

The ideal candidate has deep experience operating production systems at scale, an automation-first mindset, and the ability to improve reliability through engineering practices such as SLOs/SLIs, production readiness, incident management, observability, resilience testing, and toil reduction.

As part of a new SRE team supporting Autodesk GovCloud, you will have a unique opportunity to help shape how Autodesk deploys, runs, and improves production services in restricted cloud environments.

Dallas, TX30+ days ago

Our holistic approach to decisioning is powered by our industry-leading platform and team of experts, who help leaders make better decisions, faster - unlocking business growth and creating powerful customer connections.

With clients in 50+ countries and global offices across New York City, Miami, Dallas, Dublin, London, Paris, Singapore, Shanghai, Munich, Poznan, Sydney, Melbourne, Charlottesville and Denver, we're growing fast.

New!

Plano, Texas5 days ago

An important part of the Toyota family is Toyota Financial Services (TFS), the finance and insurance brand for Toyota and Lexus in North America.

Allen, TX30+ days ago

Remote

$104,900–$174,700 Per Year

Required Qualifications:

5+ years of hands-on experience in SRE, DevOps, or Infrastructure Engineering roles
Strong production experience in AWS
Required: Significant hands-on experience with Terraform in real-world environments
Experience operating monitoring and uptime platforms such as Grafana, Pingdom, and Uptrends
Strong Linux systems, networking, and troubleshooting skills
Experience supporting production systems through incident response and on-call rotations
Proficiency with GitHub and modern Git workflows
Experience building or maintaining CI/CD pipelines with Azure DevOps
Familiarity with ITSM and incident workflows using ServiceNow
Strong written communication skills with experience documenting systems and processes in Confluence
Ability to work independently in a remote or hybrid environment

Preferred Qualifications:

Experience defining and operating against SLOs and error budgets
Infrastructure-as-Code best practices beyond Terraform (modules, testing, CI integration)
Experience with containers and orchestration (Docker, Kubernetes)
Experience supporting large-scale, high-availability production systems
Prior experience mentoring engineers or serving as a technical lead

Dallas, TX30+ days ago

With offices in Toronto, San Francisco, Dallas, and Pittsburgh, Waabi is growing quickly and looking for diverse, innovative and collaborative candidates who want to impact the world in a positive way.

Experience with established methodologies (e.g., 5 Whys, Fishbone diagrams, Fault Tree Analysis, or FMEA) to support the responsibility of conducting thorough root cause analyses on recurring issues.

Plano, TX30+ days ago

We are hiring a Site Reliability Engineer (SRE) with Oracle DBA expertise to join our client's Data Center Engineering team. This role focuses on ensuring uptime, scalability, and resilience of mission‑critical Oracle database infrastructure.

Plano, Texas12 days ago

Temporary

Design, develop, and execute performance testing strategies for distributed systems and microservices, including load testing, stress testing, soak testing, and capacity planning.

Plano, TX11 days ago

The ideal candidate will leverage Azure-native AI services and agentic systems to reduce toil, improve incident response, and enable intelligent operations, while also driving performance testing practices to validate system resilience under load.

Design, develop, and execute performance testing strategies for distributed systems and microservices, including load testing, stress testing, soak testing, and capacity planning.

Arlington, TX30+ days ago

Work closely with data scientists, data architects, data engineers, ETL developers, cybersecurity, network, Linux, other IT counterparts, and business partners to design and set up the environments to manage the ingested and processed datasets from the external sources, internal systems, and the data warehouse to extract features of interest. Solid experience in High Availability and distributed systems, Linux, Data and SAN Storage Networks, NAS and Networking, leveraging tools to instrument and automate proactive and eventually predictive availability solutions.

New!

TX2 days ago

$83,538–$137,241 Per Year

Protocol Expertise: Mastery of DNS-specific protocols including DNSSEC, DoT, and DoH, with a firm grasp of underlying transport layers (UDP/TCP) and dual-stack (IPv4/IPv6) networking. You will combine deep IP networking and DNS expertise with modern security protocols to ensure our platforms remain resilient against evolving threats and perform at the highest level for millions of users.

New!

Plano, TX4 days ago

$96,800–$145,200 Per Year

New!

TX3 days ago

Remote

$110,000–$137,485 Per Year

Founded in 2009, we continue to be recognized for our intentional culture and tremendous growth (Best Place to Work in Fintech; Best & Brightest to Work For Nationally; and Comparably's Best Company Culture, Best Career Growth, Best Engineering Team, and Best Places to Work in Dallas, among others).

The Sr Site Reliability Engineer, Release will prototype, write, maintain, and test code in multiple stages of the release process and in multiple environments in order to rapidly deliver automated solutions to our application releases.

Reliability Engineer Jobs in Irving, TX

Site Reliability Engineer III StratAcuity Staffing Partners Inc

Site Reliability Engineer (SRE) StratAcuity Staffing Partners Inc

Senior Site Reliability Engineer (SRE) StratAcuity Staffing Partners Inc

Rotating Equipment Engineer - Midstream HF Sinclair Corp

Senior Site Reliability Engineer (SRE) - NC, TX StratAcuity Staffing Partners Inc

Site Reliability Engineer NTT DATA

Site Reliability Engineer I General Motors Financial Company, Inc.

Engineering - SRE Platforms - Site Reliability Engineer - Vice President - Dallas The Goldman Sachs Group Inc

Asset & Wealth Management - Site Reliability Engineer - Vice President - Richardson The Goldman Sachs Group Inc

Senior Principal Reliability Engineer Raytheon

Senior AWS Cloud Site Reliability Engineer (SRE) with AWS Database experience Peraton Inc

Hardware Reliability Engineer II (R4675) Shield AI Inc

Site Reliability Engineer NTT DATA Group Corp

Lead Site Reliability Engineer JPMorgan Chase Bank, N.A.

Site Reliability Engineer eTeam Inc.

Senior Site Reliability Engineer The Charles Schwab Corp

Reliability Engineer - OnePay Synchrony Financial

Site Reliability Engineer - Platforms Toyota Motor Corp

Lead Site Reliability Engineer JPMorgan Chase & Co

Senior Lead Site Reliability Engineer - AI/ML and Data Platforms JPMorgan Chase & Co

Site Reliability Engineer III Vaco LLC

Site Reliability Engineer NTT DATA Services, LLC

Cloud Site Reliability Engineer Stefanini International Holdings Ltd

Platform Reliability Engineer, Azure Wellfit Technologies

SR QLTY & RELIABILITY ENGINEER TE Connectivity plc

Senior Site Reliability Engineer - Database Services Toyota Motor Corp

Senior Site Reliability Engineer Las Vegas Sands

Reliability Engineer - Level 4 Lockheed Martin Corp

Site Reliability Engineer - Platforms TCC Toyota Motor Credit Corporation Company

Site Reliability Engineer, Azure Wellfit Technologies Inc

Lead Site Reliability Engineer (GTAM) JPMorgan Chase & Co

Site Reliability Engineer III JPMorgan Chase Bank, N.A.

Senior Lead Site Reliability Engineer JPMorgan Chase Bank, N.A.

Senior Site Reliability Engineer Navan Inc

Site Reliability Engineer Tech Lead Federal Home Loan Mortgage Corp

Site Reliability Engineer (Onsite Hybrid) NTT DATA Group Corp

Staff Cyber Site Reliability Engineer (SRE) GEICO GENERAL INSURANCE COMPANY

Reliability Engineer - Advanced Thermal Management Coherent Corp

Senior Site Reliability Engineer Autodesk Inc

Site Reliability Engineer Analytic Partners Inc

Senior Site Reliability Engineer - Database Services TCC Toyota Motor Credit Corporation Company

Senior Site Reliability Engineer II RELX Group plc

Vehicle Reliability Engineer Waabi

Site Reliability Engineer – Oracle DB Glint Tech Solutions LLC

Site Reliability Engineer, AI & Agentic Systems ServiceLink

Site Reliability Engineer, AI & Agentic Systems ServiceLink IP Holding Co LLC

Lead Site Reliability Engineer General Motors Financial Company, Inc.

Site Reliability Engineer, DNS Optimum Communications Inc

Site Reliability Engineer (Onsite Hybrid) NTT DATA Services, LLC

Sr Site Reliability Engineer - Release Alkami Technology Inc
