Principal Engineer - Platform Engineering & Production Support

Wells Fargo & Co

Irving, TX

JOB DETAILS
SKILLS
Automation, Cloud Applications, Cloud Computing, DevOps, Distributed Computing, Gap Analysis, Identify Issues, Incident Management, Incident Response, Mentoring, Microservices, Multitasking, On Call, Operational Support, Problem Solving Skills, Production Support, Production Systems, Reliability Engineering, Reporting Dashboards, Risk Analysis, Software Administration, Splunk, Systems Reliability, Technical Leadership

Title: Principal Engineer - Platform Engineering & Production Support

Location: 401 W Las Colinas Blvd, Irving, TX

Alternate Locations: Charlotte, NC or Minneapolis, MN

Duration: 12 months

Work Engagement: W2

Work Schedule: 3 days in office/2 days remote

Benefits offered for this contract position: health insurance, life insurance, 401(k), and voluntary benefits

Summary:

We are seeking a Principal Engineer within the Platform Engineering team. This individual must be Day 1 ready, comfortable operating in fast-paced, production-critical environments, and capable of balancing multiple competing priorities.

The ideal candidate is a seasoned DevOps and Site Reliability Engineering (SRE) professional with strong hands-on expertise in observability, incident management, and cloud platforms (OpenShift). This role will play a leading part in supporting production systems, preventing outages, and improving system reliability through automation, intelligent monitoring, and modern SRE practices.

Team Overview:

This role supports a critical Platform Engineering team responsible for stabilizing, scaling, and operating applications as they move toward and beyond production release. The team plays a key role post-deployment, ensuring reliability, performance, and operational excellence across a broad application portfolio.

This is not traditional infrastructure support. It is application-focused production engineering, requiring deep technical expertise, proactive issue prevention, and strong ownership of application health in cloud-native environments.

Responsibilities:

  • Lead production support efforts across a portfolio of 20+ applications, ensuring stability, performance, and rapid issue resolution

  • Design, build, and maintain advanced monitoring, alerting, and observability dashboards using tools such as Splunk, Grafana, AppDynamics, Prometheus, and SPLOC

  • Proactively identify production risks through gap analysis, anomaly detection, and predictive alerting, preventing incidents before they occur

  • Troubleshoot complex production issues across distributed microservices environments, driving reduced MTTR through deep technical expertise

  • Drive adoption of modern SRE practices, including automation, AIOps, and intelligent monitoring

  • Support applications running on OpenShift and cloud-native platforms, with a strong focus on reliability, scalability, and resiliency

  • Collaborate closely with development teams during release cycles, providing production-readiness guidance and operational support

  • Participate in a 24x7 on-call rotation, demonstrating urgency, ownership, and accountability during incidents

  • Mentor and guide engineers, helping elevate team capabilities in SRE, DevOps, and platform engineering

  • Act as a trusted technical leader, able to rapidly shift priorities and manage competing demands in high-pressure environments

Qualifications:

  • Applicants must be authorized to work for ANY employer in the U.S. This position is not eligible for visa sponsorship.

  • Strong background in platform engineering and production support

  • Hands-on experience with:

      • Red Hat Linux

      • OpenShift and Kubernetes

      • Java and Python

      • Microservices architectures and Spring Boot

  • Experience designing and maintaining observability dashboards, including:

      • Grafana

      • Splunk

      • SPLOC

      • AppDynamics

  • Experience with observability alerts, incident response, and on-call support, leveraging tools such as:

      • AIOps platforms

      • ServiceNow

      • BigPanda or similar incident management tools

  • Experience with:

      • React.js

      • Apache

      • Kafka

      • Relational databases

  • Strong understanding of distributed systems, cloud-native platforms, and microservices-based architectures

About the Company

Wells Fargo & Co

We believe in our vision and values just as strongly today as we did the first time we put them on paper more than 20 years ago. Staying true to them will guide us toward continued growth and success for decades to come. As you read more about our vision and values, you will learn about who we are, where we’re headed and how every Wells Fargo team member can help us get there.

COMPANY SIZE
10,000 employees or more
INDUSTRY
Financial Services
FOUNDED
1852