Sr. Infrastructure Site Reliability Engineer

The Charles Schwab Corp

Southlake, TX

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Analysis Skills, Ansible, Apache JMeter, Application Hosting, Application Programming Interface (API), Automation, Bash Scripting, Budgeting, Business Analysis, Business Operations, Caching, Capacity Management, Change Management, Cloud Computing, Communication Skills, Computer Science, Computer Security, Configuration Management, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Cook Dishes, Database Middleware Software, Digital Certificates, Disaster Recovery, Distributed Computing, Documentation, Ecosystems, Enterprise Protection, Establish Priorities, Failover, Failure Analysis, Finance, Financial Planning, Forecasting, GCP (Google Cloud Platform), GitHub, Government Organizations, Health Maintenance, High Availability, Incident Management, Incident Response, Information Technology & Information Systems, Instrumentation, Internet Security, Leadership, Linux Operating System, Load Balancing, Machine Tool, Management of Information Systems/Technology (MIS), Messaging Middleware, Metrics, Microservices, Microsoft SQL Server, Microsoft Windows Azure, Microsoft Windows Server, Multiplatform/Cross-Platform, NMap, Nagios Monitoring Tool, Negotiation Skills, Network Attached Storage (NAS), Oracle Database, PostgreSQL, Problem Solving Skills, Process Improvement, Python Programming/Scripting Language, Reliability Engineering, Requirements Management, Resource Utilization, Risk Management, Root Cause Analysis, Sales Pipeline, Scripting (Scripting Languages), Secure Coding, Security Attacks, Security Compliance, Software Development, Software Engineering, Software Patches, Splunk, Storage Area Network (SAN), System Architecture, System Operations, Systems Administration/Management, Team Player, Telemetry, VMWare, Windows PowerShell, Wireshark (Ethereal), tcpdump
LOCATION
Southlake, TX
POSTED
6 days ago

Your Opportunity

At Schwab, you're empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us "challenge the status quo" and transform the finance industry together.

Schwab Technology Services enables the future of how clients manage their money by providing innovative and reliable technology products and services as part of our ongoing commitment to democratize access to investing and financial planning.

A Manager for Advisor Services Technology (AST) Infrastructure Operations SRE will lead the strategy, execution, and operational excellence of the application infrastructure ecosystem supporting AST platforms. This role is accountable for ensuring high availability, scalability, reliability and performance through disciplined operational practices, life cycle management, and modern SRE principles. This requires oversight of all routine and strategic infrastructure initiatives, including operating system upgrades, patching, EOL remediation, infrastructure changes, middleware and database activities, cloud technologies and readiness, tooling modernization, and automation at scale. You will drive holistic capacity management, ensuring that compute, storage, network and application-tier resources are designed and maintained to meet current and future business demand. You will partner closely with architecture and application engineering teams to ensure infrastructure and platform components align with solution designs and support the long-term technical roadmap. The role also governs the organization's observability platforms - defining the telemetry strategy, metrics, SLOs, and alerting posture necessary to maintain operational health and reduce toil. You will lead ongoing improvements in automation, resilience engineering, disaster recovery readiness, and operational maturity, creating repeatable, well-engineered processes that support rapid change with minimal risk.

This role requires a deep understanding of enterprise infrastructure and security principles, excellent analytical skills, and the ability to communicate effectively with technical and non-technical stakeholders.

What you're good at

  • Strategic thinker who is passionate about application infrastructure reliability and efficiency.
  • Strong stakeholder engagement - able to work with application teams, I&O, and senior leadership. Drive consensus, negotiate priorities, and resolve conflicts.
  • Effective decision-maker driving solutions and leadership updates during high-pressure incidents.
  • Leads with integrity and sound judgment, showing the courage to uphold what's right in all situations.
  • High standard of change management quality by enforcing rigor, reducing operational risks, and ensuring predictable, safe deployments.
  • Practice Site Reliability Engineering mindset and solve problems through automation and instrumentation.
  • Identify opportunities to build innovative tools and solve unique operations problems on large enterprise and mission critical applications.
  • Drive continuous improvement via automation across infrastructure provisioning, configuration management, compliance, system health, and operational activities.
  • Monitor the current state of infrastructure to identify deficiencies through aging of the technologies used by the application, or misalignment with business requirements.
  • Analyze the business-IT environment (run, grow and transform the business) to detect critical deficiencies, and recommend solutions for improvement.
  • Govern change management practice, ensuring minimal service impact of infrastructure changes and activities.
  • Lead capacity planning across compute, storage and application tiers to ensure scalability and optimization.
  • Implement proactive monitoring and forecasting to prevent performance degradation across all supported platforms (on-prem and cloud technologies).
  • Partner with architecture teams to improve system resiliency, failover design, and scalability patterns.
  • Establish standards for tooling around runbooks, incident response, and environment configuration.
  • Lead complex incident triage and root-cause analysis, drive action plans to eliminate recurrences.
  • Coordinate DR exercises, ensuring process and documentation accuracy, and cross-team alignment.
  • Oversee Cybersecurity risks, threat and vulnerability programs.

What you have

Required Qualifications

  • Master's degree in Computer Science, Master of Science, Information Technology Management, Management Information System or a related field.
  • 10+ years of experience in Site Reliability Engineering and Production Operations.
  • Deep knowledge of application hosting patterns: distributed systems, microservices, message queues, caching, API gateways.
  • Expertise in managing infrastructure (VMware, Linux, Windows Server, SAN/NAS, Load balancers, Containers - PCF), and configuration management.
  • Knowledge of cloud platforms (GCP, AWS, Azure) and cloud-native SRE practices.
  • Proven experience with automation and scripting - observability metrics, and productivity enhancements with scripting languages and tooling like Python, PowerShell, Bash, Ansible, SaltStack, Chef, Terraform.
  • Strong working experience with observability platforms (Splunk, Grafana, AppDynamics, ITRS, Dynatrace, etc.)
  • Familiarity with secure coding practices and software development methodologies.
  • Excellent analytical and problem-solving skills to identify, assess, and prioritize production outage resolution effectively.
  • Strong understanding of service-level objectives (SLOs), error budgets, resilience patterns, and failure-mode analysis.
  • Solid working knowledge of Schwab resiliency policy - design high availability and disaster recovery architectures.
  • Experience in security compliance and threat remediation.
  • Hands-on capacity management experience, analyze and forecast resource utilization.

Preferred Qualifications

  • Google Cloud Certification - Associate Cloud Engineer.
  • Experience in software development and CI/CD pipelines is beneficial - Bitbucket, GitHub.
  • Familiarity with security standards and frameworks. Knowledge of Veracode and Qualys scans, Chef InSpec, certificate management and vulnerability remediation.
  • Knowledge of database platforms - Oracle DB, MsSQL, Postgres, Mongo.
  • Understanding of networking tools like Wireshark, Nmap, tcpdump, Nagios, JMeter.

In addition to the salary range, this role is also eligible for bonus or incentive opportunities.

About the Company

The Charles Schwab Corp

The Charles Schwab Corporation is a leading provider of financial services, with more than 300 offices. Through its operating subsidiaries, the company provides a full range of securities brokerage, banking, money management and financial advisory services to individual investors and independent investment advisors. Its broker-dealer subsidiary, Charles Schwab & Co., Inc. (member SIPC), was named "Highest in Investor Satisfaction with Self-Directed Services" by J.D. Power and Associates in 2009. Together with its affiliates, it offers a complete range of investment services and products, including an extensive selection of mutual funds; financial planning and investment advice; retirement plan and equity compensation plan services; referrals to independent fee-based investment advisors; and custodial, operational and trading support for independent, fee-based investment advisors through Schwab Advisor Services.

The Charles Schwab Bank (member FDIC) provides banking and mortgage services and products. To meet the needs of our clients, we are actively recruiting people with the desire, drive and creativity to find solutions that help meet our clients' needs; who want the chance to learn, grow with the company and explore their career opportunities; who will strive for excellence in achieving our clients' and our company's goals; who have the highest ethical standards - individuals who take pride in making a difference in people's lives.
COMPANY SIZE
1,000 to 1,499 employees
INDUSTRY
Security and Surveillance
FOUNDED
1971
WEBSITE
http://www.aboutschwab.com/careers