Principal DevOps, SRE & Application Infrastructure Architect

Tata Consultancy Services Ltd

Sunnyvale, CA

JOB DETAILS
SALARY
$110,000–$140,000 Per Year
SKILLS
Apache Cassandra, Automation, Budgeting, Caching, Cloud Computing, Continuous Deployment/Delivery, Continuous Integration, Cross-Functional, Database Administration, Database Design, Database Replication, Debugging Skills, DevOps, Digital Certificates, Distributed Computing, Docker, EDGE (Enhanced Data for GSM Evolution), Elasticsearch, Failover, HTTP (Hypertext Transfer Protocol), High Availability, Identify Issues, Incident Management, Incident Response, JSON, Java, Linux Operating System, Load Balancing, Messaging Technology, NoSQL, On Call, Operating Systems, Oracle, Oracle PL/SQL, PostgreSQL, Python Programming/Scripting Language, REST (Representational State Transfer), Redis, Replication and Remote Mirroring, Solr, SQL (Structured Query Language), SSL-TLS (Secure Sockets Layer - Transport Layer Security), Sales Pipeline, Scripting (Scripting Languages), Service Level Agreement (SLA), Simple Queue Service (SQS), Software Administration, Software Patches, Splunk, Telemetry, Unix Shell Programming, Unix System Internals/Programming

Key Responsibilities

  • Infrastructure & GitOps
      • K8s & Containerization: Design, deploy, and optimize secure Docker/Kubernetes (AKS) environments using Helm and ArgoCD.
      • Networking & Edge: Manage cloud Ingress, Load Balancers, and end-to-end certificate management (SSL/mTLS).
      • CI/CD & Automation: Automate tasks with Shell/Python; build GitOps pipelines and manage schema migrations via Flyway.
  • SRE & Observability
      • Reliability: Own end-to-end production availability and performance; define and track SLAs/SLOs/SLIs and error budgets.
      • Telemetry: Build observability stacks using OpenTelemetry, Prometheus, Grafana, and Splunk.
      • Incident Management: Lead P0/P1 incident response, deep-dive distributed system debugging, RCAs, and on-call rotations.
  • Application & Database Operations
      • Polyglot DB Management: Design and operate high-availability cloud database infrastructure (Oracle, Postgres, Cassandra, Couchbase, Redis, CockroachDB).
      • Data Replication & DR: Manage Oracle GoldenGate replication, patching, and purging; execute robust P0 disaster recovery/failover strategies.
      • App Support: Perform deep-dive troubleshooting across Java application layers, gRPC, REST/HTTP/JSON interfaces, and caching, messaging, and search systems (SNS/SQS, Elasticsearch, Solr).

Required Technical Skills

  • Orchestration & DevOps: Kubernetes (AKS), Docker, Helm, ArgoCD, Flyway.
  • Scripting & OS: Strong Linux/Unix internals, Shell scripting, and Python.
  • Observability: OpenTelemetry, Prometheus, Grafana, Splunk.
  • Database & Replication: SQL/PL-SQL (procedures, triggers, tuning), GoldenGate, NoSQL (Cassandra, Couchbase, Redis), and Transactional DBs (Oracle, Postgres).
  • App Troubleshooting: Java application debugging, gRPC, REST, and cloud-native caching/queues.

Preferred Skills

  • Exposure to AliCloud or multi-cloud environments.
  • Security operations (automated password rotations, IAM privilege management).
  • Strong cross-functional collaboration with security, network, and application teams.

Location: Sunnyvale, CA or Austin, TX (3 days in office)

Salary Range: $110,000–$140,000 per year

#LI-AS3

About the Company

Tata Consultancy Services Ltd