Cloud Operations Engineer - Infrastructure

TP-Link Corporation Ltd

Irvine, CA

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Application Programming Interface (API), Automation, Autoscaling, Budget Management, Capacity Management, Cloud Computing, Configuration Management, Continuous Deployment/Delivery, Continuous Integration, Customer Support/Service, Ecosystems, Go Programming Language (Golang), Identify Issues, Incident Response, Linux Operating System, On Call, Operational Strategy, Production Management, Production Support, Python Programming/Scripting Language, Reliability Engineering, Security Architecture, Security Infrastructure, Smart Homes, Software Engineering, Wi-Fi
POSTED
4 days ago

ABOUT US:

Headquartered in the United States, TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, consistently ranked as the world's top provider of Wi-Fi devices. The company is committed to delivering innovative products that enhance people's lives through faster, more reliable connectivity. TP-Link serves customers in over 170 countries and continues to grow its global footprint.

We believe technology changes the world for the better! At TP-Link Systems Inc., we are committed to crafting dependable, high-performance products to connect users worldwide with the wonders of technology.

Embracing professionalism, innovation, excellence, and simplicity, we aim to assist our clients in achieving remarkable global performance and enable consumers to enjoy a seamless, effortless lifestyle.

KEY RESPONSIBILITIES

  • Design, build, and maintain reliable, scalable, and secure cloud-native infrastructure platforms supporting large-scale production workloads.
  • Operate and optimize multi-account AWS environments, ensuring infrastructure is secure, repeatable, and auditable through Infrastructure as Code tools such as Terraform.
  • Manage production Kubernetes clusters, including provisioning, upgrades, autoscaling, networking, observability, capacity planning, and day-to-day operations.
  • Build and operate Kubernetes ecosystem components such as CRDs, Helm, HPA, Cluster Autoscaler, CoreDNS, and Cluster API.
  • Operate and improve GitOps-based deployment workflows using tools such as FluxCD or ArgoCD.
  • Manage and enhance Istio service mesh capabilities, including traffic routing, service discovery, resilience, security, and service-to-service communication.
  • Define and improve reliability practices, including SLOs, error budgets, monitoring, alerting, incident response, and post-mortems.
  • Participate in a scheduled on-call rotation to support production cloud infrastructure and Kubernetes platforms.
  • Troubleshoot complex production issues across cloud infrastructure, Kubernetes, Linux systems, networking, and distributed services.
  • Drive automation for infrastructure provisioning, configuration management, CI/CD pipelines, observability, and operational workflows using Terraform, Go, Python, or similar technologies.
  • Collaborate with application engineering, architecture, security, and platform teams to improve infrastructure reliability, scalability, and operational efficiency.
