Maxana is seeking an experienced Infrastructure Engineer for a confidential client — a fast-growing AI company. In this role you will build and maintain the platform layer supporting large-scale ML training, inference, and deployment. This is a high-impact role at the intersection of cloud infrastructure and ML systems.
Key Responsibilities
Build and maintain infrastructure supporting large-scale ML training and inference workloads
Work with GPU and compute infrastructure, distributed systems, and cloud-native platforms
Improve reliability, observability, and performance across the platform layer
Collaborate directly with senior engineers and product teams on architecture decisions
Own production reliability, including monitoring, incident response, and proactive risk reduction
Develop and maintain internal tooling and automation to support engineering operations
Requirements
5+ years of infrastructure or platform engineering experience in a production environment
Strong distributed systems background; experience with large-scale compute workloads preferred
Cloud-native infrastructure experience on AWS, GCP, or Azure; Docker and Kubernetes required
Familiarity with ML infrastructure (training pipelines, inference serving, GPU workloads) is a strong plus
Experience owning production reliability end to end
Benefits
Competitive base salary ($130,000-$240,000) + equity
Medical, dental, and vision
Flexible paid time off
Learning and development stipend
Opportunity to work at the forefront of AI infrastructure at scale