AI agents can diagnose infrastructure problems in seconds - but they still stop at "here's what you should do" because nobody trusts them to actually press the button. Your teams will build the execution layer that changes that. This is a Sr. SDM role leading three engineering teams in AWS Systems Manager's Automation suite, making it safe for AI agents and human operators to take real actions on cloud infrastructure at scale.
Systems Manager Automation already runs 175M+ steps per week across 668K active accounts. The next chapter is turning it into the standard execution interface for autonomous operations - where AI agents (AWS Frontier Agents, third-party tools, customer-built agents) can safely execute runbooks with pre-execution impact analysis, blast radius scoping, and automatic rollback. Your teams will ship the capabilities that make customers say "yes, I trust this to run without me watching."
What your teams will build:
Why this is a rare opportunity:
What you'll do:
You're a great fit if you've led multiple engineering teams, operated services at scale, and are energized by the problem of building trust in autonomous systems. Experience with AI/ML integration, workflow engines, or safety-critical systems is a plus - but strong engineering leadership fundamentals matter more than domain expertise.
Key job responsibilities
You'll own three engineering teams end-to-end - their roadmaps, their operational health, and their people. Specifically:
A day in the life
Mornings usually start with operational signals - checking deployment health, scanning overnight tickets, reviewing what your on-call teams handled. Mid-morning might be a roadmap review with your PM counterpart, followed by a 1:1 with one of your SDMs where you're coaching them through a hard prioritization call. Afternoons shift between design reviews (your teams are building safety-critical systems - the details matter), cross-team syncs with AI agent partner teams, and unblocking work. You'll context-switch between people problems and systems problems daily - and enjoy both.
About the team
AWS Systems Manager helps customers operate their infrastructure safely at scale - from a handful of servers to millions of managed nodes across AWS, on-premises, and multi-cloud environments. Our team builds the execution engine: the runbooks, the orchestration, and the safety mechanisms that let customers (and increasingly, AI agents) take action on their infrastructure with confidence.
We're also pushing the boundary on how engineering teams themselves work - using AI agents in our own development workflows for operational reviews, code quality, incident investigation, and decision support. We build AI-powered products and we're practitioners of AI-assisted engineering.