Lead Engineer for Manufacturing and Datacenter Lab, Trainium Manufacturing, Quality and Reliability

Amazon.com Inc

Austin, TX

JOB DETAILS
SKILLS
Amazon Web Services (AWS), Assembly Line, Change Management, Computer Firmware, Computer Servers, Continuous Improvement, Cross-Functional, DFT (Design for Test), Data Analysis, Design Flows, Failure Analysis, Hardware Design, Hardware Development, Laboratory Analysis, Machine Learning, Manufacturing, Manufacturing Design, Manufacturing Engineering, Manufacturing Operations, Manufacturing Systems, Manufacturing/Industrial Processes, Manufacturing/Production Testing, Mentoring, Network Operations Center, Organizational Development/Management, Original Design Manufacturer (ODM), Performance Analysis, Power Management, Process Improvement, Product Lifecycle, Quality Assurance, Quality Assurance Methodology, Quality Engineering, Reliability Engineering, Requirements Management, Root Cause Analysis, Safety Standards, Signal Integrity, Software Development, System Test, Test Data, Test Requirements, Test Strategy, Testing, Vehicle Fleets
LOCATION
Austin, TX
POSTED
30+ days ago

Within the Trainium Manufacturing Quality & Reliability (TRN MQR) organization, we are establishing a critical new function that bridges manufacturing outcomes with datacenter operational performance. We are seeking a talented and motivated Manufacturing & Datacenter Preparedness Lab Leader to build and lead this strategic capability in Austin, Texas.

This role will report to the leader of Trainium Manufacturing Quality & Reliability and serve as the essential feedback loop between our ODM/JDM/CM manufacturing operations and AWS datacenter fleet performance. You will establish and operate a specialized preparedness lab focused on analyzing datacenter performance of manufactured Trainium systems to identify root causes of field rework and repairs, feeding critical insights back into manufacturing processes, test strategies, and design improvements.

You will participate in the early phase of manufacturing line development for our next-generation servers and racks, improving the manufacturing flows that inform system design, manufacturing, and fleet operations. You will manage early lifecycle changes, identify initial product quality improvements, and drive supplier quality issues to technical root cause. The ideal candidate has experience in design or manufacturing and is capable of making wide-ranging business decisions on behalf of the organization.

You'll join a diverse team working across Manufacturing Engineering, Manufacturing Test Engineering, and Quality & Reliability Engineering. You'll collaborate with people across AWS Data Center Engineering, Hardware Design, ODM/JDM/CM partners, and datacenter operations teams to help us deliver the highest standards for safety and reliability while providing seemingly infinite capacity at the lowest possible cost for our customers. And you'll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

Key job responsibilities

  • Own operational production performance of Trainium systems across the entire product lifecycle, from manufacturing through datacenter deployment and fleet operations
  • Design and build a preparedness lab replicating datacenter conditions for assembly, repair, and system testing
  • Define and drive assembly and repair recipes in the manufacturing lab as the baseline prior to high-volume manufacturing and datacenter deployment
  • Ensure all manufacturing and datacenter test flows are regressed in the manufacturing lab prior to deployment
  • Influence hardware design strategy for Design for Manufacturing (DFM), Design for Reliability (DFR), and Design for Test (DFT) based on field failure analysis
  • Establish data-driven analytics frameworks connecting manufacturing test data to datacenter performance, leveraging ML techniques to predict field failures
  • Build and mentor a cross-functional team spanning manufacturing, test, quality, and reliability engineering; perform technical promotion assessments as a force multiplier
  • Collaborate with AWS datacenter operations teams to understand failure modes, repair patterns, and operational challenges firsthand; translate operator insights and field learnings into actionable manufacturing process improvements and design changes
  • Drive continuous improvement reducing failure rates and lifecycle degradation through rapid feedback loops
  • Develop or adapt manufacturing processes at the ODM and CM, including defining fixture requirements, critical assembly requirements, test methodology, and signal integrity, power, and heat management requirements

About the team

Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro(K2), Graviton, Inferentia, and Trainium families of processors.

Machine Learning Annapurna functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization.

We are the Trainium Servers and Systems organization under MLA, focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability.

This position is in the Manufacturing, Quality and Reliability team.

About the Company

Amazon.com Inc

At Amazon, we don’t wait for the next big idea to present itself. We envision the shape of impossible things and then we boldly make them reality. So far, this mindset has helped us achieve some incredible things. Let’s build new systems, challenge the status quo, and design the world we want to live in. We believe the work you do here will be the best work of your life.

Wherever you are in your career exploration, Amazon likely has an opportunity for you. Our research scientists and engineers shape the future of natural language understanding with Alexa. Fulfillment center associates around the globe send customer orders from our warehouses to doorsteps. Product managers set feature requirements, strategy, and marketing messages for brand new customer experiences. And as we grow, we’ll add jobs that haven’t been invented yet.

It’s Always Day 1
At Amazon, it’s always “Day 1.” Now, what does this mean and why does it matter? It means that our approach remains the same as it was on Amazon’s very first day – to make smart, fast decisions, stay nimble, invent, and stay focused on delighting our customers. In our 2016 shareholder letter, Amazon CEO Jeff Bezos shared his thoughts on how to keep up a Day 1 company mindset. “Staying in Day 1 requires you to experiment patiently, accept failures, plant seeds, protect saplings, and double down when you see customer delight,” he wrote. “A customer-obsessed culture best creates the conditions where all of that can happen.”

Our Leadership Principles
Our Leadership Principles help us keep a Day 1 mentality. They aren’t just a pretty inspirational wall hanging. Amazonians use them, every day, whether they’re discussing ideas for new projects, deciding on the best solution for a customer’s problem, or interviewing candidates. To read through our Leadership Principles from Customer Obsession to Bias for Action, visit https://www.amazon.jobs/principles
COMPANY SIZE
10,000 employees or more
INDUSTRY
Retail
FOUNDED
1994
WEBSITE
http://Amazon.com/militaryroles