Software Engineer, ML Fleet Intelligence

Google

Sunnyvale, CA

JOB DETAILS
JOB TYPE
Full-time, Employee
SKILLS
Algorithms, Analysis Skills, Artificial Intelligence (AI), Artificial Intelligence (AI) Natural Language, Business Growth, Business Strategy, Cloud Computing, Computer Science, Cross-Functional, Data Modeling, Data Processing, Data Storage, Data Structures, Debugging Skills, Distributed Computing, Equal Employment Opportunity (EEO), Hardware Design, Information Retrieval, Information/Data Security (InfoSec), Internet Search, Large-Scale Systems, Leadership, Machine Learning, Matrix Management, Natural Language Processing (NLP), Network Design, Network Operations Center, Operations Research, Predictive Modeling, Product/Service Launch, Reinforcement Learning, Reliability Engineering, Scalable System Development, Software Architecture, Software Architecture Design, Software Development, Software Engineering, Systems Engineering, Systems Reliability, Team Lead/Manager, Technical Leadership, Telemetry, Testing, User Documentation, User Interface Design, Vehicle Fleets
LOCATION
Sunnyvale, CA
POSTED
30+ days ago

Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 8 years of experience in software development.
  • 5 years of experience testing and launching software products, and 3 years of experience with software design and architecture.
  • 5 years of experience with one or more of the following: speech/audio (e.g., technology duplicating and responding to the human voice), reinforcement learning (e.g., sequential decision making), machine learning (ML) infrastructure, or specialization in another ML field.
  • 5 years of experience with ML design and ML infrastructure (e.g., model deployment, model evaluation, data processing, debugging, fine tuning).

Preferred qualifications:

  • Master’s degree or PhD in Engineering, Computer Science, or a related technical field.
  • 8 years of experience with data structures and algorithms.
  • 3 years of experience in a technical leadership role leading project teams and setting technical direction.
  • 3 years of experience working in a complex, matrixed organization involving cross-functional or cross-business projects.
  • Experience in predictive maintenance, anomaly detection, or systems reliability engineering.
  • Ability to translate complex technical findings into actionable business strategies for executive stakeholders.

About the job

Google's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. Our products need to handle information at massive scale, and extend well beyond web search. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

In this role, you will take control of the world’s largest data center footprint as an Applied AI/ML Specialist on a team responsible for the fault tolerance of Google’s entire fleet, including the ML TPUs. You will pioneer the use of AI/ML to solve complex infrastructure challenges by leveraging petabytes of operational and telemetry data, directly empowering the very AI/ML systems that drive the future of Google.

The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.

We're the driving team behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.

Individual pay is determined by factors including job-related skills, experience, and relevant education or training. US: $207,000 - $301,000 (USD) + 20% bonus target + bonus + equity + benefits. Learn more about benefits at Google.

Responsibilities

  • Lead the design and implementation of solutions in specialized ML areas, optimize ML infrastructure, and guide the development of model optimization and data processing strategies.
  • Design and implement AI/ML models to predict, detect, and mitigate hardware and software faults across a global fleet.
  • Analyze petabytes of telemetry and performance data to uncover insights that improve the reliability of ML TPUs and traditional compute infrastructure.
  • Build scalable automated systems that allow Google’s data center footprint to grow while maintaining industry-leading uptime.
  • Partner with hardware designers and site reliability engineers (SREs) to integrate intelligent diagnostics into the core data center lifecycle.

Google is proud to be an equal opportunity workplace and is an affirmative action employer. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. See also Google's EEO Policy and EEO is the Law. If you have a disability or special need that requires accommodation, please let us know by completing our Accommodations for Applicants form.

About the Company

Google

Build for everyone

Since our founding in 1998, Google has grown by leaps and bounds. Starting from two computer science students in a university dorm room, we now have thousands of employees and offices around the world. These Googlers build products that help create opportunities for everyone, whether down the street or across the globe.

It starts with how we work together. We’re building a company where people of different views, backgrounds and experiences can do their best work and show up for one another. A place where every Googler feels like they belong.

So whether you develop new technology or creative campaigns, craft beautiful products or breakthrough partnerships, your work here is a chance to accomplish things that matter. Bring your insight, imagination, and healthy disregard for the impossible. Bring everything that makes you unique. Together, we can build for everyone.

Benefits

We strive to provide Googlers and their loved ones with a world-class benefits experience, focused on supporting their physical, financial, and emotional wellbeing. Our benefits are based on data, and centered around our users: Googlers and their families. They’re thoughtfully designed to enhance your health and wellbeing, and generous enough to make it easy for you to take good care of yourself (now, and in the future). So we can build for everyone, together.

Learn more about Google’s benefits on this site featuring Googlers’ experience.

How we Hire

Google’s hiring process is an important part of our culture. Googlers care deeply about their teams and the people who make them up. To build for everyone, we know that we need a wide range of perspectives and experiences, and a fair hiring process is the first step in getting there.

Learn more about our hiring process.

COMPANY SIZE
10,000 employees or more
INDUSTRY
Computer Software
EMPLOYEE BENEFITS
Paid Sick Days, Performance Bonus, Professional Development, 401K, Stock Options, Employee Events, Retirement / Pension Plans, Tuition Reimbursement, Work From Home, Life Insurance, On Site Cafeteria
FOUNDED
1998
WEBSITE
https://goo.gle/4dbno6V