Lead Test Engineer, Server Compute Firmware - AI Data Center 1

Celestica Inc

Austin, TX

JOB DETAILS
SKILLS
ARM (Advanced RISC Machine), Aerospace and Defense, Analysis Skills, Artificial Intelligence (AI), Aviation Industry, Baseband, Broadband, CPU (Central Processing Unit), Capital Equipment, CentOS, Cloud Computing, Command Line, Communication Skills, Computer Firmware, Computer Science, Computer Servers, Computer Storage Hardware, Continuous Improvement, Continuous Integration, Debugging Skills, Develop Methodologies, Distributed Computing, Docker, Electrical Engineering, Ethernet, File Systems, GPU (Graphics Processing Unit), Go Programming Language (Golang), Hardware Design and Simulation Software, Hardware Quality Assurance, Injections, Integrated Circuits (ICs), Interpersonal Skills, Leadership, Linux Operating System, Manufacturing, Medical Equipment, Medical Products, Memory Hardware, Memory Subsystem, Mentoring, Network Attached Storage (NAS), Network Operations Center, Network Testing, Open Source, PCI Express (PCI-E), Performance Analysis, Performance Testing, Power Management, Problem Solving Skills, Product Development, Product Lifecycle, Product Testing, Python Programming/Scripting Language, Quality Assurance Methodology, RAID Storage, Red Hat Linux Operating System, Reliability Testing, SAP, Scripting (Scripting Languages), Serial ATA (SATA), Server Architecture, Server Hardware, Software Design, Software Development, Software Engineering, Software Testing, Solid State Drive (SSD), Storage Architecture, Storage Area Network (SAN), Strategic Planning, Stress Testing, Supply Chain, Systems Engineering, TCP/IP (Transmission Control Protocol/Internet Protocol), Team Lead/Manager, Team Player, Technical Leadership, Test Automation, Test Case, Test Harness, Test Plan/Schedule, Test Requirements, Test Scripts, Test Strategy, Test Tools, Testing, Ubuntu, x86 Processors
LOCATION
Austin, TX
POSTED
8 days ago

Req ID: 137753

Region: Americas

Country: USA

State/Province: Texas

City: Austin

General Overview

Functional Area: Engineering

Career Stream: Design - Software Engineering

SAP Short Name: LEN-ENG-DSE

Job Level: Level 08

IC/MGR: Individual Contributor

Direct/Indirect Indicator: Indirect

Summary

The Senior Lead Server Compute CPU & GPU Firmware Test Engineer will play a pivotal role in the design, development, and execution of comprehensive test strategies for our AI data centers' server infrastructure. This leadership position requires deep expertise in server architectures, enterprise storage systems, and networking, as well as a strong understanding of the unique performance and reliability demands of AI/ML workloads. The ideal candidate will be a hands-on technical leader, capable of mentoring junior engineers, driving test automation, and collaborating across engineering teams to deliver robust, high-performing solutions.

Knowledge / Skills / Competencies

  • Define, develop, and implement comprehensive test plans and strategies for all storage and server hardware, firmware, and software components within the AI data center environment.

  • Lead the test team in designing, executing, and analyzing complex test cases, including functional, performance, reliability, stress, and endurance testing.

  • Mentor and provide technical guidance to junior test engineers, fostering a culture of technical excellence and continuous improvement.

  • Design and implement automated test frameworks and scripts using languages such as Python or Go to improve the efficiency and coverage of testing.

  • Conduct in-depth performance analysis and bottleneck identification for server platforms (e.g., CPU, GPU, memory, PCIe, networking), OpenBMC interfaces/features, and storage systems (e.g., NVMe, SSD, HDD arrays, distributed storage, SAN/NAS). This includes debugging issues related to BMC functionality and its interaction with server hardware.

  • Develop and maintain robust testbeds and infrastructure for continuous integration and validation.

  • Utilize open-source and commercial test tools relevant to server, OpenBMC and storage validation.

  • Collaborate closely with hardware design, software development, infrastructure, and AI/ML engineering teams to understand requirements and integrate testing throughout the product lifecycle.

  • Communicate test progress, results, and critical issues effectively to stakeholders, including executive leadership.

  • Develop specialized test methodologies to validate performance and reliability under heavy AI/ML workloads (e.g., large model training, inference at scale, data ingestion).

  • Understand and test the interactions between GPU-accelerated computing, high-speed networking, and storage systems.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related technical field.

  • 5+ years of experience in hardware and/or software testing, with at least 5 years focused on enterprise-level storage and server systems.

  • 1+ years of proven experience in a lead or senior technical role, mentoring and guiding junior engineers or leading test initiatives.

  • Deep expertise in server architectures (x86, ARM, GPU servers), CPU/memory subsystems, PCIe, power management, and Baseboard Management Controller (BMC) functionality.

  • Strong understanding of storage technologies such as NVMe, SAS/SATA SSDs/HDDs, RAID, distributed file systems (e.g., Ceph, Lustre, GPFS), SAN, and NAS.

  • Proficiency in scripting languages (e.g., Python, Bash) for test automation and data analysis.

  • Experience with Linux operating systems (e.g., Ubuntu, CentOS, RHEL) and command-line tools.

  • Familiarity with networking concepts (Ethernet, TCP/IP, InfiniBand) and network testing methodologies.

  • Experience with test methodologies such as performance testing, reliability testing, stress testing, and fault injection.

  • Excellent problem-solving, analytical, and debugging skills.

  • Strong communication and interpersonal skills, with the ability to collaborate effectively across diverse teams.

Preferred Qualifications

  • Familiarity with the Open Compute Project (OCP).

  • Experience with cloud environments (AWS, Azure, GCP) and virtualization technologies.

  • Knowledge of containerization technologies (Docker, Kubernetes).

  • Familiarity with AI/ML frameworks (e.g., TensorFlow, PyTorch) and their infrastructure requirements.

  • Experience with performance profiling tools (e.g., fio, Iometer, perf, VTune).

  • Contributions to open-source projects related to storage, servers, or testing.

  • Certifications in relevant technologies (e.g., NetApp, Dell EMC, HPE, NVIDIA).

Notes

This job description is not intended to be an exhaustive list of all duties and responsibilities of the position. Employees are held accountable for all duties of the job. Job duties and the % of time identified for any function are subject to change at any time.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.

Celestica's policy on equal employment opportunity prohibits discrimination based on race, color, creed, religion, national origin, gender, sexual orientation, gender identity, age, marital status, veteran or disability status, or other characteristics protected by law.

This policy applies to hiring, promotion, discharge, pay, fringe benefits, job training, classification, referral and other aspects of employment and also states that retaliation against a person who files a charge of discrimination, participates in a discrimination proceeding, or otherwise opposes an unlawful employment practice will not be tolerated. All information will be kept confidential according to EEO guidelines.

COMPANY OVERVIEW:

Celestica (NYSE, TSX: CLS) enables the world's best brands. Through our recognized customer-centric approach, we partner with leading companies in Aerospace and Defense, Communications, Enterprise, HealthTech, Industrial, Capital Equipment and Energy to deliver solutions for their most complex challenges. As a leader in design, manufacturing, hardware platform and supply chain solutions, Celestica brings global expertise and insight at every stage of product development, from drawing board to full-scale production and after-market services, for products ranging from advanced medical devices to highly engineered aviation systems to next-generation hardware platform solutions for the Cloud. Headquartered in Toronto, with talented teams spanning 40+ locations in 13 countries across the Americas, Europe and Asia, we imagine, develop and deliver a better future with our customers.

Celestica would like to thank all applicants; however, only qualified applicants will be contacted.

Celestica does not accept unsolicited resumes from recruitment agencies or fee-based recruitment services.

This location is a US ITAR facility, and these positions will involve the release of export-controlled goods either directly to employees or through employees' movement within the facility. As such, upon an applicant's acceptance of employment, Celestica will require the information necessary to determine whether any export control exemptions or licenses must be filed.
