Artificial Intelligence (AI), Benchmarking, CUDA (Compute Unified Device Architecture), Computer Science, Computer Storage Hardware, Computer Systems, Deep Learning, Electrical Engineering, GPU (Graphics Processing Unit), JAX, Large-Scale Systems, Memory Hardware, Open Source, Performance Analysis, Performance Tuning/Optimization, Reinforcement Learning, Research Skills
LOCATION
San Jose, CA
POSTED
30+ days ago
About the Team
The Seed Infrastructures team oversees distributed training, reinforcement learning frameworks, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models.
As a project intern, you will have the opportunity to engage in impactful short-term projects that provide you with a glimpse of professional real-world experience. You will gain practical skills through on-the-job learning in a fast-paced work environment and develop a deeper understanding of your career interests.
Applications will be reviewed on a rolling basis; we encourage you to apply early.
Responsibilities
Contribute to AI compiler optimizations for training and inference workloads
Develop and extend MLIR-based compiler passes for graph lowering, optimization, and code generation
Optimize model execution on GPU and NPU accelerators, focusing on performance, memory efficiency, and scalability
Support model deployment pipelines, including compilation, packaging, and runtime integration
Assist with distributed training and inference acceleration, such as parallel execution, communication optimization, and runtime scheduling
Benchmark, profile, and analyze performance of large-scale models across different hardware backends
Collaborate with researchers and engineers to translate model and system requirements into compiler and runtime improvements
Minimum Qualifications
Currently pursuing a Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related technical fields
Experience using or developing open-source LLM inference frameworks such as vLLM or SGLang
Proficiency in at least one deep learning framework (e.g., PyTorch, Megatron, DeepSpeed, JAX), with experience in model inference workflows
Understanding of modern computing systems, including hardware, storage, and networking, and how they impact ML workloads
Familiarity with compilers or model optimization pipelines (e.g., PyTorch Dynamo), or related model execution workflows
Able to commit to working for 12 weeks in 2026
Preferred Qualifications
Experience with distributed or large-scale ML systems, including training or inference pipelines and related optimizations (e.g., FSDP, DeepSpeed, Megatron, GSPMD)
Experience with GPU/TPU/NPU programming and performance optimization, or high-performance computing and communication (e.g., CUDA, Triton, NCCL, RDMA)
Understanding of AI compiler and model optimization stacks (e.g., torch.fx, PyTorch Dynamo, XLA, MLIR)