About the Institute of Foundation Models
The Institute of Foundation Models is dedicated to advancing the science and engineering of large-scale AI systems. Our researchers and engineers develop cutting-edge foundation models while pushing the limits of high-performance computing and efficient AI inference. By combining deep expertise in machine learning, systems engineering, and hardware optimization, we build scalable AI solutions that drive scientific discovery and real-world impact.
As part of the team, interns work alongside world-class researchers and performance engineers to optimize the execution of large-scale foundation models on next-generation NVIDIA GPU architectures. This internship provides hands-on experience in low-level GPU performance analysis, kernel optimization, and hardware-aware inference acceleration.
Key Responsibilities
This intensive internship offers a unique opportunity to contribute to the development of a simulator and profiling framework for foundation model inference on NVIDIA GPUs.
Responsibilities include:
- Develop analytical performance models for GPU kernels and inference workloads.
- Build and validate a simulator to estimate theoretical hardware performance limits.
- Compare measured kernel performance against architectural peak throughput.
- Identify performance bottlenecks in compute, memory, communication, and scheduling.
- Analyze GPU execution using NVIDIA Nsight Systems and Nsight Compute.
- Investigate PTX and SASS code generation to understand low-level execution behavior.
- Collaborate with researchers and engineers to optimize inference kernels for transformer-based models.
- Evaluate utilization of Tensor Cores, memory bandwidth, caches, and instruction pipelines.
- Design profiling methodologies for Hopper and Blackwell architectures.
- Document findings and provide actionable recommendations for performance improvements.
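For illustration only, the analytical modeling described above often starts from a roofline bound: attainable throughput is the lesser of peak compute and arithmetic intensity times peak memory bandwidth. The sketch below is a minimal example of that idea, not the team's actual simulator. The `Device` class, the helper names, and the peak numbers are all hypothetical placeholders, not vendor specifications, and the byte count assumes each matrix crosses memory exactly once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    """Hypothetical device description; values are placeholders, not vendor specs."""
    peak_flops: float      # peak compute throughput, FLOP/s
    peak_bandwidth: float  # peak memory bandwidth, bytes/s


def gemm_cost(m: int, n: int, k: int, bytes_per_elem: int = 2) -> tuple[float, float]:
    """Return (FLOPs, bytes moved) for C = A @ B.

    Assumes A, B, and C each cross the memory interface once (no reuse
    misses), which is a lower bound on real traffic.
    """
    flops = 2.0 * m * n * k
    bytes_moved = float(bytes_per_elem * (m * k + k * n + m * n))
    return flops, bytes_moved


def attainable_flops(device: Device, flops: float, bytes_moved: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * peak bandwidth)."""
    intensity = flops / bytes_moved  # FLOP per byte
    return min(device.peak_flops, intensity * device.peak_bandwidth)


def efficiency(device: Device, flops: float, bytes_moved: float,
               measured_seconds: float) -> float:
    """Fraction of the roofline bound that a measured kernel run achieved."""
    achieved = flops / measured_seconds
    return achieved / attainable_flops(device, flops, bytes_moved)


# Placeholder peaks chosen only to make the example concrete.
device = Device(peak_flops=1.0e15, peak_bandwidth=3.0e12)

# A decode-style matrix-vector product: low intensity, so bandwidth-bound.
gemv_flops, gemv_bytes = gemm_cost(1, 8192, 8192)
gemv_bound = attainable_flops(device, gemv_flops, gemv_bytes)

# A large square GEMM: high intensity, so compute-bound.
gemm_flops, gemm_bytes = gemm_cost(8192, 8192, 8192)
gemm_bound = attainable_flops(device, gemm_flops, gemm_bytes)

print(f"GEMV bound: {gemv_bound:.3e} FLOP/s")
print(f"GEMM bound: {gemm_bound:.3e} FLOP/s")
```

Comparing a measured kernel time against this bound through `efficiency` gives a first estimate of how far a kernel sits from the hardware limit, which profiling can then explain.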
Academic Qualifications
Currently pursuing a degree in Computer Science, Computer Engineering, Electrical Engineering, Artificial Intelligence, High-Performance Computing, or a related quantitative discipline.
Preferred Qualifications
- Experience with CUDA programming and GPU kernel development.
- Understanding of NVIDIA GPU architecture and memory hierarchy.
- Familiarity with performance profiling tools such as Nsight Systems and Nsight Compute.
- Knowledge of PTX, SASS, and low-level GPU execution.
- Experience optimizing CUDA kernels for throughput and latency.
- Understanding of roofline analysis, performance modeling, and hardware utilization metrics.
- Experience with deep learning frameworks such as PyTorch or TensorFlow.
- Strong programming skills in C++, CUDA, and Python.
Desired Skills
- Performance engineering mindset.
- Strong analytical and debugging abilities.
- Interest in AI systems, inference optimization, and hardware-software co-design.
- Ability to work independently on research and engineering challenges.
- Excellent written and verbal communication skills.