Santa Clara, CA
This expertise should be evidenced by significant hands-on experience in large-scale C++/HIP/CUDA projects, such as contributing to the ROCm ecosystem (e.g., rocBLAS, hipDNN, Composable Kernel, AITemplate), CUDA libraries (e.g., cuBLAS, cuDNN, CUTLASS, Thrust, CUB, NCCL), or the C++/HIP/CUDA core of ML frameworks such as PyTorch, TensorFlow, or JAX. Expertise in AI post-training is equally critical. It requires a deep understanding of LLMs, including but not limited to transformer architectures, attention mechanisms, and the full model lifecycle, along with hands-on experience in advanced model alignment and post-training techniques such as supervised fine-tuning (SFT) and reinforcement learning (e.g., RLHF, GRPO).