GPU Software Engineer

Triune Infomatics

San Jose, CA

JOB DETAILS
SKILLS
Artificial Intelligence (AI), Atlassian JIRA, Benchmarking, C Programming Language, C++ Programming Language, CUDA (Compute Unified Device Architecture), Code Reviews, Communication Skills, Computer Firmware, Consulting, Documentation, Ecosystems, Ethernet, GPU (Graphics Processing Unit), GitHub, Kernel Programming, Memory Hardware, Onboarding, Open Source, Project/Program Management, Python Programming/Scripting Language, Semiconductors, Software Design, Software Development, Software Engineering, System Integration (SI), Team Player, Technical Publications, Technical Writing, Testing, Writing Skills
Role: GPU Software Engineer
Location: San Jose, CA – Onsite
Duration: 12+ Months/Contract-to-Hire
 
Overview: The client is seeking an experienced GPU Software Engineer for a 12-month, milestone-based engagement supporting a cutting-edge GPU software integration project. The consultant will work on AMD GPU platforms, drive AI stack development, contribute to open-source projects, and deliver performance benchmarking and integration reports against a structured set of monthly milestones. This is a highly technical, hands-on role that requires deep expertise in GPU software stacks, ROCm, AI frameworks, and systems-level integration.
 
Manager's Note: The client has confirmed flexibility on AMD-specific experience and is open to strong GPU software engineers from the NVIDIA/CUDA ecosystem, provided they have solid GPU architecture fundamentals and can ramp up on ROCm. Exposure to the ROCm ecosystem is considered a key success factor, and onboarding support will be provided for the MI210 deliverables. Open-source contributions to SGLang will be coordinated through Samsung channels. The timing of the ramp-up period before milestone tracking begins is still pending clarification.
 
Position Details:
  • Project Title: GPU SW Integration for Samsung Cognos
  • Engagement Type: Contract / Milestone-Based (12 Months)
  • Client Environment: AMD MI210 GPU, CXL Memory, NVMe Gen6, ROCm Stack
  • Delivery Tools: Confluence, Jira, GitHub/GitLab (client-provided)
 
Key Responsibilities:
  • Design and develop GPU software modules aligned with project milestones.
  • Perform systems integration and end-to-end testing of AI stack SW modules.
  • Validate AMD Infinity Bridge and AIS on MI210 GPU hardware.
  • Conduct functional validation and performance benchmarking (pSLC firmware, CXL, ROCm).
  • Implement and validate SGLang changes that optimize L3-to-L1 memory transfers.
  • Develop and contribute CaMa module changes to the ROCm software stack.
  • Collaborate with the SGLang open-source community and contribute code to its public GitHub repository.
  • Develop the CaMa module for ROCm over Infinity Fabric/Ethernet.
  • Perform end-to-end (E2E) performance benchmarking and publish formal benchmark reports.
  • Integrate CaMa changes into the Cognos AI stack and publish integration documentation.
  • Scope UALink support for CaMa and publish an investigation/feasibility document.
  • Maintain all documentation, code, and status updates in Confluence, Jira, and GitHub/GitLab.
 
Required Skills and Qualifications:
GPU Software and Hardware
  • Hands-on experience with AMD GPU platforms, specifically MI210.
  • Proficiency with the AMD ROCm software stack, including kernel libraries and drivers.
  • Experience with AMD Infinity Bridge / Infinity Fabric architecture.
  • Familiarity with CXL (Compute Express Link) memory integration.
  • Experience with NVMe storage and GPUDirect Storage (GDS).
AI Frameworks and Software Stack
  • Experience with SGLang or similar LLM inference frameworks.
  • Familiarity with AI stack installation and end-to-end workload benchmarking.
  • Knowledge of the GPU memory hierarchy (HBM, L1/L3 cache) and data transfer optimization.
  • Proficiency in GPU kernel programming and library management (e.g., GDS, CaMa).
Programming and Tools
  • Strong proficiency in C/C++ and Python for GPU/systems-level development.
  • Experience with open-source contribution workflows (GitHub, pull requests, code reviews).
  • Familiarity with Jira and Confluence for project management and documentation.
  • Experience with pSLC firmware validation and performance benchmarking methodologies.
Soft Skills
  • Ability to work independently and deliver against defined monthly milestones.
  • Strong written communication skills for publishing technical reports and documentation.
  • Collaborative mindset; ability to work with third-party teams (AMD, SGLang community).
 
Preferred Qualifications:
  • Prior experience with Samsung Cognos AI stack or similar enterprise AI platforms.
  • Familiarity with UALink protocol and its GPU interconnect applications.
  • Prior open-source contributions to ROCm, SGLang, or similar GPU frameworks.
  • Experience presenting benchmarking results to semiconductor partners (AMD, NVIDIA, etc.).

About the Company

Triune Infomatics