HPC Network Solutions Architect
GTN Technical Staffing
Dallas, TX
Posted: 30+ days ago
Location: Dallas, TX (Hybrid)
Type: Direct Hire
•Competitive base salary + performance bonus
•100% company-paid benefits
Overview
We are seeking an HPC Network Solutions Architect to design, integrate, and optimize high-performance networking architectures supporting HPC, AI/ML, and data-intensive workloads.
This is a customer-facing, technically focused role responsible for guiding clients across the full solution lifecycle—from requirements gathering and architecture design through proof-of-concept, deployment, and long-term optimization. The role bridges advanced networking technologies with real-world HPC adoption, ensuring low-latency, high-bandwidth infrastructure aligns with workload demands.
The ideal candidate brings deep expertise in HPC networking, strong experience across InfiniBand and Ethernet-based architectures, and the ability to translate complex requirements into scalable, production-ready solutions.
Key Responsibilities
Customer Engagement & Architecture Leadership
•Serve as the primary networking subject matter expert for customers adopting or scaling HPC environments
•Capture performance goals, scalability requirements, and integration constraints to inform solution design
•Lead customer workshops, architecture reviews, and technical design sessions
HPC Network Architecture & Design
•Design and document end-to-end HPC network architectures including Ethernet, InfiniBand, RoCE, EVPN, and VXLAN fabrics
•Define scalable, low-latency network designs aligned with HPC and AI/ML workload requirements
•Develop architecture blueprints and integration strategies across compute, storage, orchestration, and security layers
Performance Optimization & Benchmarking
•Lead proof-of-concept and benchmarking initiatives to validate network performance and throughput
•Conduct network performance assessments, tuning, and optimization to eliminate bottlenecks
•Address scaling challenges such as data gravity, east-west traffic, and high-throughput demands
Observability & Monitoring
•Design and implement observability frameworks using Prometheus, Grafana, and vendor telemetry tools
•Provide visibility into network health, utilization, and performance across large-scale environments
Cross-Functional Collaboration
•Partner with engineering, product, and operations teams to refine architecture standards and delivery practices
•Collaborate with compute, storage, and platform teams to ensure integrated, workload-aware solutions
•Support multi-vendor environments and evaluate new networking technologies
Vendor & Ecosystem Engagement
•Work closely with vendors such as NVIDIA (including its Mellanox networking products), Cisco, and Arista to integrate advanced capabilities
•Influence vendor roadmaps through feedback and joint evaluations
•Stay current on emerging HPC networking technologies and provide forward-looking guidance to customers
Thought Leadership & Innovation
•Represent the organization in customer engagements, workshops, and industry events
•Provide strategic insight into future networking trends including interconnect advancements and scalable architectures
•Contribute to best practices and reusable architectural patterns
Required Experience
•Proven experience in HPC networking architecture, data center network engineering, or large-scale distributed systems design
•Deep expertise with InfiniBand and RoCE, including deployment and tuning in production environments
•Strong experience designing large-scale Ethernet networks using BGP, OSPF, EVPN, and VXLAN
•Understanding of communication libraries such as MPI and NCCL, including GPU-to-GPU collectives, and their interaction with HPC interconnects
•Experience working in Linux environments with scripting skills (Python, Bash, or PowerShell) for automation
•Experience supporting multi-vendor networking environments
•Ability to translate complex technical requirements into clear, scalable architectures
•Strong customer-facing communication skills with experience engaging both technical and executive stakeholders
Technical Skills
•Experience with network observability and telemetry platforms
•Familiarity with automation and Infrastructure-as-Code tools such as Terraform and Ansible
•Exposure to CNI plugins such as Multus, Cilium, and NVIDIA CNI for Kubernetes/HPC environments
Preferred Experience
•Experience delivering HPC or AI/ML workloads across large-scale, low-latency network environments
•Experience collaborating with vendors and influencing product direction
•Contributions to open-source HPC or networking projects
•Bachelor’s or Master’s degree in Computer Science, Networking, Engineering, or related field
•Certifications such as Cisco CCNP/CCIE, Juniper JNCIP, AWS Certified Advanced Networking - Specialty, or Red Hat RHCE
About the Company
G