HPC Infrastructure Operations Lead
Location: Chicago or New York (On-site 5 days/week; regular travel to HPC data center sites required)
Jump Trading Group is committed to world-class research. We empower exceptional talents in Mathematics, Physics, and Computer Science to seek scientific boundaries, push through them, and apply cutting-edge research to global financial markets. Our culture is unique. Constant innovation requires fearlessness, creativity, intellectual honesty, and a relentless competitive streak. We believe in winning together and unlocking unique individual talent by incentivizing collaboration and mutual respect. At Jump, research outcomes drive more than superior risk-adjusted returns. We design, develop, and deploy technologies that change our world, fund start-ups across industries, and partner with leading global research organizations and universities to solve problems.
Trading Infrastructure is a global organization of Engineers who architect, build, and maintain our world-class infrastructure. From colo design and implementation, to optimizing our exchange connectivity, to building world-class low-latency Wide Area Networks, we leverage research and automation to continually adapt and innovate our infrastructure so that it scales with and drives our trading and evolving business.
Jump's HPC infrastructure powers some of the most demanding computational workloads in the industry. As our HPC footprint grows, we need a seasoned operations leader to own the reliability, standards, and day-to-day excellence of these environments. This role leads the teams that keep the lights on across Jump's HPC data centers, ensuring maximum uptime through disciplined operations, proactive maintenance, and deep technical expertise in critical facility systems. Heavy, daily use of AI tools is expected in this role to accelerate decision-making, automate operational workflows, analyze data center telemetry, and continuously raise the bar on how the team operates.
What You'll Do:
Team Leadership & Organizational Ownership
HPC Data Center Standards, Processes & Preventative Maintenance
Critical Facility Systems Expertise
Monitoring & Incident Response
Server & Switch Hardware Expertise
Hardware Break-Fix
Inventory & Spares Management
Planning, Vendor & Budget Management
Networking & Linux
AI-Driven Operations
Cross-Team Partnership
Travel
Additional duties as assigned or needed.
Skills You'll Need:
Technical Skills: