Member of Technical Staff (MTS) - Multimodal Foundation Models

Deeproute.ai

Fremont, CA

JOB DETAILS

SKILLS

Analysis Skills, Artificial Intelligence (AI), Autonomous Driving Systems, Computer Science, Computer Vision, Distributed Computing, Engineering, Failure Analysis, Large-Scale Systems, Machine Learning, Memory Hardware, Modeling Languages, Production Systems, Robotics, Scalable System Development, Systems Scalability, Technical Research

LOCATION

Fremont, CA

POSTED

29 days ago

Focus

Multimodal Foundation Models · Representation Learning · Method Innovation

We are looking for strong technical builders and researchers who deeply understand foundation models and representation learning beyond simply applying existing frameworks.

Ideal candidates should have:

Strong experimental rigor
Solid systems and modeling intuition
Hands-on engineering ability
Interest in scalable multimodal AI systems for real-world autonomy

We value people who can bridge research and production, and who care about robustness, scalability, efficiency, and practical deployment in large-scale autonomous driving systems.

Responsibilities

1. Large-Scale Foundation Model Pretraining

Develop scalable pretraining pipelines for large-scale multimodal driving data
Design and optimize training strategies for:
- Vision-language-action models
- Video foundation models
- Long-context temporal modeling
- Multimodal representation alignment
Improve:
- Training stability
- Data efficiency
- Scaling efficiency
- Representation robustness
Work on distributed training systems and large-scale model optimization using frameworks such as:
- PyTorch Distributed
- DeepSpeed
- Megatron-LM

2. Representation Learning & Method Innovation

Design and improve self-supervised and multimodal learning methods for real-world autonomous driving systems
Conduct architecture-level research on:
- Vision Transformers (ViT)
- Video / temporal architectures
- Multimodal fusion and alignment
- Embedding and retrieval systems
- Long-context and memory-efficient architectures
Explore and improve:
- Pretraining objectives
- Loss functions
- Training paradigms
- Generalization and robustness
Analyze model behavior through:
- Rigorous ablation studies
- Failure case analysis
- Representation probing and evaluation

3. Efficient Foundation Models & Scalable Deployment

Improve the efficiency, scalability, and deployability of large multimodal foundation models for real-world autonomous driving systems
Work on areas such as:
- Model quantization
- Knowledge distillation
- Efficient attention mechanisms
- Sparse architectures and Mixture-of-Experts (MoE)
- Long-context and memory-efficient modeling
- Inference acceleration and serving optimization
- Training and inference system efficiency
Optimize model throughput, latency, memory usage, and deployment performance for large-scale production environments

Requirements

MS or PhD in:
- Computer Vision
- Machine Learning
- Robotics
- Computer Science
- Related fields
Strong understanding of:
- Foundation models
- Self-supervised learning
- Representation learning
- Multimodal learning
- Large-scale pretraining
Hands-on experience with methods such as:
- CLIP
- DINO / DINOv2
- MAE
- Contrastive learning
- Masked modeling
- MoE or scalable transformer architectures
Experience with one or more of the following is highly valued:
- Video foundation models
- Long-context modeling
- Retrieval systems
- Efficient inference
- Distributed training
- Model compression and deployment optimization
A strong publication record in top-tier venues is preferred:
- CVPR
- ICCV
- ECCV
- NeurIPS
- ICLR
- ICML
