Data Engineer (Python + ML Experience Required) - Los Altos, CA (Hybrid/Remote)

Agile Global Solutions, Inc.

Los Altos, CA (remote)

JOB DETAILS
JOB TYPE
Full-time, Employee
SKILLS
Agile Programming Methodologies, Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), Amazon Web Services (AWS), Apache Spark, Artificial Intelligence (AI), Auditing, Automotive Automation, Best Practices, Computer Science, Continuous Deployment/Delivery, Continuous Integration, Data Management, Data Quality, Data Sets, Distributed Computing, Documentation Standards, GPS (Global Positioning System), Human Interaction, Light Detection and Ranging (LiDAR)/Laser Detection and Ranging (LADAR), Machine Learning, Metadata, Performance Analysis, Power Amplifier, Python Programming/Scripting Language, Quality Management, Reinforcement Learning, Robotics, Scalable System Development, Scientific Research, Simulation, Software Development, Software Engineering, Software Simulation, Technical Recruiting, User Interface/Experience (UI/UX), Workflow Analysis
LOCATION
Los Altos, CA
POSTED
1 day ago

Position: Data Engineer – Autonomous Vehicle AI Research Infrastructure

Location: Los Altos, CA 

Duration: Contract

Job Description:

At COMPANY, we’re on a mission to improve the quality of human life. We’re developing new tools and capabilities to amplify the human experience. To lead this transformative shift in mobility, we’ve built a world-class team in Energy & Materials, Human-Centered AI, Human Interactive Driving, Large Behavioral Models, and Robotics.
Within the Human Interactive Driving division, the Extreme Performance Intelligent Control department is working to develop scalable, human-like driving intelligence by learning from expert human drivers. This project focuses on creating a configurable, data-driven world model that serves as a foundation for intelligent, multi-agent reasoning in dynamic driving environments. By tightly integrating advances in perception, world modeling, and model-based reinforcement learning, we aim to overcome the limitations of more compartmentalized, rule-based approaches. The end goal is to enable robust, adaptable, and interpretable driving policies that generalize across tasks, sensor modalities, and public road scenarios, delivering transformative improvements for ADAS, autonomous systems, and simulation-driven software development.
As a Data Engineer, you will be a key enabler of this mission, owning the systems that collect, organize, clean, and deliver the large volumes of sensor and simulation data that fuel our world models, perception systems, and reinforcement learning algorithms. You will collaborate closely with research scientists and machine learning engineers to ensure our pipelines are reliable, scalable, and performant, powering breakthroughs in intelligent driving across simulation and real-world deployments.


Responsibilities
● Design, implement, and maintain robust data pipelines for ingesting, cleaning, and transforming large-scale autonomous vehicle datasets (camera, LiDAR, radar, GPS, simulation logs).
● Develop scalable storage and retrieval systems using AWS services (S3, EC2, SageMaker, Athena, etc.).
● Ensure data quality and consistency through automated validation, deduplication, and schema enforcement.
● Collaborate with ML researchers and engineers to provide efficient access to training data, labels, and metadata.
● Optimize data preprocessing and batching pipelines to support large-scale training and evaluation workflows.
● Build tools to manage and audit dataset versions, experiment tracking, and feature reproducibility.
● Implement and maintain CI/CD workflows for data and pipeline updates, ensuring minimal downtime and reproducible outputs.
● Monitor data pipeline performance and respond to bottlenecks or outages proactively.


Qualifications
● B.S. or M.S. in Computer Science, Data Engineering, or a related field.
● 3+ years of experience building production-grade data infrastructure or ML data pipelines.
● Strong proficiency with Python and SQL, and experience with data workflow orchestration tools (e.g., Airflow, Prefect, Luigi).
● Deep experience with AWS services, especially S3 (data storage), EC2 (compute), and SageMaker (model training).
● Familiarity with distributed computing frameworks like Spark, Dask, or Ray.
● Understanding of best practices for dataset documentation, standardization, and reproducibility in research.


Bonus Qualifications
● Experience with autonomous vehicle datasets or robotics sensor data.
● Familiarity with ML training pipelines and model evaluation workflows.
● Prior experience collaborating with researchers or applied ML teams in high-throughput environments.

Best Regards,

T Chandra Sekhar - Technical Sr. Recruiter

Agile Global Solutions, Inc. | "Empowering Enterprises"

193 Blue Ravine Road, Suite 160, Folsom, CA 95630

Direct - 916-413-7282

Sekhar@agileglobalsolutions.com | www.agileglobal.com
