Principal Data Engineer

Sanas.ai

Palo Alto, CA

JOB DETAILS
SKILLS
Algorithms, Amazon Simple Storage Service (S3), Apache Kafka, Apache Spark, Artificial Intelligence (AI), Best Practices, Cloud Computing, Code Reviews, Cross-Functional, Data Management, Data Modeling, Data Processing, Data Quality, Disaster Recovery, Entrepreneurship, Finance, Funding, Legal, Machine Learning, Machine Tool, Mentoring, Metadata, PostgreSQL, Privacy Controls, Product Marketing, Regulatory Requirements, Scientific Research, Security Compliance, Snowflake Schema, Speaker Verification, Startup, Systems Scalability, Team Player, Technical Leadership, Technical Strategy, Telemetry, Test Tools, Unicorn Library Management System, Vehicle Fleets, Voice Products
POSTED
30+ days ago

Sanas.ai is pioneering the future of human communication. Founded by a team of Stanford researchers and entrepreneurs with deep industry experience, Sanas has developed the world's first real-time speech transformation platform capable of accent translation, noise elimination, speech enhancement, and cross-language communication.

Sanas makes conversations clearer, more inclusive, and more effective, removing barriers that prevent people from being understood, regardless of accent, background noise, or native language.

Since going to market in 2023, Sanas has scaled at an extraordinary pace, growing from 0 to 32M ARR in under two years, with a projected >50M ARR by the end of 2025. The company recently recorded its first 10M quarter and is on track to achieve 120M in ARR next year. The company's valuation has a clear trajectory toward a multi-billion-dollar market capitalization as it continues to expand into new verticals and product categories. With a TAM that spans all human-in-the-loop communications and beyond, Sanas has the potential to impact every industry and every global interaction.

Sanas is revolutionizing the way we communicate with the world's first real-time algorithm designed to modulate accents, eliminate background noise, and magnify speech clarity. Pioneered by seasoned startup founders with a proven track record of creating and steering multiple unicorn companies, our groundbreaking, GDP-shifting technology sets a gold standard.

Sanas is a 200-strong team established in 2020. In this short span, we've successfully secured over 100 million in funding. Our innovation has been supported by the industry's leading investors, including Insight Partners, Google Ventures, Quadrille Capital, General Catalyst, Quiet Capital, and other influential investors. Our reputation is further solidified by collaborations with numerous Fortune 100 companies. With Sanas, you're not just adopting a product; you're investing in the future of communication.

We're looking for an experienced and forward-thinking Principal Data Engineer to lead the design and implementation of our end-to-end data infrastructure for industry-leading Voice AI products. This is a high-impact role where you will shape the technical vision, own strategic architecture decisions, and mentor a growing team of data engineers focused on delivering reliable and scalable data systems for machine learning at scale.

You'll work cross-functionally with AI research scientists, infrastructure, and product teams to ensure that data - from raw audio to training-ready features - is consistently accessible, compliant, and optimized for speed and scale. You'll help push the boundaries of real-time Voice AI.

Key Responsibilities

Architect and lead the development of large-scale data pipelines and data lakes to ingest, transform, and serve high-quality data for AI model training, product telemetry, and analytics.

Drive long-term data infrastructure strategy across streaming and batch, feature store extensions, Iceberg/Delta Lake choices, metadata management, and lakehouse evolution.

Drive platform and infrastructure decisions, optimizing compute fleets (e.g., Ray, Spark clusters), orchestration tooling (Airflow, Dagster), and streaming stacks (Kafka, Flink).

Collaborate with AI research scientists, engineering leads, product, finance, marketing, and legal to align data architecture with business and regulatory requirements.

Advocate best practices in data governance, lineage, observability, testing, tooling, and disaster recovery across pipelines and data stores.

Act as a mentor and technical leader - review design and code, share patterns, elevate team capability, and support recruitment and hiring.

Drive build-vs-buy decisions for the tools used to implement data quality and observability solutions.

Qualifications

10 years of experience in data engineering, infrastructure, or ML systems, with at least 2 years in a technical leadership capacity.

Expertise in building distributed batch and real-time data systems.

Expertise in databases like Postgres and data lakes like Snowflake, Databricks, and ClickHouse.

Experience using data processing frameworks like Spark, Flink, and Ray.

Experience with cloud platforms (AWS/GCP), object storage (e.g., S3), and orchestrators like Airflow and Dagster.

Strong knowledge of data lifecycle management, including privacy, security, compliance, and reproducibility.

Comfortable working in a fast-paced startup environment.

Strategic mindset and proven ability to collaborate across engineering, ML, and product teams to deliver infrastructure that scales with the business.

Nice to Have

Familiarity with audio data and its unique challenges, like large file sizes, time-series features, and metadata handling, is a strong plus.

Experience with Voice AI models like ASR, TTS, and speaker verification.

Familiarity with real-time data processing frameworks like Kafka, Flink, Druid, and Pinot.

Familiarity with ML workflows, including MLOps, feature engineering, model training, and inference.

Experience with labeling tools, audio annotation platforms, or human-in-the-loop annotation pipelines.
