Analysis Skills, Budgeting, Communication Skills, Deep Learning, English Language, Establish Priorities, Machine Translation, Mentoring, Metrics, Multilingual, Natural Language Processing (NLP), Process Improvement, Production Control, Production Systems, Python Programming/Scripting Language, Quality Metrics, Quality Monitoring, Research & Development (R&D), Technical Leadership, Technical Research, Use Cases
LOCATION
Palo Alto, California
POSTED
30+ days ago
About the Role
Language Translation is one of Sanas's most exciting and fastest-growing product lines. We're looking for a Research Engineer who can both set technical direction and get deep in the modeling work — someone who owns translation quality end-to-end across language pairs and drives the fundamental research challenges unique to real-time simultaneous interpretation.
Job Description
Translation quality & modeling
Own and drive improvements to translation accuracy across Sanas's supported language pairs, with a focus on conversational, spoken-language domains.
Design, train, and evaluate neural MT models — from fine-tuning large multilingual models to building targeted components for low-resource or high-priority language pairs.
Develop and maintain rigorous evaluation pipelines using both automated metrics (BLEU, COMET, chrF) and human evaluation frameworks calibrated to real-world enterprise use cases.
Identify the highest-leverage research bets — data augmentation, domain adaptation, quality estimation, terminology consistency — and execute on them with measurable quality gains.
Simultaneous interpretation & delimiter modeling
Lead research and development of Sanas's delimiter model — the component that determines optimal segmentation points in streaming speech for real-time translation output.
Develop methods to handle speech disfluencies, sentence fragments, and incomplete utterances gracefully in a streaming translation pipeline.
Collaborate closely with the speech and inference engineering teams to ensure translation components meet strict real-time latency budgets in production.
Research direction & technical leadership
Define and maintain a research roadmap for MT and simultaneous interpretation, prioritizing work that moves production quality metrics.
Stay at the frontier of MT research: track and evaluate new work, and translate (haha) the relevant advances into practical improvements at Sanas.
Mentor and technically guide other engineers working on translation-adjacent problems across the ML org.
Data & infrastructure
Identify, source, and curate training data for MT and delimiter modeling — including parallel corpora, synthetic data generation, and speech-aware augmentation strategies.
Instrument model quality monitoring in production to detect degradation across language pairs and trigger targeted retraining cycles.
Qualifications
3+ years of experience in machine translation, NLP, or multilingual modeling research — with a track record of measurable quality improvements in production systems.
Deep familiarity with neural MT architectures: sequence-to-sequence models, Transformer variants, and large multilingual models.
Hands-on experience with simultaneous or streaming translation, including segmentation and low-latency decoding strategies.
Strong command of MT evaluation methodology — automated metrics, human evaluation design, and error analysis.
Proficiency in Python and deep learning frameworks (PyTorch preferred).
Demonstrated ability to set a research agenda, execute independently, and communicate findings clearly to technical and non-technical stakeholders.
Fluency in English, with working proficiency in at least one non-English language a strong plus.
Bonus
Experience with speech translation (end-to-end or cascaded) and speech-aware MT pipelines.
Familiarity with on-device or edge-optimized model deployment for low-latency inference.
Prior work on low-resource language pairs, domain adaptation, or terminology-constrained translation.
Published research at ACL, EMNLP, NAACL, INTERSPEECH, or equivalent venues.