Research Engineer, Companion

Job description

About the Role

NEO is a home robot that handles chores and provides personalized assistance. It's meant to be controlled naturally: you talk, it understands, it responds, it acts.

As a Research Engineer, you'll own the speech and language stack that makes NEO feel like a calm, capable presence: streaming ASR, low-latency TTS, speech-to-speech interaction, and task-level NLU, all tuned for robotics constraints (latency, noise, far-field audio, on-device budgets, safety, and reliability).

Job requirements

In this role, you will:

Build and deploy production speech models (ASR, TTS, and/or speech-to-speech) optimized for NEO's on-device compute and real-time latency requirements.
Own the full pipeline from data collection and model training through deployment and monitoring.
Solve hard acoustic problems that come with putting a robot in someone's living room: far-field recognition, noise robustness, barge-in handling, and multi-speaker environments.
Design voice interaction that feels human: natural prosody, appropriate turn-taking, and responses that match context and emotional tone.
Build evaluation infrastructure, define quality metrics that actually correlate with user experience, and use real-world feedback to prioritize improvements.
Collaborate with hardware and robotics teams to integrate speech capabilities with NEO's vision, memory, and physical behavior systems.

You might thrive in this role if you:

Have 3+ years of experience building speech systems, especially systems currently running in production.
Have deep expertise in at least one of: automatic speech recognition, text-to-speech synthesis, spoken language understanding, or speech-to-speech modeling.
Are fluent in modern deep learning (transformers, diffusion models, autoregressive generation) and can train, debug, and optimize models end-to-end in PyTorch or JAX.
Have deployed models under real constraints: on-device inference, latency budgets, memory limits, or edge hardware.
Have published at top venues (ICASSP, Interspeech, NeurIPS, ICML) or built equivalent systems at companies known for voice AI.

On-site

San Carlos, California, United States

$150,000 - $250,000 per year

Artificial Intelligence (AI)