AI Research Engineer - World Models

    Job description

    About 1X

    We’re building humanoid robots that work in the home - doing the chores, handling the tasks, and giving people their time back. It sounds simple, but it’s not.

    To do this right, we have to solve robotics, AI, and manufacturing - at the same time, at scale, in a form factor that has to be safe enough to live with your family. If you’re inspired by this, you’ll thrive here. We’ve been at this since 2014 and we’re at the point where the hard problems are behind us and the hard work is in front of us.

    NEO is our flagship - a home robot designed to move, learn, and operate in the real world alongside real people. We’re not demoing it - we’re shipping it. If this excites you, we’re excited to meet you.

    If you’ve spent your career working on problems that matter and want to see them actually reach the world - this is that moment. We’re scaling, we’re hiring with intention, and we need people who want to build something that will genuinely change how humans spend their time - safely creating abundance for all. 

    About the Team

    The 1X World Model Lab is an embodied AI research organization focused on pretraining the foundation models to accelerate the emergence of embodied intelligence.

    The lab is founded on a simple thesis: robotics is not a fine-tuning problem. To build truly general humanoids, we need to pretrain on the most important data from the very beginning.

    Your Charter

    Advance the world model architectures and training systems that are the foundation of NEO's autonomy: models that learn from the full multi-modal stream of robot experience (video, proprioception, audio, actions) to predict what happens next and what the robot should do. This is critical-path research with direct product impact: improvements in pre-training metrics translate directly to better real-world task performance, and the models you build will run on hardware in homes and warehouses. You will own the full stack from data pipeline to model architecture to deployment, combining frontier research with hands-on engineering.

    Key Outcomes

    • Advance world model architectures that deliver measurable gains in real-world robot autonomy, driven by pre-training improvements in log loss, perplexity, and downstream task benchmarks

    • Build high-throughput data pipelines and tokenizers for large-scale multi-modal robot datasets that keep training compute fully utilized and ensure data quality is the primary constraint on model improvement

    • Establish and own the evaluation infrastructure that connects pre-training metrics to real-world robot performance, enabling the team to iterate on architectures with confidence that lab results predict field outcomes

    • Ship models from research to production: own the full path from architecture experiments through optimized deployment on robot hardware, ensuring research improvements reach the robot

    Key Competencies

    • Deep generative modeling expertise: has trained and evaluated large-scale multi-modal generative models (video, VLMs, or similar); understands transformer architectures at the level needed to innovate on them, not just apply them

    • Scaling law intuition: understands how model capability scales with compute and data; uses pre-training metrics to predict downstream performance and make principled decisions about where to invest training budget

    • Full-stack researcher: builds their own data pipelines, training infrastructure, and evaluation frameworks; doesn't wait for someone else to prepare the data or run the experiments

    • Research-to-product mindset: does frontier research and ships it; translates experimental results into production systems, and measures success by impact on real robot behavior, not just benchmark numbers

    Job requirements

    Minimum Requirements

    • Strong Python and PyTorch (or equivalent deep learning framework); comfortable with large-scale training infrastructure and tooling (Bazel, Hydra, or equivalent)

    • Demonstrated experience training and evaluating large-scale multi-modal generative models: video generation, vision-language models, or comparable work with continuous and discrete modalities

    • Ability to design and optimize large-scale data pipelines for model training, including high-throughput data loading and preprocessing for multi-modal datasets

    • Strong understanding of scaling laws and the relationship between pre-training metrics and downstream task performance

    Preferred Skills

    • Experience with transformer architectures for video or action prediction: diffusion transformers, masked autoencoders, autoregressive next-token prediction, or flow matching over continuous observations

    • Familiarity with tokenization strategies for continuous modalities: VQ-VAE, FSQ, continuous tokenization, or hybrid discrete-continuous representations

    • Experience with simulation platforms (Isaac Sim, MuJoCo) for generating training data, evaluating model predictions, or closing the sim-to-real gap

    • Background in embodied AI, robotics, or physical systems where model predictions must translate reliably to real-world actions

    Benefits & Compensation

    • Salary Range: $180,000 - $300,000 + Equity

    • Health, dental, and vision insurance

    • 401(k) with company match

    • Paid time off and holidays

    Equal Opportunity Employer

    1X is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, gender, gender identity or expression, sexual orientation, national origin, ancestry, citizenship, age, marital status, medical condition, genetic information, disability, military or veteran status, or any other characteristic protected under applicable federal, state, or local law.

    On-site
    • San Carlos, California, United States
    AI Research - 1X World Model Lab