Head of Data Engineering


San Francisco, CA, US

Posted 3 months ago

Lead data engineering efforts at AI Research Lab, Imbue

Tech stack

  • Data Engineering

This is an on-site position


Imbue believes that high-quality data is the most important part of creating high-performance machine learning systems, whether they are simple classifiers or state-of-the-art reasoning agents. The company views this work as among its most important, so it wants someone solely dedicated to coordinating efforts across its diverse range of data.

In this role, you will lead Imbue's data engineering efforts. You will coordinate everything from human data collection processes to the collection, filtering, and preprocessing of raw web data, longer texts, code, and other generated data. You will be responsible for the ultimate quality and quantity of data on which Imbue can train its systems, which is the primary factor in their performance. You will both direct this work and get into the details yourself to ensure that data quality is constantly being improved.

Example projects:

  • Scan one million physical books and convert them into high-quality pretraining data.
  • Find 90% of the most useful text available online and make clean training data from it.
  • Generate pretraining data in ways that are guaranteed to have low error.
  • Measure and understand the quality of each of our datasets.
  • Ensure that researchers and engineers can quickly and easily acquire human labels for a dataset.

You are:

  • Passionate about data. You should be happy to look at and deeply engage with the raw data.
  • An excellent software engineer. We care about engineering best practices.
  • A great communicator. You will need to coordinate efforts between multiple external organizations and within our own team.
  • Familiar with Python.


What Imbue offers:

  • Work on the most important part of Imbue's system.
  • Work at a place that deeply cares about data quality.
  • Work directly on creating software with human-like intelligence.
  • Generous compensation, equity, and benefits.
  • $20K+ yearly budget for self-improvement: coaching, courses, conferences, etc.
  • Actively co-create and participate in a positive, intentional team culture.
  • Spend time learning, reading papers, and deeply understanding prior work.
  • Frequent team events, dinners, off-sites, and hanging out.

About the company:

Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and work safely in the real world. Imbue trains its own foundation models optimized for reasoning and prototypes agents on top of these models. By using these agents extensively, the team gains insight into improving both the capabilities of the underlying models and the interaction design for agents.

Imbue aims to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.

WorksHub is not directly affiliated with Imbue and is not a part of it.

What makes you a perfect candidate for this role

  • An academic degree in a relevant field is good to have

  • 7+ years of commercial experience

Role type

Full time

Visa sponsorship


Benefits & perks

  • Flexible Working

  • Conferences

  • Equity

  • Supportive culture

  • Team events

  • Learning budget
