
Software Engineer, Data


Remote within United States

Posted 3 months ago

Data-focused Software Engineering role at AI Research Lab, Imbue

Tech stack

  • Python


Imbue believes that high-quality data is the most important part of creating high-performance machine learning systems, regardless of whether they are simple classifiers or state-of-the-art reasoning agents. Unlike many other organizations, they view this work and this role as one of the most important at the company.

In this role, you will work on the most important part of Imbue's system: the software infrastructure for collecting, preprocessing, generating, analyzing, and distilling the wide variety of data sources that go into both their primary pretraining data corpus and the datasets for all of the other ancillary and secondary models and systems. You will make a meaningful, measurable impact on the performance of Imbue's systems, and experience the joy of taking the time to build high-quality software that produces high-quality data.

Example projects:

  • Incorporate new sources of high-quality text data into our existing data pipelines
  • Develop models for accurately classifying and extracting meaningful text from raw HTML
  • Create a high-quality OCR pipeline for pulling pretraining text from images and scans
  • Collect a ludicrous amount of multimodal data (e.g., transcripts for thousands of years of video)
  • Design unique data generation pipelines that leverage existing data (e.g., converting code from one language to another)
  • Integrate multiple annotation service providers into a sensible interface for researchers

You are:

  • Detail-oriented. Data mistakes are easy to make and hard to catch.
  • Passionate about data. You should be happy to look at and deeply engage with the raw data.
  • An excellent software engineer. We care about engineering best practices.
  • Familiar with Python.

Benefits:

  • Work on the most important part of Imbue's system
  • Work at a place that deeply cares about data quality
  • Work directly on creating software with human-like intelligence
  • Very generous compensation
  • Flexible working hours
  • Work remotely
  • Time and budget for learning and self-improvement

About the company:

Imbue builds AI systems that reason and code, enabling AI agents to accomplish larger goals and safely work in the real world. Imbue trains its own foundation models optimized for reasoning and prototypes agents on top of these models. By using these agents extensively, they can gain insights into improving both the capabilities of the underlying models and the interaction design for agents.

Imbue aims to rekindle the dream of the *personal* computer, where computers become truly intelligent tools that empower us, giving us freedom, dignity, and agency to pursue the things we love.

WorksHub is not directly affiliated with, and is not a part of, Imbue.

What makes you a perfect candidate for this role

  • An academic degree in the relevant field is good to have

  • 4+ years of commercial experience

Role type

Full time

Visa sponsorship

Not provided

Benefits & perks

  • Flexible Working

  • Conferences

  • Equity

  • Supportive culture

  • Team events

  • Learning budget

© 2024 WorksHub
