AI Infrastructure · 4 min read

XDOF raises $70M to build robot training data pipelines

XDOF emerged from stealth with $70M to supply teleoperation, annotation and the ABC dataset (130,000 trajectories) to AI labs.

The Brieftide

TL;DR

  • 01 XDOF emerged from stealth with $70M to supply teleoperation, annotation and the ABC dataset (130,000 trajectories) to AI labs.
  • 02 The company says it has about 60 employees and is already working with 20 customers, though it would not name them.
  • 03 XDOF announced a $70 million funding round and a plan to become a one-stop data provider for robot training, starting by partnering with UC Berkeley to release a large public dataset called ABC.

XDOF emerged from stealth today, raising $70 million from Thrive Capital, Spark Capital, a16z, Lux and WndrCo to build the data pipelines, collection tools and annotation systems that robotics-focused AI labs lack. The company says it has about 60 employees and is already working with 20 customers, though it would not name them.

What did XDOF announce?

XDOF announced a $70 million funding round and a plan to become a one-stop data provider for robot training, starting by partnering with UC Berkeley to release a large public dataset called ABC. The ABC release includes 130,000 trajectories of robot manipulation data, 300 hours of simulation and 100 hours of evaluations, and the company says it has used those data to train robots on tasks such as folding T-shirts, flattening boxes and loading AirPods into cases.

The company was founded in October 2024 by Philipp Wu, Fred Shentu and Nemo Jin. Wu traced XDOF's motivation to a problem he ran into during his PhD work at UC Berkeley: "We didn’t have large-scale data to work with." XDOF aims to sell not just raw data but also the tooling, cleaning and annotation that make that data usable at scale.

How does XDOF collect and organize robot data?

XDOF organizes data into a three-tier pyramid. The top and most valuable tier is teleoperation data collected on the actual robot being deployed. The second tier is more general data gathered by teleoperated robots (as with GELLO). The third tier is egocentric data gathered by humans wearing sensors. The company plans to build teleoperation teams and wearable sensor systems to populate each tier, and to run the downstream cleaning and annotation work that frontier labs lack.
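For readers who think in code, the three-tier pyramid can be pictured as a simple tagging scheme. The sketch below is illustrative only and does not reflect XDOF's actual schema or tooling: the names `Tier`, `Trajectory`, `order_by_value` and `count_by_tier`, along with every field, are assumptions made up for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """The three tiers described in the article; a lower number means more valuable."""
    TARGET_ROBOT_TELEOP = 1  # teleoperation on the actual robot being deployed
    GENERAL_TELEOP = 2       # teleoperated robots gathering more general data
    EGOCENTRIC_HUMAN = 3     # humans wearing sensors


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record for one recorded demonstration."""
    trajectory_id: str  # made-up identifier, not a real dataset key
    tier: Tier
    task: str


def order_by_value(trajectories):
    """Return trajectories with the most valuable tier first (stable within a tier)."""
    return sorted(trajectories, key=lambda t: t.tier)


def count_by_tier(trajectories):
    """Count how many trajectories fall into each tier, including empty tiers."""
    counts = {tier: 0 for tier in Tier}
    for t in trajectories:
        counts[t.tier] += 1
    return counts


if __name__ == "__main__":
    samples = [
        Trajectory("a", Tier.EGOCENTRIC_HUMAN, "fold T-shirt"),
        Trajectory("b", Tier.TARGET_ROBOT_TELEOP, "flatten box"),
        Trajectory("c", Tier.GENERAL_TELEOP, "fold T-shirt"),
    ]
    for t in order_by_value(samples):
        print(t.tier.name, t.trajectory_id, t.task)
```

Using an ordered enum keeps the ranking explicit, so sorting or filtering by tier needs no extra lookup table. How XDOF weighs or mixes the tiers in practice is not described in the announcement.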

The company traces its technical roots to GELLO, a low-cost teleoperation system that Wu and Shentu developed while at Berkeley to let human operators control robotic arms to generate training data. XDOF says its scope goes beyond collection. It plans to provide cleaning, tooling and annotation to create a self-reinforcing feedback loop for robot trainers and to supply scaled pre-training data to both industry and academia.

Why does this matter?

Robots need high-fidelity physical-interaction data that public text and video cannot provide, and that kind of data barely exists today. Unlike large language models, which could be trained on vast amounts of public text, the robotics field faces a chicken-and-egg problem: you need large datasets to train foundation robotics models, but collecting that data requires infrastructure and operational scale.

XDOF positions itself where many AI labs may not want to invest: running warehouses of robots, maintaining and calibrating hardware, and hiring and training teleoperators. Wu points out the scale required, saying you need a warehouse of hundreds of thousands of square feet with hundreds of robots, plus the labor to run them. The funding and UC Berkeley partnership aim to lower the barrier for labs seeking physical-world training data.

What to watch

Watch whether the ABC dataset spurs wider academic and industry adoption and whether XDOF discloses its frontier-lab customers. Also monitor whether XDOF scales the operational footprint it describes: hiring global teams of teleoperators, deploying wearable egocentric sensors and standing up the warehouses and robot fleets it says are necessary for large-scale data production.

Sources: XDOF founders and spokespeople; details of the UC Berkeley collaboration and the ABC dataset as described by the company in its launch announcement.

XDOF robot data pipeline (conceptual)

[Diagram: raw capture -> cleaning; generalized captures -> tooling; egocentric data -> annotation; curated outputs -> ABC dataset; released / licensed to customers]

  • Teleoperation on target robot: most valuable tier, deployment-specific data
  • Teleoperated robots (GELLO): general manipulation data
  • Egocentric wearable sensors: human task demonstrations
  • Data cleaning, tooling, annotation: transforms raw captures into training-ready data
  • ABC dataset: 130,000 trajectories; 300h sim; 100h eval
  • Frontier AI labs and robotics companies: customers (20 reported)

Written by The Brieftide · Source: TechCrunch
