Towards Scaling Embodied Data for Robot Learning

December 2025

Authors:

Tony (long) Tao

Abstract:

As artificial intelligence advances quickly in the digital domain, the next frontier lies in physical intelligence: systems that learn through acting and sensing in the real world. In this thesis, we explore practical ways of scaling such embodied data across three directions. AnyCar scales synthetic data through large-scale simulation, training a universal dynamics transformer that generalizes across vehicles and environments. FACTR improves the efficiency of real robot data with a low-cost bilateral teleoperation system and a curriculum that teaches policies to integrate force and vision. DexWild scales human data through in-the-wild data collection and co-training with robot demonstrations, enabling generalization to unseen objects and environments. Together, these projects explore how a data-centric approach can enable more adaptive and capable robots.

Notes:

@mastersthesis{Tao-2025-149669,
author = {Tony (long) Tao},
title = {Towards Scaling Embodied Data for Robot Learning},
year = {2025},
month = {December},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-102},
}