April 2026
Scalable Sim-to-Real Learning for General-Purpose Humanoid Skills
Authors: Tairan He
Abstract:
Humanoid robots are compelling because they inherit the physical interface of the human world. Their body plan is naturally compatible with stairs, doors, tools, shelves, workstations, and the many spaces designed around human reach, height, and dexterity. Yet this same embodiment makes humanoids extraordinarily demanding to control: locomotion, balance, contact, manipulation, perception, and hardware reliability are tightly coupled, and progress in one capability does not automatically translate into a useful overall system. This dissertation addresses that challenge through the unifying theme of scalable sim-to-real learning for general-purpose humanoid skills.
The central claim of the thesis is that general-purpose humanoid capability does not emerge from scaling a single ingredient in isolation. It emerges when four ingredients are developed as one integrated stack: rich whole-body motor priors derived from human guidance, reusable control abstractions that compress diverse behaviors, explicit mechanisms for sim-to-real alignment, and perceptive policies that fuse seeing, moving, and interacting on hardware. Under this view, the problem is not only how to make a humanoid solve a particular task, but how to build a training and deployment pipeline through which new whole-body skills can be acquired, reused, and transferred reliably.
The dissertation develops this argument in three parts. Part I, Sim-to-Real Whole-Body Humanoid Control, establishes the motor foundation. H2O shows that learning-based real-time whole-body teleoperation of a full-sized humanoid is feasible with accessible human sensing. OmniH2O extends that result to dexterous loco-manipulation and universal teleoperation interfaces, creating a richer bridge between human motion and robot behavior. HOVER then unifies multiple control modes inside a single generalist whole-body controller, and ASAP closes the dynamics gap through residual delta-action alignment, enabling agile behaviors learned in simulation to survive deployment on real hardware.
Part II, Sim-to-Real Perceptive Navigation and Mobility, brings exteroception into the control loop. ABS demonstrates that safe movement in unstructured environments requires perception-aware mobility rather than blind control.
Part III, Sim-to-Real Perceptive Loco-Manipulation, extends this principle from mobility to long-horizon interaction. VIRAL shows that large-scale visual sim-to-real training can produce RGB-based humanoid policies for navigation, grasping, transport, and placement using onboard sensing alone. DoorMan then demonstrates that the same philosophy extends to articulated interaction, where pixel-to-action policies must coordinate perception, balance, manipulation, and long-horizon contact-rich sequencing to open diverse real-world doors.
Taken together, these chapters define a coherent dissertation-level contribution: a scalable recipe for general-purpose humanoid skills in which human motion supplies priors, simulation supplies scale, control abstractions supply reuse, alignment mechanisms preserve transfer, and perception turns whole-body competence into embodied interaction. The thesis argues that the path toward useful humanoids is cumulative rather than monolithic. General-purpose capability becomes plausible when skill acquisition, policy design, and deployment are treated as one continuous sim-to-real systems problem.
@phdthesis{He-2026-88261,
author = {Tairan He},
title = {Scalable Sim-to-Real Learning for General-Purpose Humanoid Skills},
year = {2026},
month = {April},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-23},
keywords = {humanoid, sim-to-real, reinforcement learning, robot learning, robotics, loco-manipulation},
}