June 2026

Red and Blue Teaming for Robust Manipulation Under Geometric Robustness

Authors: Divyam Goel

Master's Thesis

Abstract:

Robotic manipulation policies are typically evaluated on curated, in-distribution test sets, which offer limited insight into how these policies behave under plausible variation. One important source of this variation is geometric in nature, arising from small changes in object geometry that quietly alter grasp affordances and contact dynamics. Rather than treating robustness as a property to be measured passively, this thesis develops methods that use object geometry as a controllable axis along which a policy can be actively stress-tested and subsequently improved.

We first introduce Geometric Red-Teaming (GRT), a framework that automatically discovers CrashShapes: structurally valid, user-constrained mesh deformations that induce catastrophic failures in pre-trained manipulation policies. GRT couples a Jacobian-field deformation model with a gradient-free, simulator-in-the-loop optimization strategy, allowing it to search over plausible object geometries while treating the policy being tested as a black box. Across insertion, articulation, and grasping tasks, GRT consistently finds deformations that collapse policy performance, exposing brittle failure modes that static benchmarks miss. We further show that fine-tuning on a small set of CrashShapes together with the nominal object, a process we call blue-teaming, improves task success by up to 60 percentage points on those shapes while preserving performance on the original object. We validate both red-teaming and blue-teaming on a real robot arm, where simulated CrashShapes reduce task success from 90% to as low as 22.5%, and blue-teaming recovers performance to as high as 90% on the corresponding real-world geometry.
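As a rough illustration of what a gradient-free, simulator-in-the-loop search over deformations can look like, the sketch below runs a simple hill-climbing random search that only queries a black-box success rate. It is not the optimizer used in the thesis: the deformation is reduced to a plain parameter vector, the Jacobian-field model is not implemented, and the names `red_team_search`, `toy_success_rate`, `bound`, and `sigma` are hypothetical stand-ins chosen for this example.

```python
import random


def toy_success_rate(params):
    """Hypothetical stand-in for simulator rollouts of a black-box policy.

    Success degrades as the first deformation parameter moves away from 0.
    """
    return max(0.0, 1.0 - abs(params[0]))


def red_team_search(success_rate, dim, bound=1.0, iters=30, pop=16,
                    sigma=0.2, seed=0):
    """Hill-climbing random search for parameters that minimize task success.

    success_rate: black-box callable, parameter list -> success rate in [0, 1].
    bound: hypothetical user constraint; every parameter stays in [-bound, bound].
    """
    rng = random.Random(seed)
    best = [0.0] * dim                      # nominal (undeformed) object
    best_score = success_rate(best)
    for _ in range(iters):
        for _ in range(pop):
            # Perturb the current best and clip to the allowed range.
            cand = [min(bound, max(-bound, b + rng.gauss(0.0, sigma)))
                    for b in best]
            score = success_rate(cand)      # one simulator query, no gradients
            if score < best_score:          # keep candidates that hurt the policy more
                best, best_score = cand, score
    return best, best_score


if __name__ == "__main__":
    params, score = red_team_search(toy_success_rate, dim=3, bound=0.5)
    print("lowest success rate found:", score)
```

The search never inspects the policy or differentiates through the simulator, which is the property that lets a method of this kind treat the policy as a black box. The clipping step is only a placeholder for whatever validity constraints a user would impose on real mesh deformations.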

While GRT provides a direct evaluation tool for geometric robustness, its simulator-in-the-loop search takes hours per object and therefore cannot supply CrashShapes for large-scale blue-teaming. To address this limitation, we introduce CrashGen, a point-cloud-conditioned diffusion model that amortizes this search by generating failure-inducing deformation controls in seconds rather than hours. A deterministic Jacobian-field operator then maps these controls to physically plausible deformations of the original mesh. Trained once on controls discovered by running GRT offline, CrashGen generates failure-inducing variants for over 80% of held-out objects under simulation validation. Such variants can in turn serve as targeted training data: blue-teaming Contact-GraspNet on a mixture of original meshes and CrashGen variants improves grasp success on held-out objects by up to 21% in simulation and 30% on a real robot, relative to a baseline fine-tuned only on the original meshes.
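The data flow described above (generate candidate variants, keep those that fail under simulation validation, and mix them with the original meshes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis pipeline: `build_blue_team_set`, `generate_variants`, and `fails_in_sim` are hypothetical names, and the generator and validator are passed in as plain callables rather than a diffusion model and a physics simulator.

```python
import random


def build_blue_team_set(originals, generate_variants, fails_in_sim,
                        per_object=4, seed=0):
    """Assemble a fine-tuning mixture of original meshes and validated variants.

    originals: iterable of mesh handles (any objects).
    generate_variants: hypothetical callable (mesh, n, rng) -> list of variants.
    fails_in_sim: hypothetical callable variant -> True if the policy fails on it.
    """
    rng = random.Random(seed)
    dataset = []
    for mesh in originals:
        dataset.append(("original", mesh))          # always keep the nominal object
        candidates = generate_variants(mesh, per_object, rng)
        # Keep only candidates confirmed as failure-inducing in simulation.
        dataset.extend(("variant", v) for v in candidates if fails_in_sim(v))
    return dataset


if __name__ == "__main__":
    # Toy stand-ins: variants are strings, and even-numbered ones "fail".
    gen = lambda mesh, n, rng: [f"{mesh}_v{i}" for i in range(n)]
    fails = lambda v: int(v[-1]) % 2 == 0
    for label, item in build_blue_team_set(["mug", "bottle"], gen, fails, per_object=3):
        print(label, item)
```

Filtering by simulation before mixing means only variants that actually break the policy are added, while every original mesh is kept alongside them.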

Together, GRT and CrashGen show that object geometry can serve two complementary roles in making manipulation policies more reliable under geometric variation. GRT provides an active evaluation tool for identifying policy-critical geometric perturbations, while CrashGen provides a constructive mechanism for turning those perturbations into scalable training data. Once an expensive diagnostic, geometric failure discovery becomes a scalable source of targeted supervision for the very policies it stress-tests.

@mastersthesis{Goel-2026-88305,
author = {Divyam Goel},
title = {Red and Blue Teaming for Robust Manipulation Under Geometric Robustness},
year = {2026},
month = {June},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-58},
}