May 2026
Synthetic Data for Object Detection: Improving Robustness and Revealing Vulnerabilities
Authors: Xiao Fang
Abstract:
Synthetic data has emerged as a promising solution to the growing challenges of data acquisition in object detection. Modern detectors rely heavily on large-scale annotated datasets, yet collecting real-world data with high-quality labels is often costly, labor-intensive, and impractical in diverse or rare scenarios. By enabling controllable generation with automatic annotations, synthetic data provides a scalable alternative. In this thesis, we investigate its two complementary roles: improving robustness under domain shifts and revealing fundamental vulnerabilities of object detectors.
First, we propose a synthetic data generation framework based on diffusion models to bridge distribution gaps between source and target domains in aerial imagery. By synthesizing high-quality images and corresponding annotations through cross-attention-guided labeling and multi-stage knowledge transfer, our approach significantly improves detection robustness in unseen environments, outperforming supervised learning on source domain data, weakly supervised and unsupervised domain adaptation methods, open-set object detectors, and vision large language models.
Second, we explore the adversarial potential of synthetic data via a controllable image-editing framework for realistic camouflage attacks. By formulating camouflaged adversarial example generation as a conditional image-editing problem, we design image-level and scene-level strategies that produce stealthy, physically plausible camouflages while effectively degrading detector performance. Extensive experiments demonstrate strong attack effectiveness, improved human-perceived stealthiness, and transferability both to black-box models and to the physical world.
Together, these complementary perspectives highlight the dual utility of synthetic data for object detection: as a powerful tool for improving robustness under domain shifts and as a principled lens for uncovering model vulnerabilities.
Notes:
@mastersthesis{Fang-2026-88275,
author = {Xiao Fang},
title = {Synthetic Data for Object Detection: Improving Robustness and Revealing Vulnerabilities},
year = {2026},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-37},
keywords = {Synthetic data, Object detection},
}