May 2025
Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis
Authors: Jialu Gao
Abstract:
The synthesis of realistic and context-aware visual content is a core challenge in the application of generative AI to both creative media and e-commerce. This thesis explores two distinct but complementary directions in AI-driven scene generation: human-centric insertion and product-centric advertisement creation.
In the first part, we present Teleportraits, a training-free pipeline for realistic human insertion into diverse background scenes using pre-trained text-to-image diffusion models. By leveraging inversion techniques and classifier-free guidance, our method jointly addresses the problems of human placement and high-fidelity personalization without requiring additional training. A novel mask-guided self-attention mechanism further enhances identity preservation, capturing fine details such as clothing and body features from a single reference image. Our approach sets a new state of the art in seamless, high-quality human integration within composite scenes.
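The abstract names classifier-free guidance and a mask-guided self-attention mechanism without giving their details. As a rough illustration only, the sketch below shows the standard classifier-free guidance combination of noise predictions, eps_uncond + w * (eps_cond - eps_uncond), together with one plausible, hypothetical form of masked attention in which a token of the generated image attends only to reference-image tokens inside a person mask. The function names (`cfg_combine`, `masked_ref_attention`), the flat-list tensors, and the masking rule are assumptions for illustration, not the Teleportraits implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cfg_combine(eps_uncond: Sequence[float], eps_cond: Sequence[float], scale: float) -> Vector:
    """Classifier-free guidance: move from the unconditional noise prediction
    toward (and past, if scale > 1) the conditional one. scale = 1.0 returns eps_cond."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def _softmax(scores: Sequence[float]) -> Vector:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_ref_attention(
    query: Sequence[float],
    ref_keys: Sequence[Sequence[float]],
    ref_values: Sequence[Sequence[float]],
    ref_mask: Sequence[bool],
) -> Vector:
    """HYPOTHETICAL mask-guided attention for a single query token.

    The query (a token of the image being generated) attends only to
    reference-image tokens whose mask entry is True, e.g. tokens inside a
    person segmentation mask. Masked-out reference tokens are excluded from
    the softmax entirely, so they receive zero weight.
    """
    kept = [i for i, keep in enumerate(ref_mask) if keep]
    if not kept:
        raise ValueError("reference mask selects no tokens")
    scale = 1.0 / math.sqrt(len(query))  # scaled dot-product attention
    scores = [scale * sum(q * k for q, k in zip(query, ref_keys[i])) for i in kept]
    weights = _softmax(scores)
    dim = len(ref_values[0])
    # Weighted sum of the kept reference values, one output channel at a time.
    return [sum(w * ref_values[i][j] for w, i in zip(weights, kept)) for j in range(dim)]


if __name__ == "__main__":
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0))  # [2.0, 5.0]
    print(masked_ref_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                               [[10.0, 0.0], [0.0, 10.0]], [False, True]))  # [0.0, 10.0]
```

In a real diffusion model these operations would run on batched tensors inside the network's attention layers at each denoising step; the list-based version only makes the arithmetic explicit.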
In the second part, we introduce a scalable solution for automated lifestyle advertisement generation: Multi-Object Advertisement Creative Generation. Recognizing the limitations of current GenAI tools in generating realistic, brand-aligned ad content at scale, we design a modular system that independently addresses product pairing, layout composition, and background synthesis. The system includes a user-friendly interface supporting global batch generation and local control, enabling advertisers to efficiently produce high-quality, contextually rich images across extensive product catalogs. Comprehensive evaluations and user studies demonstrate the effectiveness of our pipeline in bridging creativity and scalability for real-world e-commerce applications.
Together, these works highlight the transformative potential of generative models in automating complex visual synthesis tasks while retaining personalization, realism, and user control.
@mastersthesis{Gao-2025-146393,
author = {Jialu Gao},
title = {Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis},
year = {2025},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-32},
keywords = {Generative AI, Image Synthesis, Diffusion Models},
}