Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis

May 2025

Authors:

Jialu Gao

Abstract:

The synthesis of realistic and context-aware visual content is a core challenge in the application of generative AI to both creative media and e-commerce. This thesis explores two distinct but complementary directions in AI-driven scene generation: human-centric insertion and product-centric advertisement creation.

In the first part, we present Teleportraits, a training-free pipeline for realistic human insertion into diverse background scenes using pre-trained text-to-image diffusion models. By leveraging inversion techniques and classifier-free guidance, our method jointly addresses the problems of human placement and high-fidelity personalization without requiring additional training. A novel mask-guided self-attention mechanism further enhances identity preservation, capturing fine details such as clothing and body features from a single reference image. Our approach sets a new state-of-the-art in seamless, high-quality human integration within composite scenes.
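As a point of reference for the classifier-free guidance mentioned above, the following is a minimal sketch of the standard guidance combination used with text-to-image diffusion models. It is not the Teleportraits implementation: the function name `classifier_free_guidance`, the plain-list representation of noise predictions, and the parameter `guidance_scale` are illustrative assumptions, and the abstract does not specify how guidance is combined with inversion or with the mask-guided self-attention.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_uncond: Sequence[float],
    eps_cond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Combine unconditional and conditional noise predictions.

    Standard formulation: eps = eps_uncond + w * (eps_cond - eps_uncond).
    The inputs here are flat lists of floats (a hypothetical stand-in for
    the tensors a real diffusion model would produce).
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    # Move from the unconditional prediction toward the conditional one,
    # scaled by the guidance weight.
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]
    cond = [1.0, 3.0]
    # A scale of 1.0 reproduces the conditional prediction; larger values
    # extrapolate past it.
    print(classifier_free_guidance(uncond, cond, 2.0))
```

With a scale of 0 the result equals the unconditional prediction, and with a scale of 1 it equals the conditional prediction; scales above 1 push the output further toward the condition.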

In the second part, we introduce a scalable solution for automated lifestyle advertisement generation: Multi-Object Advertisement Creative Generation. Recognizing the limitations of current GenAI tools in generating realistic, brand-aligned ad content at scale, we design a modular system that independently addresses product pairing, layout composition, and background synthesis. The system includes a user-friendly interface supporting global batch generation and local control, enabling advertisers to efficiently produce high-quality, contextually rich images across extensive product catalogs. Comprehensive evaluations and user studies demonstrate the effectiveness of our pipeline in bridging creativity and scalability for real-world e-commerce applications.

Together, these works highlight the transformative potential of generative models in automating complex visual synthesis tasks while retaining personalization, realism, and user control.

Notes:

@mastersthesis{Gao-2025-146393,
author = {Jialu Gao},
title = {Toward Realistic Visual Content Creation: Generative AI for Human-Centric and Product-Centric Scene Synthesis},
year = {2025},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-25-32},
keywords = {Generative AI, Image Synthesis, Diffusion Models},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.