Abstract:
Synthesizing photorealistic human faces from novel viewpoints using only a single frontal image remains a challenging problem in computer vision. Large viewpoint changes introduce geometric distortions, self-occlusions, and missing visual information, making identity preservation and high-frequency detail reconstruction particularly difficult. While recent generative approaches such as diffusion models and 3D-aware neural representations produce visually compelling results, they typically require expensive training and slow inference. In contrast, lightweight feed-forward models enable efficient inference but often fail to capture fine-grained details and complex appearance variations.
This thesis presents a geometry-guided framework that integrates explicit 3D structure with learned image refinement to achieve both efficiency and realism. The method first builds a geometrically consistent prior by fitting a 3D Morphable Model, estimating a texture map from the frontal image, augmenting it with a lightweight hair representation, and rendering the target viewpoint. A convolutional residual network then refines this prior by predicting residual corrections that restore fine details and enhance local realism while preserving geometric consistency. Adversarial supervision further improves perceptual quality, encouraging sharper textures and more natural appearance without increasing inference cost.
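The residual refinement step described above can be illustrated with a minimal NumPy sketch. It is not the thesis implementation: the rendered prior is stood in for by a random array, the two-layer network, the names `conv3x3`, `refine`, and `init_params`, and the zero-initialised output layer are all assumptions made for illustration, and the adversarial supervision is omitted. The sketch only shows the core idea that the network predicts a correction that is added to the geometry-based render rather than generating the image from scratch.

```python
import numpy as np


def conv3x3(image, kernels, bias):
    """Same-size 3x3 convolution with edge padding.

    image:   (H, W, C_in)
    kernels: (3, 3, C_in, C_out)
    bias:    (C_out,)
    """
    h, w, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros((h, w, kernels.shape[-1]), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            patch = padded[dy:dy + h, dx:dx + w, :]   # shifted view, (H, W, C_in)
            out += patch @ kernels[dy, dx]            # -> (H, W, C_out)
    return out + bias


def init_params(channels=3, hidden=8, seed=0):
    """Hypothetical toy parameters; the last layer starts at zero (an assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "w1": rng.normal(0.0, 0.1, (3, 3, channels, hidden)),
        "b1": np.zeros(hidden),
        "w2": np.zeros((3, 3, hidden, channels)),  # zero residual at initialisation
        "b2": np.zeros(channels),
    }


def refine(prior, params):
    """Return the prior plus a predicted residual, clipped to the valid image range."""
    hidden = np.maximum(conv3x3(prior, params["w1"], params["b1"]), 0.0)  # ReLU
    residual = conv3x3(hidden, params["w2"], params["b2"])
    return np.clip(prior + residual, 0.0, 1.0)


# Placeholder for the rendered 3DMM prior at the target viewpoint (values in [0, 1)).
prior = np.random.default_rng(1).random((16, 16, 3))
params = init_params()
refined = refine(prior, params)
```

Because the output layer is zero-initialised in this sketch, the refined image initially equals the rendered prior, so any learned change is an explicit correction on top of the geometrically consistent render.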
We compare the proposed approach against Cap4D, a state-of-the-art method, in a single-image side-view synthesis setting. The results show substantially better computational efficiency, with inference completing within seconds on a single GPU, together with stronger identity preservation. These findings indicate that geometry-guided residual refinement is a practical and scalable alternative to heavy 3D-aware generative pipelines for identity-consistent novel view synthesis.
@mastersthesis{Fernandez Garcia-2026-88260,
author = {Alvaro Fernandez Garcia},
title = {Regression-based Multi-view Face Synthesis},
year = {2026},
month = {April},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-26-20},
}