August 2022

An efficient approach for sequential human performance capture from monocular video

Author: Jianchun Chen

CMU-RI-TR-22-58

Abstract:

Human performance capture from RGB videos in unconstrained environments has become popular for applications such as generating virtual avatars and digital actors. Modern learning-based approaches use neural networks to estimate geometry directly from images, but they yield only a coarse representation of the person's shape. Optimization-based approaches that use shape-from-silhouette, on the other hand, produce more accurate reconstructions, but they are computationally expensive and require a good initialization. In this work, we propose a learning-based approach for optimizing fine geometric detail (e.g., clothes, wrinkles) from a monocular RGB camera. In particular, we recover different levels of shape detail sequentially (e.g., the average shape without clothes, clothing, and wrinkles), using a separate neural network for each level. At each level, the network takes the sparse gradients of the body mesh vertices, computed from off-the-shelf 2D silhouette and normal supervision, and predicts dense gradients to update the body shape. Our networks converge within a few iterations and achieve pixel-level accuracy. In addition, our method retains the benefits of classical optimization methods under challenging poses and novel views. Experimental validation on a wide range of datasets demonstrates that our strategy is both effective and efficient.
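The abstract describes an iterative loop in which sparse, silhouette-driven vertex gradients are turned into dense updates of the whole mesh. The following is only a toy sketch of that loop under loudly stated assumptions, not the thesis implementation: the "body mesh" is a ring of vertices, the 2D supervision is a circular silhouette under orthographic projection, only every fourth vertex receives a silhouette gradient, and the learned network that densifies the gradients is replaced by a fixed, hypothetical stand-in (normalized diffusion over mesh neighbours). All function names (`make_ring`, `sparse_silhouette_grad`, `densify`, `refine`) are illustrative and do not come from the thesis.

```python
import numpy as np


def make_ring(n_vertices: int, radius: float) -> np.ndarray:
    """Toy 'body mesh' (assumption): a ring of vertices in the z = 0 plane, shape (N, 3)."""
    theta = 2.0 * np.pi * np.arange(n_vertices) / n_vertices
    return np.stack([radius * np.cos(theta),
                     radius * np.sin(theta),
                     np.zeros(n_vertices)], axis=1)


def sparse_silhouette_grad(verts, mask, target_radius):
    """Gradient of 0.5 * sum_{i in mask} (||p_i|| - R)^2 with respect to the vertices.

    p_i is the orthographic projection of vertex i onto the image (x, y) plane and
    R is the radius of a circular toy silhouette. Unsupervised vertices get a zero
    gradient, so the result is sparse.
    """
    xy = verts[:, :2]
    r = np.linalg.norm(xy, axis=1)
    residual = (r - target_radius) * mask            # zero where unsupervised
    grad = np.zeros_like(verts)
    grad[:, :2] = (residual / r)[:, None] * xy       # d/dp of 0.5 * (||p|| - R)^2
    return grad


def smooth(x):
    """Average each vertex with its two ring neighbours (i - 1, i, i + 1)."""
    return (np.roll(x, 1, axis=0) + x + np.roll(x, -1, axis=0)) / 3.0


def densify(sparse_grad, mask, steps):
    """Hypothetical stand-in for the learned network: normalized diffusion.

    Diffuses both the sparse gradients and the supervision mask, then divides,
    which interpolates the known gradients onto unsupervised vertices.
    """
    num, den = sparse_grad, mask.astype(float)
    for _ in range(steps):
        num, den = smooth(num), smooth(den)
    return num / (den[:, None] + 1e-12)


def silhouette_loss(verts, target_radius, mask=None):
    """Silhouette loss over all vertices, or only the masked ones if mask is given."""
    err = np.linalg.norm(verts[:, :2], axis=1) - target_radius
    if mask is not None:
        err = err * mask
    return 0.5 * float(np.sum(err ** 2))


def refine(verts, mask, target_radius, n_iters=5, lr=1.0, diffusion_steps=4):
    """Iterate: sparse gradient -> dense gradient -> vertex update."""
    history = [silhouette_loss(verts, target_radius)]
    for _ in range(n_iters):
        g_sparse = sparse_silhouette_grad(verts, mask, target_radius)
        g_dense = densify(g_sparse, mask, diffusion_steps)
        verts = verts - lr * g_dense
        history.append(silhouette_loss(verts, target_radius))
    return verts, history


if __name__ == "__main__":
    verts0 = make_ring(64, radius=1.5)
    mask = np.arange(64) % 4 == 0                    # supervise every 4th vertex only
    verts, history = refine(verts0, mask, target_radius=1.0)
    print("full loss per iteration:", history)
```

In this toy setting, the loss over all vertices, including the unsupervised ones, drops by orders of magnitude within a few iterations, because the densified gradient also moves vertices that received no direct supervision. A small residual remains on those vertices, which is the gap a learned densifier (rather than fixed diffusion) would be expected to close.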

Notes:

@mastersthesis{Chen-2022-133235,
author = {Jianchun Chen},
title = {An efficient approach for sequential human performance capture from monocular video},
year = {2022},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-22-58},
}