We introduce a new silhouette-based representation for modeling clothed human bodies using deep generative models. Our method reconstructs a complete and textured 3D model of a person wearing clothes from a single input picture. Inspired by the visual hull algorithm, our implicit representation uses 2D silhouettes and 3D joints of a body pose to describe the immense shape complexity and variations of clothed people. Given a segmented 2D silhouette of a person and its inferred 3D joints from the input picture, we first synthesize consistent silhouettes from novel viewpoints around the subject. The synthesized silhouettes that are most consistent with the input segmentation are fed into a deep visual hull algorithm for robust 3D shape prediction. We then infer the texture of the subject's back view by feeding the frontal image and segmentation mask into a conditional generative adversarial network. Our experiments demonstrate that our silhouette-based model is an effective representation and that the appearance of the back view can be predicted reliably using an image-to-image translation network. While classic methods based on parametric models often fail for single-view images of subjects with challenging clothing, our approach still produces successful results, comparable to those obtained from multi-view input.
To reduce the immense solution space of human body shapes, several 3D body model repositories, e.g., SCAPE and SMPL, have been introduced, which have made single-view reconstruction of human bodies more tractable. In particular, a 3D parametric model is built from such a database, and its pose and shape parameters are optimized to best match an input image. As the mapping between the body geometry and the parameters of the deformable model is highly non-linear, alternative approaches based on deep learning have become increasingly popular. The seminal work of Dibra et al. introduces deep neural networks to estimate the shape parameters from a single input silhouette. More recent works predict body parameters of the popular SMPL model by minimizing a silhouette matching error, a joint error based on the silhouette and 2D joints, or an adversarial loss that distinguishes unrealistic reconstruction output. Concurrent to our work, Weng et al. present a method to animate a person in 3D from a single image based on the SMPL model and 2D warping.
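Schematically, these parametric-model approaches minimize an objective of the following form over the SMPL pose \(\theta\) and shape \(\beta\) (the notation and weighting below are ours and are only meant to summarize the common structure of such losses, not the formulation of any specific paper):
\[
E(\theta, \beta) \;=\; \lambda_S \,\big\lVert S(\theta, \beta) - \hat{S} \big\rVert^2
\;+\; \lambda_J \sum_i \big\lVert \pi\big(J_i(\theta, \beta)\big) - \hat{\mathbf{j}}_i \big\rVert^2
\;+\; \lambda_{\mathrm{adv}}\, E_{\mathrm{adv}}(\theta, \beta),
\]
where \(S(\theta, \beta)\) is the rendered model silhouette, \(\hat{S}\) the observed segmentation, \(J_i\) the 3D joints of the body model, \(\pi\) the camera projection, \(\hat{\mathbf{j}}_i\) the detected 2D joints, and \(E_{\mathrm{adv}}\) an optional adversarial term penalizing unrealistic bodies.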
Although our silhouette synthesis algorithm generates sharp predictions of novel-view silhouettes, the estimated results may not be perfectly consistent across views, since the conditioned 3D joints may fail to fully disambiguate fine details in the corresponding silhouettes (e.g., fingers, wrinkles of garments). Naively applying conventional visual hull algorithms is therefore prone to excessive erosion in the reconstruction, because the visual hull carves away any region that is inconsistent with the silhouette of even a single view. To address this issue, we propose a deep visual hull network that reconstructs a plausible 3D shape of a clothed body without requiring perfectly view-consistent silhouettes, by leveraging the shape prior of clothed human bodies.
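To make the erosion issue concrete, the sketch below implements the conventional carving step on a voxel grid (a minimal illustration written for exposition; the grid resolution, bounds, and per-view projection functions are assumptions, not components of our pipeline). Because occupancy is the intersection over all views, a voxel missed by even a single slightly inconsistent silhouette is removed, which is exactly the erosion behavior described above.

```python
import numpy as np

def visual_hull(silhouettes, projections, grid_res=64, bounds=1.0):
    """Classic visual hull carving on a voxel grid.

    silhouettes : list of (H, W) binary masks, one per view.
    projections : list of functions mapping (N, 3) world points to
                  (N, 2) pixel coordinates in the matching view.
    A voxel survives only if it projects inside *every* silhouette.
    """
    lin = np.linspace(-bounds, bounds, grid_res)
    xs, ys, zs = np.meshgrid(lin, lin, lin, indexing="ij")
    points = np.stack([xs, ys, zs], axis=-1).reshape(-1, 3)

    occupancy = np.ones(len(points), dtype=bool)
    for mask, project in zip(silhouettes, projections):
        h, w = mask.shape
        px = project(points)
        u = np.round(px[:, 0]).astype(int)
        v = np.round(px[:, 1]).astype(int)
        in_bounds = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside = np.zeros(len(points), dtype=bool)
        inside[in_bounds] = mask[v[in_bounds], u[in_bounds]] > 0
        # Intersection: any view that disagrees removes the voxel.
        occupancy &= inside

    return occupancy.reshape(grid_res, grid_res, grid_res)
```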
In particular, we use a network structure based on the deep visual hull approach of Huang et al. (Deep Volumetric Video from Very Sparse Multi-View Performance Capture). At a high level, Huang et al. propose to map 2D images to a 3D volumetric field through a multi-view convolutional neural network. The 3D field encodes the probabilistic distribution of 3D points on the captured surface. By querying the resulting field, one can instantiate the geometry of a clothed human body at an arbitrary resolution. However, unlike their approach, which takes carefully calibrated color images from fixed views as input, our network only consumes the probability maps of novel-view silhouettes, which can be inconsistent across different views. Moreover, although an arbitrary number of novel-view silhouettes can be generated, it remains challenging to select optimal input views that maximize the network performance. We therefore introduce several improvements to increase the reconstruction accuracy.
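For intuition, the sketch below shows one way such a field can be queried point-wise: each query point is projected into every view, per-view features are sampled at the projected location, pooled across views, and decoded into an occupancy probability. The layer sizes, mean pooling, and overall architecture here are illustrative assumptions and do not correspond to the exact network of Huang et al. or to ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepVisualHullSketch(nn.Module):
    """Minimal point-wise occupancy predictor over multi-view
    silhouette probability maps (illustrative sketch only)."""

    def __init__(self, feat_dim=32):
        super().__init__()
        # Per-view 2D encoder applied to 1-channel probability maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Point-wise decoder applied to view-pooled features.
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, prob_maps, uv):
        """prob_maps: (V, 1, H, W) silhouette probabilities per view.
        uv: (V, N, 2) projections of N query points, in [-1, 1] coords.
        Returns (N,) occupancy probabilities."""
        feats = self.encoder(prob_maps)                            # (V, C, H, W)
        grid = uv.unsqueeze(1)                                     # (V, 1, N, 2)
        sampled = F.grid_sample(feats, grid, align_corners=True)   # (V, C, 1, N)
        sampled = sampled.squeeze(2).permute(2, 0, 1)              # (N, V, C)
        pooled = sampled.mean(dim=1)                               # pool over views
        return torch.sigmoid(self.decoder(pooled)).squeeze(-1)     # (N,)
```

Because occupancy is evaluated per query point, the field can be sampled on a grid of any resolution, which is how the geometry is instantiated at arbitrary detail.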