by Saito, Shunsuke, Wei, Lingyu, Hu, Liwen, Nagano, Koki and Li, Hao
Abstract:
We present a data-driven inference method that can synthesize a photorealistic texture map of a complete 3D face model given a partial 2D view of a person in the wild. After an initial estimation of shape and low-frequency albedo, we compute a high-frequency partial texture map, without the shading component, of the visible face area. To extract the fine appearance details from this incomplete input, we introduce a multi-scale detail analysis technique based on mid-layer feature correlations extracted from a deep convolutional neural network. We demonstrate that fitting a convex combination of feature correlations from a high-resolution face database can yield a semantically plausible facial detail description of the entire face. A complete and photorealistic texture map can then be synthesized by iteratively optimizing for the reconstructed feature correlations. Using these high-resolution textures and a commercial rendering framework, we can produce high-fidelity 3D renderings that are visually comparable to those obtained with state-of-the-art multi-view face capture systems. We demonstrate successful face reconstructions from a wide range of low-resolution input images, including those of historical figures. In addition to extensive evaluations, we validate the realism of our results using a crowdsourced user study.
Reference:
Photorealistic Facial Texture Inference Using Deep Neural Networks (Saito, Shunsuke, Wei, Lingyu, Hu, Liwen, Nagano, Koki and Li, Hao), In arXiv preprint arXiv:1612.00523, 2016.
Bibtex Entry:
@article{saito_photorealistic_2016,
title = {Photorealistic {Facial} {Texture} {Inference} {Using} {Deep} {Neural} {Networks}},
url = {https://arxiv.org/abs/1612.00523},
abstract = {We present a data-driven inference method that can synthesize a photorealistic texture map of a complete 3D face model given a partial 2D view of a person in the wild. After an initial estimation of shape and low-frequency albedo, we compute a high-frequency partial texture map, without the shading component, of the visible face area. To extract the fine appearance details from this incomplete input, we introduce a multi-scale detail analysis technique based on mid-layer feature correlations extracted from a deep convolutional neural network. We demonstrate that fitting a convex combination of feature correlations from a high-resolution face database can yield a semantically plausible facial detail description of the entire face. A complete and photorealistic texture map can then be synthesized by iteratively optimizing for the reconstructed feature correlations. Using these high-resolution textures and a commercial rendering framework, we can produce high-fidelity 3D renderings that are visually comparable to those obtained with state-of-the-art multi-view face capture systems. We demonstrate successful face reconstructions from a wide range of low-resolution input images, including those of historical figures. In addition to extensive evaluations, we validate the realism of our results using a crowdsourced user study.},
journal = {arXiv preprint arXiv:1612.00523},
author = {Saito, Shunsuke and Wei, Lingyu and Hu, Liwen and Nagano, Koki and Li, Hao},
month = dec,
year = {2016},
keywords = {Graphics, UARC}
}