Mesoscopic Facial Geometry Inference Using Deep Neural Networks
CVPR 2018
Loc Huynh1    Weikai Chen1    Shunsuke Saito1,2,4    Jun Xing1    Koki Nagano1,4   
Andrew Jones1    Paul Debevec1,3    Hao Li1,2,4   
USC Institute for Creative Technologies1    University of Southern California2    Google3    Pinscreen4   

We present a learning-based approach for synthesizing facial geometry at medium and fine scales from diffusely-lit facial texture maps. When applied to an image sequence, the synthesized detail is temporally coherent. Unlike current state-of-the-art methods, which assume "dark is deep", our model is trained with measured facial detail collected using polarized gradient illumination in a Light Stage. This enables us to produce plausible facial detail across the entire face, including regions where previous approaches may incorrectly interpret dark features such as moles, hair stubble, and occluded pores as concavities. Instead of directly inferring 3D geometry, we propose to encode fine details in high-resolution displacement maps, which are learned through a hybrid network combining a state-of-the-art image-to-image translation network with a super-resolution network. To effectively capture geometric detail at both mid and high frequencies, we factorize the learning into two separate sub-networks, enabling the full range of facial detail to be modeled. Results from our learning-based approach compare favorably with a high-quality active facial scanning technique, yet require only a single passive lighting condition without a complex scanning setup.
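The factorization into medium- and high-frequency sub-networks implies that the measured displacement maps are split into two frequency bands for supervision. The paper does not specify the separation filter; the following is a minimal sketch using a box blur as the low-pass filter, with the high band defined as the residual so the two bands sum exactly back to the original map:

```python
import numpy as np

def separate_bands(disp, kernel=9):
    """Split a displacement map into a medium-frequency (low-pass) band
    and a high-frequency (residual) band.

    Uses a simple box blur as the low-pass filter -- a hypothetical
    stand-in for whatever separation the authors actually use.
    """
    pad = kernel // 2
    padded = np.pad(disp, pad, mode="edge")
    # Box blur: average over a kernel x kernel neighborhood.
    low = np.zeros_like(disp, dtype=float)
    for dy in range(kernel):
        for dx in range(kernel):
            low += padded[dy:dy + disp.shape[0], dx:dx + disp.shape[1]]
    low /= kernel * kernel
    # High-frequency band is the residual, so low + high == disp.
    high = disp - low
    return low, high
```

Because the high band is defined as a residual, summing the two learned maps at inference time reconstructs the full multi-scale displacement without any seam between the bands.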

Experimental Results

We evaluate the effectiveness of our approach on input textures spanning a variety of subjects and expressions. We show the synthesized geometries embossed with medium-scale detail only, and with combined multi-scale (medium- and high-frequency) 1K and 4K displacement maps, with the input textures and base mesh shown in the first and second columns, respectively. As seen from the results, our method faithfully captures both medium- and fine-scale geometry. The final geometry synthesized from the 4K displacement map exhibits mesoscale geometry on par with active facial scanning. None of these subjects were used in training the network, demonstrating the robustness of our method across a range of texture qualities, expressions, genders, and ages.

We validate the effectiveness of the geometry detail separation by comparing against an alternative that does not decouple medium and high frequencies. The displacement map learned by the alternative method fails to capture most of the high-frequency details while introducing artifacts at medium frequencies, which is evident in the embossed geometry. Our method, in contrast, faithfully replicates both medium- and fine-scale details in the resulting displacement map.

We also assess the effectiveness of the proposed super-resolution network in our framework. The result reconstructed with the super-resolution network significantly outperforms its counterpart in faithfully replicating mesoscopic facial structures.
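The baseline the super-resolution network is compared against is, in effect, naive interpolation from 1K to 4K, which cannot hallucinate pore-level structure. A sketch of that bilinear-upsampling baseline (a hypothetical reference implementation, not the paper's network) helps make the comparison concrete:

```python
import numpy as np

def bilinear_upsample(img, factor):
    """Upsample a 2D map by an integer factor with bilinear interpolation
    (pixel-center aligned). This only smooths existing content -- it adds
    no new high-frequency detail, which is why a learned
    super-resolution network can outperform it on mesoscopic structure."""
    h, w = img.shape
    H, W = h * factor, w * factor
    # Source coordinates of each output pixel center.
    ys = (np.arange(H) + 0.5) / factor - 0.5
    xs = (np.arange(W) + 0.5) / factor - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

Upsampling a 1K displacement map by a factor of 4 this way yields a smooth 4K map; the learned network instead synthesizes plausible high-frequency displacement consistent with the input texture.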