Near-Instant Capture of High-Resolution Facial Geometry and Reflectance

G. Fyffe¹    P. Graham¹    B. Tunwattanapong¹    A. Ghosh²    P. Debevec¹
¹ USC Institute for Creative Technologies, USA    ² Imperial College London, UK


Figure 1: (a) Multi-view images shot under rapidly varying flash directions. (b) Refined geometry. (c) Clockwise from top left: diffuse albedo, specular albedo, specular exponent × 0.02, surface normal map. (d) Rendering. (e) Zoom of cheek rendering.

 


Abstract:

We present a near-instant method for acquiring facial geometry and reflectance using a set of commodity DSLR cameras and flashes. Our setup consists of twenty-four cameras and six flashes; the flashes are fired in rapid succession, each synchronized with a subset of the cameras. Each camera records only a single photograph, and the total capture time is less than the 67 ms blink reflex. The cameras and flashes are specially arranged to produce an even distribution of specular highlights on the face. We use the acquired images to estimate diffuse color, specular intensity, specular exponent, and surface orientation at each point on the face. We then refine the base facial geometry obtained from multi-view stereo using the estimated diffuse and specular photometric information, which allows submillimeter surface mesostructure detail to be recovered via shape-from-specularity. The final system uses commodity components and produces models suitable for authoring high-quality digital human characters.
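To make the two photometric cues concrete, here is a minimal sketch of classic Lambertian photometric stereo for a single surface point, followed by the half-vector relation that underlies shape-from-specularity. It assumes that each point is observed under several calibrated flash directions across the camera views, that diffuse reflection is view-independent, and that the diffuse intensities have already been separated from the specular reflection. The function names are hypothetical; the paper's actual solver combines these cues with multi-view stereo and is more involved.

```python
import numpy as np


def lambertian_normal(light_dirs, intensities):
    """Least-squares Lambertian photometric stereo for one surface point.

    light_dirs:  (K, 3) unit vectors toward each flash (K >= 3, not coplanar).
    intensities: (K,) diffuse-only pixel values observed under each flash.
    Assumes I_k = albedo * dot(n, l_k) with no attached shadows.
    Returns (unit normal, diffuse albedo).
    """
    L = np.asarray(light_dirs, dtype=float)
    I = np.asarray(intensities, dtype=float)
    g, *_ = np.linalg.lstsq(L, I, rcond=None)  # g = albedo * n
    albedo = np.linalg.norm(g)
    return g / albedo, albedo


def specular_normal(to_light, to_view):
    """Normal implied by a mirror-like highlight: the half vector of l and v."""
    h = np.asarray(to_light, dtype=float) + np.asarray(to_view, dtype=float)
    return h / np.linalg.norm(h)


# Usage: four hypothetical flash directions and a synthetic point.
lights = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8], [-0.6, 0.0, 0.8]])
n_true = np.array([0.2, -0.1, 0.97])
n_true /= np.linalg.norm(n_true)
n_est, rho = lambertian_normal(lights, 0.7 * lights @ n_true)
```

The diffuse cue yields a smooth, low-noise normal, while a specular highlight constrains the normal exactly but only where a highlight is observed; this is why an even distribution of highlights across the face matters.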



Introduction:

Realistic human characters are frequently modeled from 3D recordings of the shape and appearance of real people across a set of facial expressions, which are used to build blendshape facial models. To cross the “Uncanny Valley”, faces require high-quality geometry, texture maps, reflectance properties, and surface detail at the level of skin pores and fine wrinkles. Unfortunately, there has not yet been a technique for recording such datasets that is both near-instantaneous and relatively low-cost. While some facial capture techniques are instantaneous and inexpensive, they do not generally provide lighting-independent texture maps, specular reflectance information, or high-resolution surface normal detail for relighting. In contrast, techniques using multiple photographs and spherical lighting setups do capture such reflectance properties, but at the expense of longer capture times and complicated custom equipment.



Results:

We employed our system to acquire a variety of subjects in differing facial expressions. Figs. 1, 7, and 10 show high-resolution geometry and renderings under novel viewpoints and lighting produced with our method; complete results, including all recovered reflectance maps, are shown in Fig. 7. Our acquisition system produces geometric quality competitive with more complex systems, as well as reflectance maps not available from single-shot methods. On a dual quad-core 2.4 GHz Intel Xeon E5620 CPU with hyperthreading and an NVIDIA GTX Titan graphics card, the initial passive stereo geometry solve typically takes about 25 minutes, estimating the reflectance maps takes 15 minutes, and refining the geometry takes an additional 20 minutes, for roughly one hour of processing per capture. Fig. 11(a-c) compares geometry reconstructed with our method (a) to polarized spherical gradient illumination (b) and to passive stereo reconstruction (c) that uses the “dark is deep” heuristic to emboss surface detail. The polarized spherical gradient illumination result employs 7 high-end DSLR cameras (Canon EOS 1D X). Although our acquisition method employs more cameras (24), they are relatively inexpensive entry-level DSLRs (Canon EOS 600D), resulting in a lower overall cost. Our method faithfully reconstructs the fine-scale surface mesostructure from specular reflections without requiring a complex LED sphere setup as in (b). Note that the surface details in (c) are predominantly concave, whereas (a) and (b) exhibit a mix of convex and concave features and are largely in agreement.
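The concave bias of baseline (c) follows from how a “dark is deep” emboss works. Below is a minimal sketch, assuming the heuristic is implemented as a displacement along the surface normal proportional to high-pass filtered luminance; the box-filter radius and `depth_scale` are hypothetical parameters, not values from the paper. Small features that photograph darker than their surroundings, such as pores and creases, are always pushed inward regardless of their true shape.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2r+1) x (2r+1) window with edge replication."""
    p = np.pad(img, radius, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * radius + 1) ** 2


def dark_is_deep_displacement(luminance, radius=3, depth_scale=0.5):
    """Hypothetical 'dark is deep' emboss.

    Returns a displacement map along the surface normal that is proportional
    to high-pass luminance: texels darker than their neighborhood receive a
    negative (inward) displacement, brighter texels a positive one.
    """
    lum = np.asarray(luminance, dtype=float)
    high_pass = lum - box_blur(lum, radius)
    return depth_scale * high_pass


# Usage: uniform skin tone with a single dark "pore" texel.
skin = np.full((9, 9), 0.6)
skin[4, 4] = 0.2
disp = dark_is_deep_displacement(skin)
```

Because the heuristic reads shading as depth, it cannot distinguish a dark pigment spot from a true pit, which is consistent with the disagreement between (c) and the specular-based results (a) and (b).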



Mesh Refinement:

We refine the facial geometry mesh using the method of Nehab et al. [NRDR05]. We first resample the base mesh into a fine mesh using a regular 4096 × 4096 sampling in UV space. We then employ the low-frequency rotation field from [NRDR05] to remove any low-frequency disagreement between the photometric normals and the base mesh. One issue is that, due to occlusions, our photometric normal estimates may use different sets of views at different points on the surface and hence may contain seams. To alleviate this, we modify the method so that only points having the same set of visible views are blended together when computing the rotation field. Finally, we employ the full model optimization of [NRDR05] to produce the final high-resolution facial mesh. We then repeat the entire procedure once, using the refined mesh as the base mesh for a second iteration; this reduces artifacts stemming from the coarse facets of the original base mesh.
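The visibility-aware rotation field described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: both normal maps live in the same UV texture, `vis_id` is a hypothetical integer label encoding the set of views used at each texel, and a box filter stands in for whatever low-pass filter is actually used. Each group of texels is low-passed separately (normalized convolution restricted to the group), and each photometric normal is rotated by the minimal rotation that aligns its low-passed version with the base mesh's low-passed normal, so the high-frequency detail is kept.

```python
import numpy as np


def box_blur(img, radius):
    """Mean filter over a (2r+1)^2 window on (H, W, C) arrays, edge-replicated."""
    p = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros(img.shape)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += p[dy:dy + h, dx:dx + w]
    return out / (2 * radius + 1) ** 2


def normalize(v, eps=1e-12):
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)


def rotate_by_alignment(n, a, b):
    """Per texel, apply to n the minimal rotation taking unit vector a onto b.

    Rodrigues form: n*c + (v x n) + v (v.n) / (1 + c), with v = a x b, c = a.b.
    """
    v = np.cross(a, b)
    c = np.sum(a * b, axis=-1, keepdims=True)
    vn = np.sum(v * n, axis=-1, keepdims=True)
    return n * c + np.cross(v, n) + v * vn / np.maximum(1.0 + c, 1e-8)


def correct_low_frequency(photo_n, mesh_n, vis_id, radius=8):
    """photo_n, mesh_n: (H, W, 3) unit normal maps in UV space.
    vis_id: (H, W) hypothetical integer label of the view set at each texel.
    Texels with different labels are never blended, so seams between view
    sets do not leak into the rotation field."""
    out = np.empty_like(photo_n, dtype=float)
    for label in np.unique(vis_id):
        sel = vis_id == label
        m = sel[..., None].astype(float)
        weight = np.maximum(box_blur(m, radius), 1e-8)
        low_photo = normalize(box_blur(photo_n * m, radius) / weight)
        low_mesh = normalize(box_blur(mesh_n * m, radius) / weight)
        rotated = rotate_by_alignment(photo_n, low_photo, low_mesh)
        out[sel] = rotated[sel]
    return normalize(out)
```

For example, if two view sets each carry a different constant bias relative to a flat base mesh, each set is corrected independently and no blended transition appears at the boundary between them.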

 



Figure 2: Diffuse-Specular separation. (a-c) Three of the 24 original photographs. (d-f) Estimated specular components.

Figure 3: (a) Ground-truth photograph with frontal illumination. (b-d) Synthetic renderings with the same view and illumination using (b) our proposed method, (c) polarized spherical gradient illumination [GFT11], and (d) passive stereo reconstruction using the flat-lit photographs from (c) and a “dark is deep” detail emboss. Some of the reflectance maps in (c) and (d) are not estimated automatically and are tuned manually (see text for details). (e-h) Zooms of the cheek region in (a-d). Note that the fine-scale details in the specular highlights on the skin in (f) and (g) are generally in agreement with ground truth, while the highlight details in (h) differ significantly.



Conclusion:

We have presented a near-instant capture technique for recording the geometry and reflectance of a face from a set of still photographs lit by flash illumination. The technique leverages photoconsistency, photometric stereo, and specular reflections simultaneously to solve for facial shape and reflectance that explain the input photographs. It is the first near-instant capture technique able to produce such data at high resolution and at substantially lower cost than more complex reflectance measurement setups.



