Intuitive, Interactive Beard and Hair Synthesis with Generative Models
CVPR 2020
Kyle Olszewski1,3    Duygu Ceylan2    Jun Xing3,5    Jose Echevarria2   
Zhili Chen2,7    Weikai Chen3,6    Hao Li1,3,4   
University of Southern California1    Adobe Inc.2    USC Institute for Creative Technologies3   
Pinscreen4    miHoYo5    Tencent America6    ByteDance Research7   

We present an interactive approach to synthesizing realistic variations in facial hair in images, ranging from subtle edits to existing hair to the addition of complex and challenging hair in images of clean-shaven subjects. To circumvent the tedious and computationally expensive tasks of modeling, rendering and compositing the 3D geometry of the target hairstyle using the traditional graphics pipeline, we employ a neural network pipeline that synthesizes realistic and detailed images of facial hair directly in the target image in under one second. The synthesis is controlled by simple and sparse guide strokes from the user defining the general structural and color properties of the target hairstyle. We qualitatively and quantitatively evaluate our chosen method compared to several alternative approaches. We show compelling interactive editing results with a prototype user interface that allows novice users to progressively refine the generated image to match their desired hairstyle, and demonstrate that our approach also allows for flexible and high-fidelity scalp hair synthesis.

Given a target subject image, a masked region in which to perform synthesis, and a set of strokes of varying colors provided by the user, our approach interactively synthesizes hair with the appropriate structure and appearance.
Hair Modeling

Hair is a crucial component for photorealistic avatars and CG characters. In professional production, human hair is modeled and rendered with sophisticated devices and tools [11, 39, 69, 73]. We refer to [66] for an extensive survey of hair modeling techniques. In recent years, several multi-view [51, 29] and single-view [9, 8, 30, 7] hair modeling methods have been proposed. An automatic pipeline for creating a full head avatar from a single portrait image has also been proposed [31]. Despite the large body of work in hair modeling, however, techniques applicable to facial hair reconstruction remain largely unexplored. In [3], a coupled 3D reconstruction method is proposed to recover both the geometry of sparse facial hair and its underlying skin surface. More recently, Hairbrush [71] demonstrates an immersive data-driven modeling system for 3D strip-based hair and beard models.

We propose a two-stage network architecture to synthesize realistic facial hair. Given an input image with a userprovided region of interest and sparse guide strokes defining the local color and structure of the desired hairstyle, the first stage synthesizes the hair in this region. The second stage refines and composites the synthesized hair into the input image.
Refinement and Compositing

Once the initial facial hair region is synthesized, we perform refinement and compositing into the input image. This is achieved by a second encoder-decoder network. The input to this network is the output of the initial synthesis stage, the corresponding segmentation map, and the segmented target image (the target image with the region to be synthesized covered by the segmentation mask). The output is the image with the synthesized facial hair refined and composited into it.

The architecture of the second generator and discriminator networks are identical to the first network, with only the input channel sizes adjusted accordingly. While we use the adversarial and perceptual losses in the same manner as the previous stage, we define the L1 loss on the entire synthesized image. However, we increase the weight of this loss by a factor of 0.5 in the segmented region containing the facial hair. The boundary between the synthesized facial hair region and the rest of the image is particularly important for plausible compositions. Using erosion/dilation operations on the segmented region (with a kernel size of 10 for each operation), we compute a mask covering this boundary. We further increase the weight of the loss for these boundary region pixels by a factor of 0.5. More details on the training process can be found in the appendix.

Example results. We show several example applications, including creating and editing facial hair on subjects with no facial hair, as well as making modifications to the overall style and color of facial hair on bearded individuals.
Limitations and Future Work

While we demonstrate impressive results, our approach has several limitations. As with other data-driven algorithms, our approach is limited by the amount of variation found in the training dataset. Close-up images of highresolution complex structures fail to capture all the complexity of the hair structure, limiting the plausibility of the synthesized images. As our training datasets mostly consist of images of natural hair colors, using input with very unusual hair colors also causes noticeable artifacts. See Fig. 10 for examples of these limitations.

We demonstrate that our approach, though designed to address challenges specific to facial hair, synthesizes compelling results when applied to scalp hair given appropriate training data. It would be interesting to explore how well this approach extends to other related domains such as animal fur, or even radically different domains such as editing and synthesizing images or videos containing fluids or other materials for which vector fields might serve as an appropriate abstract representation of the desired image content.

Footer With Address And Phones