We present an adversarial network for rendering photorealistic hair as an alternative to conventional computer graphics pipelines. Our deep learning approach requires neither low-level parameter tuning nor ad-hoc asset design. Our method simply takes a strand-based 3D hair model as input and provides intuitive user control over color and lighting through reference images. To handle the diversity of hairstyles and their appearance complexity, we disentangle hair structure, color, and illumination properties using a sequential GAN architecture and a semi-supervised training approach. We also introduce an intermediate conversion step from edge activation maps to orientation fields to ensure a successful CG-to-photoreal transition while preserving the hair structures of the original input data. As we only require a feed-forward pass through the network, our rendering performs in real-time. We demonstrate the synthesis of photorealistic hair images on a wide range of intricate hairstyles and compare our technique with state-of-the-art hair rendering methods.
Since we focus only on hair synthesis, we mask out non-hair regions to ignore their effect and set their pixel values to black. To avoid manual mask annotation, we train a Pyramid Scene Parsing Network (PSPNet) to perform automatic hair segmentation. We annotate hair masks for 3,000 random images from the CelebA-HQ dataset [Progressive Growing of GANs for Improved Quality, Stability, and Variation] and train our network on this data. We then use the network to compute masks for all 30,000 images in CelebA-HQ and manually remove images with incorrect segmentations, yielding about 27,000 segmented hair images. We randomly sample 5,000 images from this set as our training data. We apply the same deterministic filters $F_i$ to each training image to obtain the corresponding gray image, orientation map, and edge activation map.
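A minimal sketch of how such deterministic filters could be implemented with OpenCV is shown below; the concrete filter choices (a 32-orientation Gabor bank for the orientation map and a Canny detector for the edge activation map) and all parameter values are illustrative assumptions, not the exact filters used in our pipeline.

```python
import cv2
import numpy as np

def apply_deterministic_filters(image_bgr, hair_mask):
    """Compute the per-image modalities used for training: masked gray image,
    dominant-orientation map, and edge activation map. The Gabor bank and
    Canny thresholds below are illustrative choices."""
    mask = (hair_mask > 0).astype(np.uint8)

    # Black out non-hair pixels before any filtering.
    masked = image_bgr * mask[..., None]
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)

    # Orientation map: per-pixel angle of the strongest response over a bank
    # of oriented Gabor filters (here 32 orientations in [0, pi)).
    angles = np.linspace(0.0, np.pi, 32, endpoint=False)
    responses = []
    for theta in angles:
        kernel = cv2.getGaborKernel((17, 17), 3.0, theta, 7.0, 0.5)
        responses.append(cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel))
    strongest = np.argmax(np.abs(np.stack(responses, axis=-1)), axis=-1)
    orientation_map = (angles[strongest] / np.pi * 255).astype(np.uint8) * mask

    # Edge activation map: binary strand edges from a Canny detector.
    edge_map = cv2.Canny(gray, 50, 150) * mask

    return gray, orientation_map, edge_map
```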
We presented the first deep learning approach for rendering photorealistic hair, which performs in real-time. We have shown that our sequential GAN architecture and semi-supervised training approach can effectively disentangle strand-level structures, appearance, and illumination properties from the highly complex and diverse range of hairstyles. In particular, our evaluations show that without our sequential architecture, the lighting parameter would dominate over color, and color specification would no longer be effective. Moreover, our trained latent space is smooth, which allows us to interpolate continuously between arbitrary color and lighting samples. Our evaluations also suggest that there are no significant differences between a vanilla conditional GAN and a state-of-the-art network such as BicycleGAN, which uses additional smoothness constraints during training. Our experiments further indicate that a direct conversion from a CG rendering to a photoreal image using existing adversarial networks would lead to significant artifacts or unwanted hairstyles. Our intermediate conversion step from edge activation to orientation map has proven to be an effective way of enabling semi-supervised training and transitioning from synthetic input to photoreal output while ensuring that the intended hairstyle structure is preserved.
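To make the stage-wise disentanglement concrete, the following PyTorch sketch shows one way the sequential generator could be composed at inference time: a structure stage driven by the orientation map, followed by color and lighting stages conditioned on reference-derived codes. The module names, code broadcasting, and sub-network interfaces are placeholder assumptions for illustration, not our exact architecture.

```python
import torch
import torch.nn as nn

def broadcast_code(code, feat):
    """Tile a (B, C) latent code to the spatial size of `feat` so it can be
    concatenated to the feature map as extra input channels."""
    b, c = code.shape
    return code.view(b, c, 1, 1).expand(b, c, feat.shape[2], feat.shape[3])

class SequentialHairRenderer(nn.Module):
    """Three sequential generator stages: structure, then color, then lighting.
    The stage order mirrors the disentanglement described above; the
    sub-network interfaces and code shapes are placeholder assumptions."""
    def __init__(self, structure_G, color_G, lighting_G):
        super().__init__()
        self.structure_G = structure_G  # orientation map -> gray structure image
        self.color_G = color_G          # gray image + color code -> colored hair
        self.lighting_G = lighting_G    # colored hair + lighting code -> final image

    def forward(self, orientation_map, color_code, lighting_code):
        gray = self.structure_G(orientation_map)
        colored = self.color_G(torch.cat([gray, broadcast_code(color_code, gray)], dim=1))
        return self.lighting_G(torch.cat([colored, broadcast_code(lighting_code, colored)], dim=1))
```

Since inference is a single feed-forward pass through these stages, rendering remains real-time.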
As shown in the video demo, the hair rendering is not entirely temporally coherent when rotating the view. While the per-frame predictions are reasonable and most strand structures are consistent between frames, there are still visible flickering artifacts. We believe that temporal consistency could be learned by augmenting the training data with 3D rotations or by training on video data.
We believe that our sequential GAN architecture for parameter separation
and our intermediate representation for CG-to-photoreal conversion could be
generalized for the rendering of other objects and scenes beyond hair. Our
method presents an interesting alternative and complementary solution for many
applications, such as hair modeling with interactive visual feedback, photo
manipulation, and image-based 3D avatar rendering.
While we do not provide the same level of fine-grained control as conventional graphics pipelines, our efficient approach is significantly simpler and generates more realistic output without any tedious fine-tuning. Nevertheless, we would like to explore the ability to specify precise lighting configurations and advanced shading parameters for a seamless integration of our hair rendering into virtual environments and game engines. We believe that additional training with controlled simulations and captured hair data would be necessary.
Like other GAN techniques, our results are not yet fully indistinguishable from real images to a trained eye under extended observation, but we are confident that our proposed approach will benefit from future advancements in GANs.