High-fidelity facial and speech animation for VR HMDs
by Olszewski, Kyle, Lim, Joseph J., Saito, Shunsuke and Li, Hao
Abstract:
Several significant challenges currently prohibit expressive interaction in virtual reality (VR). The occlusion introduced by modern head-mounted displays (HMDs) makes most existing techniques for facial tracking intractable in this scenario. Furthermore, even state-of-the-art techniques used for real-time facial tracking in less constrained environments fail to capture subtle details of the user’s facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to the front of an HMD, we record video sequences from multiple subjects performing a variety of facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user’s mouth region to the parameters that control a digital avatar. To make training this system more tractable, we make use of audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that our regression technique is also feasible for tracking the expressions around the user’s eye region, including the eyebrows, with an infrared (IR) camera within the HMD, thereby enabling full facial tracking. This system requires no user-specific calibration, makes use of easily obtainable consumer hardware, and produces high-quality animations of both speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.
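The core of the method is a CNN that regresses cropped mouth-region images to the parameters driving a digital avatar. Below is a minimal illustrative sketch of such a regressor in PyTorch; the input resolution, layer sizes, number of animation parameters, and training loss are assumptions made for illustration and are not taken from the paper.

Illustrative sketch (Python/PyTorch):

# Hypothetical sketch (not the authors' code): a small CNN that regresses a
# cropped mouth-region image to a vector of avatar animation parameters
# (e.g., blendshape weights). Input size, layer widths, and the number of
# output parameters are illustrative assumptions.
import torch
import torch.nn as nn

class MouthRegressor(nn.Module):
    def __init__(self, num_params: int = 40):
        super().__init__()
        # Convolutional feature extractor over a 1 x 64 x 64 mouth crop.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # -> 32 x 32 x 32
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # -> 64 x 16 x 16
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),   # -> 128 x 8 x 8
        )
        # Fully connected head producing the animation parameters.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, num_params),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

# Training sketch: pairs of mouth images and artist-generated animation
# parameters, fit with a plain L2 regression loss.
model = MouthRegressor(num_params=40)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

images = torch.randn(8, 1, 64, 64)   # stand-in batch of mouth crops
targets = torch.randn(8, 40)         # stand-in animation parameters
loss = loss_fn(model(images), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In the paper's pipeline, the audio-based alignment step would be applied before training, pairing frames from different speakers uttering the same sentence with a shared set of artist-generated animation parameters; the details of that alignment are not reproduced here.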
Reference:
High-fidelity facial and speech animation for VR HMDs (Olszewski, Kyle, Lim, Joseph J., Saito, Shunsuke and Li, Hao), In ACM Transactions on Graphics, volume 35, number 6, pages 1–14, November 2016.
Bibtex Entry:
@article{olszewski_high-fidelity_2016,
	title = {High-fidelity facial and speech animation for {VR} {HMDs}},
	volume = {35},
	issn = {0730-0301},
	url = {http://dl.acm.org/citation.cfm?doid=2980179.2980252},
	doi = {10.1145/2980179.2980252},
	abstract = {Several significant challenges currently prohibit expressive interaction in virtual reality (VR). The occlusion introduced by modern head-mounted displays (HMDs) makes most existing techniques for facial tracking intractable in this scenario. Furthermore, even state-of-the-art techniques used for real-time facial tracking in less constrained environments fail to capture subtle details of the user’s facial expressions that are essential for compelling speech animation. We introduce a novel system for HMD users to control a digital avatar in real-time while producing plausible speech animation and emotional expressions. Using a monocular camera attached to the front of an HMD, we record video sequences from multiple subjects performing a variety of facial expressions and speaking several phonetically-balanced sentences. These images are used with artist-generated animation data corresponding to these sequences to train a convolutional neural network (CNN) to regress images of a user’s mouth region to the parameters that control a digital avatar. To make training this system more tractable, we make use of audio-based alignment techniques to map images of multiple users making the same utterance to the corresponding animation parameters. We demonstrate that our regression technique is also feasible for tracking the expressions around the user’s eye region, including the eyebrows, with an infrared (IR) camera within the HMD, thereby enabling full facial tracking. This system requires no user-specific calibration, makes use of easily obtainable consumer hardware, and produces high-quality animations of both speech and emotional expressions. Finally, we demonstrate the quality of our system on a variety of subjects and evaluate its performance against state-of-the-art real-time facial tracking techniques.},
	number = {6},
	journal = {ACM Transactions on Graphics},
	author = {Olszewski, Kyle and Lim, Joseph J. and Saito, Shunsuke and Li, Hao},
	month = nov,
	year = {2016},
	keywords = {Graphics},
	pages = {1--14}
}