In 2008, the "Digital Emily" project showed how a series of high-resolution facial expressions
scanned in a light stage could be rigged into a photoreal digital character and driven with
video-based facial animation techniques. However, Emily was rendered offline, included only the
front of the face, and was never seen in a tight closeup.
In this collaboration between Activision and USC ICT, we tried to create a real-time, photoreal
digital human character which could be seen from any viewpoint, in any lighting, and could
perform realistically from video performance capture even in a tight closeup.
In addition, we needed this to run in a game-ready production pipeline. To achieve this, we
scanned the actor in thirty high-resolution expressions using USC ICT's new Light Stage X
system [Ghosh et al. SIGGRAPH Asia 2011] and chose eight expressions for the real-time
performance rendering. To record the performance, we shot multi-view 30 fps video of the actor
performing
improvised lines using the same multi-camera rig. We used a new tool called Vuvuzela to
interactively and precisely correspond the (u,v) coordinates of every expression scan to the
neutral expression, which was retopologized to an artist mesh.
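To give a rough sense of what this correspondence enables (an illustrative sketch, not the
Vuvuzela tool or pipeline itself), a dense per-texel correspondence field lets each expression
scan's maps be resampled into the shared neutral UV layout; the function and argument names
below are assumptions.

    import numpy as np

    def resample_into_neutral_uv(expression_map, correspondence_uv):
        """Pull an expression scan's map (texture, normals, displacement, ...) into the
        neutral/artist-mesh UV layout.

        expression_map    : (He, We, C) map in the expression scan's own UV space.
        correspondence_uv : (H, W, 2) per-texel (u, v) lookup into the expression map,
                            i.e. the dense correspondence established interactively.
        Returns an (H, W, C) map aligned with the neutral UV layout.
        """
        he, we = expression_map.shape[:2]
        # Nearest-neighbor lookup for brevity; a production resampler would filter.
        xs = np.clip(np.rint(correspondence_uv[..., 0] * (we - 1)).astype(int), 0, we - 1)
        ys = np.clip(np.rint(correspondence_uv[..., 1] * (he - 1)).astype(int), 0, he - 1)
        return expression_map[ys, xs]
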
Our new offline animation solver works by creating a performance graph that represents dense GPU
optical flow between the video frames and the eight expression scans. This graph is pruned by
analyzing the correlation between the video frames and the expression scans over twelve facial
regions. The algorithm then computes dense optical flow and 3D triangulation, yielding per-frame,
spatially varying blendshape weights that approximate the performance.
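As a much simplified sketch of the final fitting step, assuming triangulated per-region vertex
targets are already available, spatially varying weights could be obtained with a non-negative
least-squares fit per region; the function names, data layout, and solver choice are assumptions,
not the production solver.

    import numpy as np
    from scipy.optimize import nnls

    def solve_region_weights(neutral, deltas, target):
        """Fit non-negative blendshape weights for one facial region in one frame.

        neutral : (V, 3) neutral-expression vertex positions for the region.
        deltas  : (K, V, 3) expression-minus-neutral deltas for the K scans (here K = 8).
        target  : (V, 3) per-frame positions triangulated from the dense optical flow.
        Returns (K,) weights so that neutral + weights . deltas approximates target.
        """
        A = deltas.reshape(deltas.shape[0], -1).T      # (3V, K) blendshape basis
        b = (target - neutral).reshape(-1)             # (3V,) observed offsets
        weights, _ = nnls(A, b)                        # non-negative least squares
        return weights

    # Solving each of the twelve regions independently gives spatially varying weights;
    # in practice the weights would be blended smoothly across region boundaries.
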
To create the game-ready facial rig, we transferred the mesh animation to standard bone animation
on a 4K-polygon mesh using a bone weight and transform solver. The solver optimizes the smooth
skinning weights and the animated bone transforms to maximize the correspondence between the game
mesh and the reference animated mesh.
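A toy version of the kind of fit involved is sketched below: with the bone transforms held fixed
for a set of frames, linear-blend skinning weights for one vertex can be estimated by least
squares against the reference animated mesh (a real solver would alternate this with refitting
the transforms). All names and the single-vertex formulation are illustrative assumptions.

    import numpy as np

    def fit_vertex_skin_weights(rest_pos, bone_mats, targets):
        """Estimate linear-blend skinning weights for a single vertex.

        rest_pos  : (3,) vertex position in the rest pose.
        bone_mats : (F, B, 4, 4) animated transforms of B candidate bones over F frames.
        targets   : (F, 3) positions of the corresponding vertex on the reference
                    animated mesh in those frames.
        Returns (B,) non-negative weights, normalized to sum to one.
        """
        rest_h = np.append(rest_pos, 1.0)                               # homogeneous rest position
        # Where the vertex would land in each frame if driven by each bone alone.
        driven = np.einsum('fbij,j->fbi', bone_mats, rest_h)[..., :3]   # (F, B, 3)
        A = driven.transpose(0, 2, 1).reshape(-1, driven.shape[1])      # (3F, B)
        b = targets.reshape(-1)                                         # (3F,)
        w, *_ = np.linalg.lstsq(A, b, rcond=None)
        w = np.clip(w, 0.0, None)                                       # keep weights non-negative
        return w / max(w.sum(), 1e-8)
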
The rendering technique uses surface stress values to blend diffuse texture, specular, normal,
and displacement maps from the high-resolution scans per vertex at run time.
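The sketch below illustrates the general idea of stress-driven blending for a single vertex,
under the assumption that a scalar stress value selects among the expression scans' maps; the
Gaussian weighting scheme, names, and constants are guesses for illustration, not the shipped
shader.

    import numpy as np

    def blend_expression_maps(stress, expression_maps, stress_keys, falloff=0.02):
        """Blend per-expression map samples for one vertex based on surface stress.

        stress          : scalar stretch/compression measure at the vertex.
        expression_maps : (K, C) sampled values (diffuse, specular, normal, displacement
                          channels) from the K high-resolution expression scans.
        stress_keys     : (K,) stress value at which each expression is fully active.
        Returns the (C,) blended value; a real implementation runs on the GPU.
        """
        weights = np.exp(-np.square(stress - stress_keys) / falloff)
        weights /= weights.sum()
        return weights @ expression_maps
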
The DirectX 11 rendering includes screen-space subsurface scattering, translucency, eye
refraction and caustics, real-time ambient shadows, a physically based two-lobe specular
reflection with microstructure, depth of field, antialiasing, and film grain.
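As one concrete example of these components, a two-lobe specular model sums a broad and a tight
highlight; the sketch below uses two Blinn-Phong-style lobes with arbitrary exponents and mixing
weight purely for illustration, omitting Fresnel, normalization, and the microstructure detail of
the actual renderer.

    import numpy as np

    def two_lobe_specular(n, l, v, gloss_broad=16.0, gloss_tight=256.0, mix=0.35):
        """Toy two-lobe specular term: a broad lobe plus a tighter lobe.

        n, l, v : unit surface normal, light direction, and view direction (3-vectors).
        Returns a scalar specular intensity.
        """
        h = l + v
        h = h / np.linalg.norm(h)                      # half vector
        ndoth = max(float(np.dot(n, h)), 0.0)
        return (1.0 - mix) * ndoth ** gloss_broad + mix * ndoth ** gloss_tight
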
This is a continuing project, and ongoing work includes simulating eyelid bulge, displacement
shading, ambient transmittance, and several other dynamic effects.