A New Dimension in Testimony: Relighting Video with Reflectance Field Exemplars

Loc Huynh Bipin Kishore Paul Debevec

USC Institute for Creative Technologies

Abstract

We present a learning-based method for estimating the 4D reflectance field of a person from video footage of that person captured under flat lighting. For training data, we illuminate the subject with one light at a time (OLAT) and capture reflectance field data in a variety of poses and viewpoints. We estimate the lighting environment of the input video footage and use the subject’s reflectance field to create synthetic images of the subject illuminated by the input lighting environment. We then train a deep convolutional neural network to regress the reflectance field from the synthetic images. We also use a differentiable renderer to provide feedback for the network by matching the relit images with the input video frames. This semi-supervised training scheme allows the neural network to handle poses not seen in the dataset and to compensate for lighting estimation error. We evaluate our method on video footage of real Holocaust survivors and show that it outperforms state-of-the-art methods in both realism and speed.

Given an input video of an actor lit by a single diffuse lighting condition (a), our method is able to relight the dynamic performance of the subject under any lighting condition (b).

Introduction

The New Dimensions in Testimony project at the University of Southern California’s Institute for Creative Technologies recorded extensive question-and-answer interviews with twelve survivors of the World War II Holocaust. Each twenty-hour interview, conducted over five days, produced over a thousand responses, providing the material for time-offset conversations through AI-based matching of novel questions to recorded answers. These interviews were recorded inside a large Light Stage system with fifty-four high-definition video cameras. The multiview data enabled the conversations to be projected three-dimensionally on an automultiscopic display.

The Light Stage system is designed for recording relightable reflectance fields: the subject is illuminated from one lighting direction at a time, and the resulting images can be recombined through image-based relighting. If the subject is recorded with a high-speed video camera, a large number of lighting conditions can be captured within a normal video frame duration, allowing a dynamic video to be lit with new lighting. This enables the subject to be realistically composited into a new environment (for example, the place that the subject is speaking about) such that their lighting is consistent with that of the environment. In 2012, the project performed a successful early experiment using a spherical harmonic lighting basis for relighting a Holocaust survivor interview. However, recording with an array of high-speed cameras proved too expensive for the project, both in the cost of the hardware and in the greatly increased storage cost of numerous uncompressed high-speed video streams.

The architecture of our neural network. The input image is passed through a U-Net style architecture to regress the set of OLAT images. When the ground truth is available, the network prioritizes the reconstruction loss on the OLAT image set. Otherwise, the network is trained on feedback from the relit image.

Method

One of the most effective ways to perform realistic relighting is to combine a dense set of basis lighting conditions (a reflectance field) according to a novel lighting environment to simulate the subject's appearance in the new lighting. However, this approach is not ideal for a dynamic performance, since it requires either high-speed cameras or an actor who sits still for several seconds while the set of OLAT images is captured. [25] overcomes this limitation by using neural networks to regress 4D reflectance fields from just two images of a subject lit by gradient illumination. They postulate that flat-lit images can also be used to achieve similar results, with less high-frequency detail. Since the method casts relighting as a supervised regression problem, it requires pairs of tracking images and their corresponding OLAT images as ground truth for training.
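The combination step described above amounts to a weighted sum of the basis images. The following is a minimal sketch of that idea, not the project's implementation: the function name `relight`, the list-of-tuples image layout, and the toy values are assumptions made for illustration, and plain Python lists are used so that the example has no dependencies.

```python
def relight(olat_images, light_weights):
    """Relight a subject as a weighted sum of its OLAT basis images.

    olat_images:   list of N images; each image is a list of (r, g, b) pixels
    light_weights: list of N (r, g, b) tuples, the novel environment's
                   radiance sampled at each basis light direction
    (Both layouts are illustrative assumptions.)
    """
    if len(olat_images) != len(light_weights):
        raise ValueError("need one weight per OLAT image")
    num_pixels = len(olat_images[0])
    relit = [[0.0, 0.0, 0.0] for _ in range(num_pixels)]
    # Accumulate each basis image scaled by the light arriving from its direction.
    for image, weight in zip(olat_images, light_weights):
        for p, pixel in enumerate(image):
            for c in range(3):
                relit[p][c] += weight[c] * pixel[c]
    return relit


# Toy example: two basis lights, two pixels.
olat = [
    [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],  # light 0 reaches only pixel 0
    [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],  # light 1 reaches only pixel 1
]
env = [(2.0, 1.0, 0.0), (0.0, 1.0, 2.0)]  # warm light 0, cool light 1
result = relight(olat, env)
```

Because the operation is linear in both the basis images and the lighting weights, it is straightforward to differentiate, which is what makes relighting usable as a training signal.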

In the New Dimensions in Testimony project, most of the Holocaust survivors’ interview footage was captured in front of a green screen so that virtual backgrounds could be added during post-production. However, this setup makes it difficult to achieve consistent illumination between the actors and the backgrounds in the final testimony videos, and it does not provide the ground truth needed for supervised training. In this paper, we use the limited OLAT data to train a neural network to infer reflectance fields from synthetically relit images. The synthetically relit images are improved by matching them with the input interview images through a differentiable renderer, enabling an end-to-end training scheme.
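One way to picture the two training signals is a loss that compares predicted OLAT images with ground truth when it exists, and otherwise relights the prediction and compares it with the input frame. The sketch below is an assumption-laden illustration rather than the paper's loss: the L1 distance, the hard switch between the two terms, the grayscale flat-list images, and all function names are hypothetical choices.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length flat images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def relight_gray(olat_images, weights):
    """Weighted sum of grayscale OLAT images (one scalar weight per light)."""
    num_pixels = len(olat_images[0])
    return [
        sum(w * img[p] for img, w in zip(olat_images, weights))
        for p in range(num_pixels)
    ]


def training_loss(pred_olat, weights, input_frame, gt_olat=None):
    """Illustrative semi-supervised loss (assumed form, not the paper's).

    With ground-truth OLAT images: average reconstruction error per light.
    Without them: relight the prediction under the estimated lighting and
    compare the result with the observed input frame.
    """
    if gt_olat is not None:
        return sum(l1(p, g) for p, g in zip(pred_olat, gt_olat)) / len(pred_olat)
    return l1(relight_gray(pred_olat, weights), input_frame)


# Toy example: two lights, two pixels.
pred = [[1.0, 0.0], [0.0, 1.0]]
gt = [[1.0, 0.0], [0.0, 0.5]]
weights = [0.5, 0.5]      # estimated lighting of the input footage
frame = [0.5, 0.0]        # observed interview frame

supervised = training_loss(pred, weights, frame, gt_olat=gt)
self_supervised = training_loss(pred, weights, frame)
```

In a real system both terms would be computed on tensors inside an automatic differentiation framework, so that the gradient of the relit-image error flows back through the relighting step into the network.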

Relighting results - Row 1: Input interview videos. Rows 2 and 3: OLAT predictions for two lighting patterns. Rows 4 and 5: Relighting results with two HDRI lighting environments, Grace Cathedral and Pisa Courtyard. See more examples in our video.
