Creating a life-sized automultiscopic Morgan Spurlock for CNN's "Inside Man"
SIGGRAPH 2014 Talk
SIGGRAPH 2015 E-Tech
Andrew Jones    Jonas Unger*    Koki Nagano    Jay Busch    Xueming Yu   
Hsuan-Yueh Peng    Oleg Alexander    Paul Debevec   
USC Institute for Creative Technologies    Linköping University*
Introduction

We present a system for capturing and rendering life-size 3D human subjects on an automultiscopic display. Automultiscopic 3D displays allow a large number of viewers to experience 3D content simultaneously without the hassle of special glasses or head gear. Such displays are ideal for human subjects, as they allow for natural personal interaction with 3D cues such as eye gaze and complex hand gestures. In this talk, we focus on a case study in which our system was used to digitize television host Morgan Spurlock for his documentary show "Inside Man" on CNN. Automultiscopic displays work by generating many simultaneous views with high angular density over a wide field of view. The angular spacing between views must be small enough that each eye perceives a distinct view. As the user moves around the display, the eye smoothly transitions from one view to the next. We generate multiple views using a dense horizontal array of video projectors. As video projectors continue to shrink in size, power consumption, and cost, it is now possible to stack hundreds of projectors so closely that their lenses are almost continuous. However, this display presents a new challenge for content acquisition: it would require hundreds of cameras to directly measure every projector ray. We achieve similar quality with a new view interpolation algorithm suitable for dense automultiscopic displays.
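To make the angular-density requirement concrete, here is a back-of-the-envelope check (our own illustrative numbers, not values from the talk) comparing a candidate view spacing to the angle the viewer's two eyes subtend at the screen:

```python
import math

# Illustrative check: how fine must the angular view spacing be so that each
# eye receives a different view? The interpupillary distance and viewing
# distance below are assumptions, not measurements from the system.
IPD_M = 0.065        # assumed average interpupillary distance (meters)
VIEW_DIST_M = 3.4    # assumed viewing distance (meters)

# Angle subtended by the two eyes as seen from a point on the screen.
eye_separation_deg = math.degrees(2 * math.atan(IPD_M / (2 * VIEW_DIST_M)))
print(f"Eyes subtend ~{eye_separation_deg:.2f} deg at {VIEW_DIST_M} m")
# ~1.1 degrees at 3.4 m, so a view spacing well below that (e.g. 0.625 deg)
# places the two eyes on different views.
```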

Our interpolation algorithm builds on Einarsson et al. [2006], who used optical flow to resample a sparse light field. While their method was limited to cyclical motions captured on a rotating turntable, we use an array of 30 unsynchronized Panasonic X900MK 60p consumer cameras spaced over 180 degrees to capture unconstrained motion. We first synchronize the videos to within 1/120 of a second by aligning their corresponding sound waveforms. We then compute pair-wise spatial flow correspondences between cameras using GPU optical flow. As each camera pair is processed independently, the pipeline can be highly parallelized; as a result, we achieve much shorter processing times than traditional multi-camera stereo reconstructions. Our view interpolation algorithm maps images directly from the original video sequences to all the projectors in real time, and could easily scale to handle additional cameras or projectors. For the "Inside Man" documentary we recorded a 54-minute interview with Morgan Spurlock and processed 7 minutes of 3D video for the final show.
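The talk does not detail the audio alignment step; as a minimal sketch, assuming each camera's soundtrack has been extracted to a mono WAV file at a common sample rate, the pairwise offset could be estimated by cross-correlation:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

# Minimal sketch of audio-based synchronization. File names are hypothetical;
# the talk only states that videos are aligned to within 1/120 s using their
# sound waveforms, not how.
def audio_offset_seconds(ref_wav: str, other_wav: str) -> float:
    rate_a, a = wavfile.read(ref_wav)
    rate_b, b = wavfile.read(other_wav)
    assert rate_a == rate_b, "resample both tracks to a common rate first"
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    # The peak of the cross-correlation gives the lag (in samples) that best
    # aligns the two waveforms; convert it to seconds.
    lag = np.argmax(correlate(a, b, mode="full")) - (len(b) - 1)
    return lag / rate_a

# offset = audio_offset_seconds("cam00.wav", "cam17.wav")
# Shifting one video by round(offset * 59.94) frames (for 60p footage) aligns
# the pair; the sign convention should be verified against a clap or slate.
```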

Our projector array consists of 216 video projectors mounted in a semi-circle with a 3.4m radius. The narrow 0.625° angular spacing between projectors provides a large display depth of field with minimal aliasing. We use LED-powered Qumi v3 projectors in a portrait orientation (Fig. 2). At this distance the projected pixels fill a 2m tall anisotropic screen with a life-size human body (Fig. 1). The screen material is a vertically-anisotropic light-shaping diffuser manufactured by Luminit Co. The material scatters light vertically (60°) so that each pixel can be seen at multiple viewing heights, while maintaining a narrow horizontal blur (1°) that smoothly fills in the gaps between adjacent projector pixels. More details on the screen material can be found in Jones et al. [2014]. We use six computers to render the projector images. Each computer contains two ATI Eyefinity 7800 graphics cards with 12 total video outputs. Each video signal is then divided three ways using a Matrox TripleHead2Go HDMI video splitter.
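As a quick sanity check of the arrangement described above (illustrative only, tallying the numbers stated in this section), the projector count, angular pitch, and video fan-out work out as follows:

```python
# Illustrative tally of the numbers stated above; not a configuration tool.
NUM_PROJECTORS = 216
PITCH_DEG = 0.625

arc_span_deg = NUM_PROJECTORS * PITCH_DEG          # 135.0 degrees of arc covered

computers = 6
outputs_per_computer = 12                          # two graphics cards per machine
splitter_ways = 3                                  # each output split three ways
total_feeds = computers * outputs_per_computer * splitter_ways   # 216 projector feeds

# Angle of projector i relative to the center of the arc.
projector_angles = [(i - (NUM_PROJECTORS - 1) / 2) * PITCH_DEG
                    for i in range(NUM_PROJECTORS)]

print(arc_span_deg, total_feeds, projector_angles[0], projector_angles[-1])
```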

In the future, we plan on capturing longer format interviews and other dynamic performances. We are working to incorporate natural language processing to allow for true interactive conversations with realistic 3D humans.

Frequently Asked Questions

How do you create the 3D Morgan Spurlock?
We conducted an hour-long interview in Light Stage 6. The stage provided flat, even illumination from all directions using over 6,000 white LEDs. Fifty high-definition Panasonic video cameras simultaneously recorded Morgan's performance, each from a different perspective spanning 180 degrees. The cameras were closely spaced so that we could later generate any intermediate view, which would ultimately enable us to project Morgan's digital self from all directions on the 3D display.

Is this a hologram?
In popular culture, a "hologram" tends to mean any display that shows a floating 3D image without the need for 3D glasses. True holography, however, refers to a specific process: illuminating a scene with coherent laser light and recording the full reflected wavefront on a photographic plate. Our display is not a true hologram because it recreates only the varying rays of light, not the full wavefront. However, it is a true 3D display. In technical terms, it is an "automultiscopic multiview display", so named because it presents multiple 3D views in different directions without the need for stereo glasses.

How does Morgan appear 3D?
The 3D Morgan Spurlock is displayed on a dense array of 216 Qumi video projectors mounted in a semicircle behind a flat anisotropic screen. Each projector generates a slightly different rendering of Morgan Spurlock based on the recorded video. As the viewer moves around the screen, she or he sees a varying 3D image composed of pixels from the subset of projectors behind the screen. The perspective at each position matches what the viewer would see if Morgan were actually standing in the shared space in front of them.
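The per-viewer image formation can be pictured with a simplified top-down geometric sketch (our own illustration, not the production rendering code): because the screen preserves the horizontal direction of each ray, the projector a viewer sees at a given screen point is the one lying on the line from the viewer through that point, extended back to the projector arc.

```python
import math

# Simplified top-down (2D) illustration: screen along the x-axis at y = 0,
# viewer in front of it (y > 0), projectors on an arc of radius 3.4 m behind
# it (y < 0). The geometry and indexing conventions are assumptions for this
# sketch, not the system's actual calibration.
RADIUS = 3.4
NUM_PROJECTORS = 216
PITCH = math.radians(0.625)

def projector_seen_at(viewer_xy, screen_x):
    """Index of the projector whose pixel the viewer sees at screen_x."""
    vx, vy = viewer_xy
    dx, dy = screen_x - vx, -vy                  # ray direction: viewer -> screen point
    # Continue the ray behind the screen and intersect it with the projector
    # arc x^2 + y^2 = RADIUS^2.
    a = dx * dx + dy * dy
    b = 2 * screen_x * dx
    c = screen_x * screen_x - RADIUS * RADIUS
    t = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    px, py = screen_x + t * dx, t * dy           # point on the arc (py < 0)
    angle = math.atan2(px, -py)                  # 0 at the arc's center projector
    index = round(angle / PITCH + (NUM_PROJECTORS - 1) / 2)
    return max(0, min(NUM_PROJECTORS - 1, index))

# Example: a viewer 2 m in front of the screen and 0.5 m to the right,
# looking at the middle of the screen.
# projector_seen_at((0.5, 2.0), 0.0)
```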

How do you control so many projectors at once?
The rendering process is distributed across multiple computers with multiple graphics cards. Matrox video splitters allow each computer to send signals to 36 projectors.

What material is the screen made out of?
The screen is a vertically anisotropic light-shaping diffuser. Unlike a conventional Lambertian screen, which scatters light in all directions, our screen scatters the light from each projector into a narrow vertical stripe while preserving the horizontal angular distribution of rays from the projectors. Each point on the screen can therefore display different colors in different horizontal directions, corresponding to the different projectors behind the screen.

Have you built any other projector arrays?
We've built a smaller projector array designed to show just a human head, using 72 Texas Instruments pico projectors to illuminate a 30cm screen. We demonstrated this system at SIGGRAPH 2013 Emerging Technologies and presented a paper at SPIE Stereoscopic Displays and Applications XXV. See "An Autostereoscopic Projector Array Optimized for 3D Facial Display" for more information.

What is horizontal parallax?
Horizontal parallax is the perception of 3D depth resulting from horizontal motion of the head. As the head moves from side to side, the viewer sees a new view of the 3D scene in front of her or him. This provides a strong perception of depth beyond the binocular depth cue from the viewer's two eyes.

What does autostereoscopic mean? What does automultiscopic mean?
Autostereoscopic means that a display presents a stereoscopic 3D image without the need for stereo 3D glasses: each of the viewer's eyes sees a different view. Automultiscopic is a related term meaning that the display generates many views across a wide field of view, so that multiple viewers can each see a correct 3D image simultaneously, again without glasses.

Is this system the same as CNN's "hologram" shown by Wolf Blitzer during the 2008 presidential campaign?
No. Our system produces a real 3D image you can actually see, whereas CNN's "hologram" was purely a visual effect. To the home viewer, CNN anchor Wolf Blitzer appeared to be looking at three-dimensional images of guests Jessica Yellin and Will.i.am, but this was a video overlay created for the home audience, with no actual 3D display technology involved. In CNN's studio, Blitzer was actually looking across empty space toward a standard 2D flat-panel television. The CNN system used an array of cameras to film the guests from many viewpoints and synthesize a novel view matching the studio camera. For more information on the 2008 CNN capture process, visit (link).

How is this display different from the Tupac performance at Coachella in 2012?
The Tupac performance did not use a 3D display, but a clever optical illusion known as "Pepper's ghost." The underlying illusion dates back to the 16th century and uses a semi-reflective surface to make an image appear to float in front of a background. The floating scene is simply a reflection of a hidden room or display that is out of the audience's view. At Coachella, a semi-transparent screen reflected a 2D projected image so that it could be seen by the audience. The graphics themselves were created by the special effects company Digital Domain. For more information on the creation of the Tupac performance, visit (link).

How did the digital Morgan Spurlock know how to respond to each question?
We conducted an hour-long interview of Morgan Spurlock answering a range of questions about technology, the future, and his family. As thorough as we were, there was no way to capture answers to every question that might be asked of him. Morgan's exchange with his digital self was therefore scripted and tightly edited to play as a natural conversation. If the real Morgan had asked a question outside the small recorded domain, we would have played a clip of digital Morgan saying, "I don't know how to answer that question."

We are currently conducting a more extensive interview process, with subject interviews spanning up to 20 hours. We aim to delve deeper into a subject's life story and experiences, with the eventual goal of letting the subject's digital version respond to a much larger scope of questions. The USC ICT Natural Language Group is further developing technology that automatically parses spoken questions asked of the display and searches a video database for the best recorded answer. More information on the Natural Language Group can be found here.
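The matching algorithm itself is not described here; as a minimal illustrative sketch (not the ICT pipeline), matching a transcribed question against transcripts of the recorded answers could be as simple as bag-of-words retrieval with a fallback clip:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative sketch only. Clip names and transcripts are generic
# placeholders; the real system's question-answer matching is not shown here.
answers = {
    "clip_017.mp4": "transcript of a recorded answer about his family ...",
    "clip_052.mp4": "transcript of a recorded answer about the future of technology ...",
    "fallback.mp4": "I don't know how to answer that question.",
}

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(answers.values())

def best_clip(question: str, threshold: float = 0.2) -> str:
    """Return the clip whose transcript best matches the question."""
    scores = cosine_similarity(vectorizer.transform([question]), matrix)[0]
    best = scores.argmax()
    # Fall back to the "I don't know" clip when nothing matches well enough.
    return list(answers)[best] if scores[best] >= threshold else "fallback.mp4"

# best_clip("What do you think about the future of technology?")
```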

What other applications are there for this display?
While the display has many applications, from video games to medical visualization, we are currently working on a much larger project to record the 3D testimonies of Holocaust survivors. This project, "New Dimensions in Testimony" or NDT, is a collaboration between the USC Shoah Foundation and the USC Institute for Creative Technologies, in partnership with exhibit design firm Conscience Display. NDT combines ICT's Light Stage technology with natural language processing to allow users to engage with the digital testimonies conversationally. NDT's goal is to develop interactive 3D exhibits in which learners can have simulated educational conversations with survivors through the fourth dimension of time. Years from now, long after the last survivor has passed on, the New Dimensions in Testimony project can provide a path to enable youth to listen to a survivor and ask their own questions directly, encouraging them, each in their own way, to reflect on the deep and meaningful consequences of the Holocaust. NDT follows the age-old tradition of passing down lessons through oral storytelling, but with the latest technologies available.
