HybridFusion: Real-Time Performance Capture Using a Single Depth Sensor and Sparse IMUs

ECCV 2018

Zerong Zheng¹ Tao Yu¹,² Hao Li³,⁴ Kaiwen Guo⁵ Qionghai Dai¹ Lu Fang⁶ Yebin Liu¹

¹Tsinghua University, Beijing, China; ²Beihang University, Beijing, China; ³University of Southern California; ⁴USC Institute for Creative Technologies; ⁵Google Inc., Mountain View, CA; ⁶Tsinghua-Berkeley Shenzhen Institute, Tsinghua University

Abstract

We propose a lightweight yet highly robust method for real-time human performance capture based on a single depth camera and sparse inertial measurement units (IMUs). Our method combines nonrigid surface tracking and volumetric fusion to simultaneously reconstruct challenging motions, detailed geometry and the inner human body of a clothed subject. The proposed hybrid motion tracking algorithm and an efficient per-frame sensor calibration technique enable nonrigid surface reconstruction for fast motions and challenging poses with severe occlusions. Significant fusion artifacts are reduced by a new confidence measurement in our adaptive TSDF-based fusion. These contributions are mutually beneficial within our reconstruction system and together enable practical human performance capture that is real-time, robust, low-cost and easy to deploy. Experiments show that extremely challenging performances and loop closure problems can be handled successfully.

Fig. 1: State-of-the-art methods easily fail under severe occlusions. (a,d): color references captured by the Kinect (top) and from a third-person view (bottom). (b,e) and (c,f): results of DoubleFusion and of our method, respectively, rendered in the third-person view.

Introduction

The 3D acquisition of human performances has been a challenging topic for decades due to the complex shapes and deformations of dynamic surfaces, especially for clothed subjects. To ensure high-fidelity digitization, sophisticated multi-camera array systems [8, 4, 5, 44, 17, 24, 7, 14, 30] are preferred for professional productions. TotalCapture [13], the state-of-the-art human performance capture system, uses more than 500 cameras to minimize occlusions during human-object interactions. Not only are these systems costly and difficult to deploy, but they also require a significant amount of synchronization, calibration, and data processing effort.

At the other end of the spectrum, the recent trend of using a single depth camera for dynamic scene reconstruction [25, 12, 10, 32] offers a very convenient, real-time approach to performance capture when combined with online nonrigid volumetric depth fusion. However, such monocular systems are limited to slow and controlled motions. Although recent systems such as BodyFusion [45], DoubleFusion [46] and SobolevFusion [33] have demonstrated improvements, it is still impossible for them to reconstruct occluded limb motions (Fig. 1(b)) or to ensure loop closure during online reconstruction. For practical deployment scenarios such as gaming, where fast motions and possibly interactions between multiple users are expected, performance capture must remain reliable at all times.
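The online volumetric depth fusion mentioned above, like the adaptive TSDF-based fusion named in the abstract, builds on the standard weighted TSDF running average (Curless and Levoy; KinectFusion). The Python sketch below illustrates that generic per-voxel update, exposing the per-observation weight as a confidence value so that low-confidence observations change the fused surface less. It is an illustration under stated assumptions, not HybridFusion's implementation: `Voxel`, `fuse_sample`, the truncation band `mu` and the weight cap `max_weight` are hypothetical names, and the confidence is simply passed in rather than computed with the paper's confidence measurement.

```python
from dataclasses import dataclass


@dataclass
class Voxel:
    tsdf: float = 1.0    # truncated signed distance, normalized to [-1, 1]
    weight: float = 0.0  # accumulated fusion weight (0 = never observed)


def truncate(sdf: float, mu: float) -> float:
    """Normalize a signed distance by the truncation band mu and clamp to [-1, 1]."""
    return max(-1.0, min(1.0, sdf / mu))


def fuse_sample(voxel: Voxel, sdf: float, confidence: float,
                mu: float = 0.01, max_weight: float = 64.0) -> Voxel:
    """Fold one depth observation into a voxel with a weighted running average.

    Hypothetical sketch: `confidence` is a per-observation weight in [0, 1]
    supplied by the caller; it is NOT the paper's confidence measurement.
    """
    if sdf < -mu or confidence <= 0.0:
        # The voxel lies far behind the observed surface, or the sample is
        # untrusted: leave the voxel unchanged.
        return voxel
    d = truncate(sdf, mu)
    new_weight = voxel.weight + confidence
    # Weighted average of the stored value and the new observation.
    new_tsdf = (voxel.weight * voxel.tsdf + confidence * d) / new_weight
    # Cap the accumulated weight so the voxel keeps adapting to new data.
    return Voxel(tsdf=new_tsdf, weight=min(new_weight, max_weight))


if __name__ == "__main__":
    v = fuse_sample(Voxel(), sdf=0.005, confidence=1.0)
    v = fuse_sample(v, sdf=-0.005, confidence=0.25)
    print(v)
```

For example, fusing `sdf=0.005` with confidence 1.0 into an empty voxel gives a normalized TSDF of 0.5 and a weight of 1.0; a second observation at `sdf=-0.005` with confidence 0.25 pulls the value only to 0.3, because the low-confidence sample carries a quarter of the weight. Capping the accumulated weight at `max_weight` keeps the volume responsive to later observations, a common choice in KinectFusion-style systems.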
