Capturing Dynamic Textured Surfaces of Moving Targets
Ruizhe Wang 1    Lingyu Wei 1    Etienne Vouga 2    Qixing Huang 2 3    Duygu Ceylan 4    Gérard Medioni 1    Hao Li 1   
University of Southern California 1    University of Texas at Austin 2    Toyota Technological Institute at Chicago 3    Adobe Research 4   



Figure 1: Example capture results. The sequence in the lower-right corner is reconstructed from Structure IO sensors, while the other sequences are reconstructed from Kinect One sensors.
Abstract

We present an end-to-end system for reconstructing complete watertight and textured models of moving subjects such as clothed humans and animals, using only three or four handheld sensors. The heart of our framework is a new pairwise registration algorithm that minimizes, using a particle swarm strategy, an alignment error metric based on mutual visibility and occlusion. We show that this algorithm reliably registers partial scans with as little as 15% overlap without requiring any initial correspondences, and outperforms alternative global registration algorithms. This registration algorithm allows us to reconstruct moving subjects from free-viewpoint video produced by consumer-grade sensors, without extensive sensor calibration, constrained capture volume, expensive arrays of cameras, or templates of the subject geometry.

Introduction

The rekindling of interest in immersive, 360-degree virtual environments, spurred by the Oculus, HoloLens, and other breakthroughs in consumer AR and VR hardware, has created a need for digitizing objects with full geometry and texture from all views. Among the most important subjects to digitize in this way are moving, clothed humans, yet they are also among the most challenging: the human body can undergo large deformations over short time spans, has complex geometry with occluded regions visible from only a few angles, and has regions, such as the face, with important high-frequency features that must be faithfully preserved.

Most techniques for capturing high-quality digital humans rely on a large array of sensors mounted around a fixed capture volume. The recent work of Collet et al. uses such a setup to capture live performances and compresses them to enable streaming of free-viewpoint videos. Unfortunately, these techniques are severely restrictive: first, to ensure high-quality reconstruction and sufficient coverage, a large number of expensive sensors must be used, leaving human capture out of reach of consumers without the resources of a professional studio. Second, the subject must remain within the small working volume enclosed by the sensors, ruling out subjects interacting with large, open environments or undergoing large motions.

Robust Rigid Registration

The key technical challenge in our pipeline is registering a set of depth images accurately without assuming any initialization, even when the geometry visible in each depth image has very little overlap with any other. We attack this problem by developing a robust pairwise global registration method: let P1 and P2 be partial meshes generated from two depth images captured simultaneously. We seek a global Euclidean transformation T12 that aligns P2 to P1. Traditional pairwise registration, which finds corresponding points on P1 and P2 and minimizes the distance between them, is notoriously unreliable in this setting. We therefore propose a novel visibility error metric (VEM) and minimize it to find T12. We further extend this pairwise method to handle multi-view global registration.
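To make the metric concrete, the sketch below scores a candidate transformation by projecting the points of P2, after applying T12, into the depth map observed by P1's camera and penalizing visibility violations. This is a minimal sketch, not the published formulation: the function name `vem_score`, the thresholds `tau` and `lam`, and the exact per-point costs are illustrative assumptions; only the core intuition, that transformed points should not occupy free space that P1's camera observed as empty, follows the description above.

```python
import numpy as np

def vem_score(points_p2, T12, depth1, K, tau=0.01, lam=1.0):
    """Simplified visibility-based alignment cost (a stand-in for the
    paper's VEM; the exact published form is not reproduced here).

    points_p2 : (N, 3) points of the partial scan P2.
    T12       : (4, 4) candidate rigid transform taking P2 into P1's frame.
    depth1    : (H, W) depth map observed by P1's camera (0 = no reading).
    K         : (3, 3) intrinsics of P1's sensor.
    """
    # Transform P2 into P1's camera frame and keep points in front of it.
    pts = points_p2 @ T12[:3, :3].T + T12[:3, 3]
    z = pts[:, 2]
    pts, z = pts[z > 1e-9], z[z > 1e-9]
    # Project into P1's depth image.
    u = np.round(pts[:, 0] * K[0, 0] / z + K[0, 2]).astype(int)
    v = np.round(pts[:, 1] * K[1, 1] / z + K[1, 2]).astype(int)
    H, W = depth1.shape
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d_obs, z_ok = depth1[v[ok], u[ok]], z[ok]
    seen = d_obs > 0
    # Free-space violation: the transformed point floats in front of the
    # surface the camera actually saw along that ray; heavily penalized.
    front = seen & (z_ok < d_obs - tau)
    cost = lam * np.sum(d_obs[front] - z_ok[front])
    # Near-surface points contribute a small residual encouraging a tight
    # fit; points behind the observed surface are occluded and free.
    near = seen & (np.abs(z_ok - d_obs) <= tau)
    cost += np.sum(np.abs(z_ok[near] - d_obs[near]))
    return float(cost)
```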


Figure 2: Translation estimation examples of our Hough transform method on range scans with limited overlap. The naïve method, which simply aligns the corresponding centroids, fails to estimate the correct translation.
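Figure 2's contrast between Hough voting and centroid alignment is easy to sketch. In the hedged example below, each putative correspondence votes for one translation, votes are binned in a coarse 3D grid, and the densest bin wins, so outlier correspondences (common with limited overlap) cannot skew the estimate the way they skew a centroid difference. The function name, grid resolution, and refinement step are illustrative assumptions; the paper's exact binning scheme may differ.

```python
import numpy as np

def hough_translation(p1, p2, R, cell=0.02):
    """Translation by Hough voting over putative correspondences.

    p1, p2 : (N, 3) putatively corresponding points from the two scans.
    R      : (3, 3) candidate rotation already recovered for P2.
    cell   : voting-grid resolution in scene units (e.g., meters).
    """
    # Each putative correspondence votes for one translation vector.
    votes = p1 - p2 @ R.T
    # Quantize votes into a coarse 3D grid and find the densest cell.
    keys = np.floor(votes / cell).astype(np.int64)
    uniq, counts = np.unique(keys, axis=0, return_counts=True)
    best = uniq[np.argmax(counts)]
    # Refine by averaging only the raw votes in the winning cell.
    mask = np.all(keys == best, axis=1)
    return votes[mask].mean(axis=0)

# The naive method of Fig. 2 instead averages over *all* correspondences,
# so every outlier pulls on the result:
#   t_naive = p1.mean(axis=0) - (p2 @ R.T).mean(axis=0)
```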
Finding the Transformation

Minimizing the error metric (1) amounts to solving a nonlinear least-squares problem, which in principle could be optimized using, e.g., the Gauss-Newton method. However, the problem is non-convex and prone to local minima. Absent a straightforward heuristic for picking a good initial guess, we instead adopt a Particle Swarm Optimization (PSO) [21] method to efficiently minimize (1), where “particles” are candidate rigid transformations exploring the energy landscape over SE(3). We could independently minimize E starting from each particle as an initial guess, but this strategy is not computationally tractable. Instead, we iteratively update all particle positions in lockstep: a small set of the most promising guide particles, those most likely to be close to the global minimum, is updated using an iteration of Levenberg-Marquardt, while the rest of the particles receive PSO-style weighted random perturbations.
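The update schedule is straightforward to sketch. Below, particles are 6-vectors (axis-angle rotation plus translation) scored by an arbitrary `energy` callable, for example a visibility-based cost like the `vem_score` sketch above wrapped to build T12 from the 6-vector. Everything here is a simplified assumption rather than the paper's implementation: the particle counts, step sizes, and the numerical-gradient step standing in for a full Levenberg-Marquardt iteration.

```python
import numpy as np

def pso_register(energy, n_particles=64, n_guides=4, iters=50, seed=0):
    """Hybrid PSO over a 6-vector SE(3) parameterization: guide particles
    are refined locally, the rest take weighted random perturbations."""
    rng = np.random.default_rng(seed)
    # Initialize particles: 3 axis-angle rotation params + 3 translation.
    x = rng.normal(scale=[0.3] * 3 + [0.1] * 3, size=(n_particles, 6))
    E = np.array([energy(p) for p in x])
    for _ in range(iters):
        order = np.argsort(E)
        guides = set(order[:n_guides].tolist())
        best = x[order[0]].copy()
        for i in range(n_particles):
            if i in guides:
                # Local refinement of the most promising particles; a
                # numerical-gradient step stands in for the paper's
                # Levenberg-Marquardt iteration.
                g = np.array([(energy(x[i] + 1e-4 * e) - E[i]) / 1e-4
                              for e in np.eye(6)])
                cand = x[i] - 1e-2 * g
            else:
                # PSO-style move: random perturbation weighted toward
                # the current best particle.
                w = rng.uniform()
                cand = (1 - w) * x[i] + w * best \
                       + rng.normal(scale=0.05, size=6)
            e_cand = energy(cand)
            if e_cand < E[i]:  # keep only moves that lower the energy
                x[i], E[i] = cand, e_cand
    return x[np.argmin(E)]
```

Because only the guide particles pay for an expensive local iteration each round, the swarm explores broadly at low cost while still converging quickly once a particle lands near the global minimum.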


Figure 3: Example registration results for range images with limited overlap. The first and second rows show examples from the Stanford 3D Scanning Repository and the Princeton Shape Benchmark, respectively. Please see the supplementary material for more examples.
Conclusion

We have demonstrated that it is possible, using only a small number of synchronized consumer-grade handheld sensors, to reconstruct fully textured moving humans, without restricting the subject to the constrained environment required by stage setups with calibrated sensor arrays. Our system does not require a template geometry in advance and thus generalizes well to a variety of subjects, including animals and small children. Because our system is based on low-cost devices and works in fully unconstrained environments, we believe it is an important step toward accessible creation of VR and AR content for consumers. Our results depend critically on our new alignment algorithm based on the visibility error metric, which can reliably align partial scans with much less overlap than current state-of-the-art registration algorithms require. Without this alignment algorithm, we would need many more sensors and would have to solve the sensor interference problems that would arise. We believe this algorithm is an important contribution in its own right and a significant step forward in global registration.

Downloads

Paper
Capturing Dynamic Textured Surfaces of Moving Targets.pdf, (3.13MB)

Video
Download, (34.8MB)
