We propose a new head-mounted camera system based on stereo cellphone cameras. These cameras have the advantage of being extremely small, lightweight, and programmable. We provide step-by-step instructions for recreating the apparatus and give an example of how to use the platform for facial performance capture, in particular for tracking facial features.
(1) Camera: The LG Thrill P925 smartphone features a 3D stereo camera module that provides two synchronized stereo cameras in a tiny 4.2-gram package. We use two phones for a total of four cameras. However, we do not want to mount two entire phones at the end of the helmet arm. Instead, we designed a custom umbilical cord that allows the camera module to operate at a large distance from the phone itself.
(2) HeadCam: Compared to our previous headcam (on the left), which used two Point Grey Flea cameras, our new cellphone-based model is much lighter, has four cameras instead of two, and is much easier to wire. The cell phone also opens up the possibility of previewing video over the network and of performing image processing onboard.
(3) Custom Android Application: Following a "point and shoot" philosophy, cell phones automate exposure, focus, color balance, and stereo convergence. We developed a custom camera application that uses the LG Real3D SDK to lock the convergence, and the Android SDK to fix the focus and color balance.
(4) Captured data: We capture four views of the face from two stereo cell phones, providing good coverage of the face from multiple angles. The data is recorded to each phone's internal memory, but could also be streamed over the wireless network.
(5) Calibration: We developed a new single-shot calibration process using a 6" cylinder covered with a 2 cm grid of black and white squares. The checkerboard corners on the cylinder can be detected quickly and automatically. Unlike techniques that rely on planar or spherical calibration objects, a cylinder provides points at multiple depths and more closely approximates the shape of a human face.
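To make the calibration geometry concrete, the sketch below computes 3D coordinates for checkerboard corners wrapped around a cylinder: because each square lies on the curved surface, a 2 cm square subtends a 2 cm arc, which fixes the angular spacing of the corners. The radius and grid dimensions here are illustrative assumptions, not the exact measurements of our prop.

```python
import math

def cylinder_grid_points(radius_cm, square_cm, n_cols, n_rows):
    """3D coordinates (in cm) of checkerboard corners wrapped around a
    vertical cylinder of the given radius. Columns step around the
    circumference; rows step along the cylinder axis (z).
    Note: radius and grid size are illustrative, not our prop's specs."""
    # A square of side square_cm on the surface subtends this arc angle.
    dtheta = square_cm / radius_cm
    points = []
    for r in range(n_rows):
        for c in range(n_cols):
            theta = c * dtheta
            x = radius_cm * math.cos(theta)
            y = radius_cm * math.sin(theta)
            z = r * square_cm
            points.append((x, y, z))
    return points
```

These 3D object points, paired with the detected 2D corner locations in each camera's image, can then be fed to a standard calibration or pose-estimation routine.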
(6) Tracking: Here we track 3D facial features using a 3D Constrained Local Model (CLM-Z) [1]. We use the camera parameters acquired during the calibration step to initialize the head pose estimate. We are continuing to develop the CLM-Z algorithm to incorporate multi-view constraints in order to improve the quality of feature tracking. It should also be possible to reduce the computational requirements enough to allow processing directly on the mobile phones.
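As one example of what a multi-view constraint provides: with calibrated cameras, a feature detected in two views can be triangulated to a single 3D point. Below is a minimal Python sketch using the classic midpoint method. The camera centers and ray directions are hypothetical values; a real implementation would back-project each detected 2D feature through that camera's calibrated intrinsics and extrinsics to obtain its viewing ray.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Triangulate a 3D point from two viewing rays (midpoint method).
    c1, c2: camera centers; d1, d2: ray directions through the detected
    feature in each view (need not be unit length). Returns the midpoint
    of the shortest segment connecting the two rays."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w0 = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    # Parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return [(u + v) / 2 for u, v in zip(p1, p2)]
```

The triangulated position could then act as a constraint tying together the per-view CLM-Z landmark fits.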
[1] Baltrusaitis, T., Robinson, P., and Morency, L.-P. 2012. 3D Constrained Local Model for Rigid and Non-Rigid Facial Tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012).