Rendering is the process of generating 2D images from 3D assets simulated in a virtual environment, typically with a graphics pipeline. By inverting such a renderer, one can conceive a learning approach that predicts a 3D shape from an input image. However, standard rendering pipelines involve a fundamental discretization step called rasterization, which prevents the rendering process from being differentiable and hence from being learned. We present the first non-parametric and truly differentiable rasterizer based on silhouettes. Our method enables unsupervised learning for high-quality 3D mesh reconstruction from a single image. We call our framework “soft rasterizer” as it provides an accurate soft approximation of the standard rasterizer. The key idea is to fuse the probabilistic contributions of all mesh triangles with respect to the rendered pixels. When combined with a mesh generator in a deep neural network, our soft rasterizer generates an approximate silhouette of the generated polygon mesh in the forward pass. The rendering loss is back-propagated to supervise the mesh generation without the need for 3D training data. Experimental results demonstrate that our approach significantly outperforms state-of-the-art unsupervised techniques, both quantitatively and qualitatively. We also show that our soft rasterizer achieves results comparable to, and in various cases even better than, the cutting-edge supervised learning method, especially on real-world data.
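To make the fusion rule concrete, the following is a minimal NumPy sketch (not the paper's implementation) of a probabilistic silhouette rasterizer: each triangle contributes a smooth occupancy probability D_j = sigmoid(±d²/σ), where d is the pixel's distance to the triangle boundary and the sign is positive inside the triangle, and the contributions are fused as S = 1 − ∏_j (1 − D_j). The function names, the unit-square screen space, the grid resolution, and the value of σ are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    # Numerically safe logistic function.
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60.0, 60.0)))

def edge_distance(p, tri):
    # Unsigned distance from 2D point p to the triangle boundary
    # (minimum over the three edge segments).
    d = np.inf
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        ab = b - a
        t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
        d = min(d, float(np.linalg.norm(p - (a + t * ab))))
    return d

def inside(p, tri):
    # Same-sign cross-product test for triangle membership.
    s = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        s.append((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return all(x >= 0 for x in s) or all(x <= 0 for x in s)

def soft_silhouette(triangles, res=32, sigma=1e-4):
    # Fuse per-triangle occupancy probabilities: S = 1 - prod_j (1 - D_j).
    img = np.zeros((res, res))
    for iy in range(res):
        for ix in range(res):
            p = np.array([(ix + 0.5) / res, (iy + 0.5) / res])
            keep = 1.0
            for tri in triangles:
                d = edge_distance(p, tri)
                sign = 1.0 if inside(p, tri) else -1.0
                keep *= 1.0 - sigmoid(sign * d * d / sigma)
            img[iy, ix] = 1.0 - keep
    return img
```

Because the sigmoid is smooth in the vertex positions, the silhouette has nonzero gradients even at pixels outside every triangle, which is what lets a silhouette loss pull distant triangles toward the target; as σ → 0 the soft silhouette converges to the hard one.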
The main obstacle that prevents the standard graphics renderer from being differentiable is the discrete sampling operation, known as rasterization, that converts continuous vector graphics into a raster image. In particular, after projecting the mesh triangles onto the screen space, the standard rasterization technique fills each pixel with the color of the nearest triangle that covers it. However, the color intensity of an image is the result of a complex interplay among a variety of factors, including the lighting condition, viewing direction, reflectance properties, and the intrinsic texture of the rendered object, most of which are entirely independent of the 3D shape of the target object. Though one can infer fine surface details from shading cues, special care has to be taken to decompose shading from the reflectance layers. Therefore, leveraging color information for 3D geometry reconstruction may unnecessarily complicate the problem, especially when the target object consists only of smooth surfaces.
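For contrast, a hard z-buffer rasterizer can be sketched as below (a schematic illustration, not a production pipeline; the function names and unit-square screen space are assumptions). Both the coverage test and the nearest-triangle depth comparison are discrete decisions, so the output image is piecewise constant in the vertex positions and its gradient is zero almost everywhere:

```python
import numpy as np

def inside(p, tri):
    # Same-sign cross-product test for triangle membership.
    s = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        s.append((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return all(x >= 0 for x in s) or all(x <= 0 for x in s)

def hard_rasterize(triangles, colors, depths, res=32, background=0.0):
    # Each pixel takes the color of the nearest covering triangle
    # (z-buffer test); uncovered pixels keep the background color.
    img = np.full((res, res), background, dtype=float)
    zbuf = np.full((res, res), np.inf)
    for tri, col, z in zip(triangles, colors, depths):
        for iy in range(res):
            for ix in range(res):
                p = np.array([(ix + 0.5) / res, (iy + 0.5) / res])
                if inside(p, tri) and z < zbuf[iy, ix]:
                    zbuf[iy, ix] = z
                    img[iy, ix] = col
    return img
```

Perturbing a vertex either leaves every pixel's winning triangle unchanged (zero gradient) or flips some pixels discontinuously, which is precisely why this forward pass cannot be back-propagated through directly.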
In this paper, we have presented the first non-parametric differentiable rasterizer (SoftRas) that enables unsupervised learning for high-quality mesh reconstruction from a single image. We demonstrate that the forward pass of discrete rasterization can be properly approximated within a differentiable framework. While many previous works, such as N3MR, provide approximate gradients in the backward pass but use a standard rasterizer in the forward pass, we believe that consistency between the forward and backward passes is the key to achieving superior performance. In addition, we found that a proper choice of regularizers plays an important role in producing visually appealing geometry. Experiments have shown that our unsupervised approach achieves results comparable to, and in certain cases even better than, those of state-of-the-art supervised solutions.