Contextual-based Image Inpainting: Infer, Match, and Translate
ECCV 2018
Yuhang Song1    Chao Yang1    Zhe Lin2    Xiaofeng Liu3    Qin Huang1    Hao Li1,4,5    C.-C. Jay Kuo1   
University of Southern California1    Adobe Research2    Carnegie Mellon University3    Pinscreen4    USC Institute for Creative Technologies5   

We study the task of image inpainting, which is to fill in the missing region of an incomplete image with plausible contents. To this end, we propose a learning-based approach to generate visually coherent completion given a high-resolution image with missing components. In order to overcome the difficulty to directly learn the distribution of highdimensional image data, we divide the task into inference and translation as two separate steps and model each step with a deep neural network. We also use simple heuristics to guide the propagation of local textures from the boundary to the hole. We show that, by using such techniques, inpainting reduces to the problem of learning two image-feature translation functions in much smaller space and hence easier to train. We evaluate our method on several public datasets and show that we generate results of better visual quality than previous state-of-the-art methods.

System Overview

Our system divides the image inpainting tasks into three steps:

Inference: We use an Image2Feature network to fill an incomplete image with coarse contents as inference and extract a feature map from the inpainted image.

Matching: We use patch-swap on the feature map to match the neural patches from the high-resolution boundary to the hole with coarse inference.

Translation: We use a Feature2Image network to translate the feature map to a complete image.


We propose a learning-based approach to synthesize missing contents in a highresolution image. Our model is able to inpaint an image with realistic and sharp contents in a feed-forward manner. We show that we can simplify training by breaking down the task into multiple stages, where the mapping function in each stage has smaller dimensionality. It is worth noting that our approach is a metaalgorithm and naturally we could explore a variety of network architectures and training techniques to improve the inference and the final result. We also expect that similar idea of multi-stage, multi-scale training could be used to directly synthesize high-resolution images from sampling.

Footer With Address And Phones