High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis
Chao Yang 1    Xin Lu 2    Zhe Lin 2    Eli Shechtman 2    Oliver Wang 2    Hao Li 1 3 4   
University of Southern California 1    Adobe Research 2    Pinscreen 3    USC Institute for Creative Technologies 4   

Figure 1: Qualitative illustration of the task. Given an image (512 x 512) with a missing hole (256 x 256) (a), our algorithm can synthesize sharper and more coherent hole content (d) compared with Context Encoders (b) and PatchMatch (c).
Abstract

Recent advances in deep learning have shown exciting promise in filling large holes in natural images with semantically plausible and context-aware details, impacting fundamental image manipulation tasks such as object removal. While these learning-based methods are significantly more effective in capturing high-level features than prior techniques, they can only handle very low-resolution inputs due to memory limitations and difficulty in training. Even for slightly larger images, the inpainted regions appear blurry and exhibit visible boundary artifacts. We propose a multi-scale neural patch synthesis approach based on joint optimization of image content and texture constraints, which not only preserves contextual structures but also produces high-frequency details by matching and adapting patches with the most similar mid-layer feature correlations of a deep classification network. We evaluate our method on the ImageNet and Paris Streetview datasets and achieve state-of-the-art inpainting accuracy. We show our approach produces sharper and more coherent results than prior methods, especially for high-resolution images.
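The patch-matching step described in the abstract can be illustrated with a minimal sketch. All names below are hypothetical, and the arrays stand in for mid-layer feature maps of a deep classification network, flattened to one feature vector per patch position; this is a simplified stand-in for the paper's actual feature-space matching, not its implementation.

```python
import numpy as np

def nearest_patches(hole_feat, ctx_feat):
    """For each feature patch inside the hole (rows of `hole_feat`), return
    the index of the most similar known-context patch (row of `ctx_feat`)
    under cosine similarity, a common choice for neural patch matching."""
    # Normalize each feature vector so the dot product becomes cosine similarity.
    h = hole_feat / (np.linalg.norm(hole_feat, axis=1, keepdims=True) + 1e-8)
    c = ctx_feat / (np.linalg.norm(ctx_feat, axis=1, keepdims=True) + 1e-8)
    sim = h @ c.T                 # similarity of every hole/context pair
    return sim.argmax(axis=1)     # best-matching context patch per hole patch
```

The matched context patches would then supply the high-frequency detail that is blended into the hole region.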

Introduction

Before sharing a photo, users may want to make modifications such as erasing distracting scene elements, adjusting object positions in an image for better composition, or recovering image content in occluded image areas. These, and many other editing operations, require automated hole-filling (image completion), which has been an active research topic in the computer vision and graphics communities for the past few decades. Due to its inherent ambiguity and the complexity of natural images, general hole-filling remains challenging.


Figure 2: Evaluation of different components. (a) input image. (b) result without the content constraint. (c) our result.
The Role of the Content Network in Joint Optimization

On the one hand, we utilize the output of the content network as an initialization in the joint optimization; we demonstrated its performance in the last section. In this section, we demonstrate the role of the content network as a content constraint in the joint optimization by dropping the content constraint term and using only the texture constraint term. We compare inpainting results with and without the content constraint. As shown in Fig. 2, without the content term guiding the optimization, the structure of the inpainting result is corrupted. This demonstrates the usefulness of the content term in the proposed joint optimization.
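The ablation above can be sketched in code. The toy solver below is a hypothetical 1-D stand-in (the paper's actual optimization runs over deep feature maps of the image): it descends a joint objective whose gradient sums a texture term (pull each local window toward its nearest known patch) and a content term (stay close to the content network's output). Setting `w_c = 0` corresponds to dropping the content constraint, as in Fig. 2(b).

```python
import numpy as np

def texture_grad(x, patches, k=3):
    """Gradient of a simple patch-matching (texture) term: pull each length-k
    window of x toward its nearest neighbor among the known patches."""
    g = np.zeros_like(x)
    counts = np.zeros_like(x)
    for i in range(len(x) - k + 1):
        win = x[i:i + k]
        nn = patches[np.argmin(((patches - win) ** 2).sum(axis=1))]
        g[i:i + k] += win - nn
        counts[i:i + k] += 1
    return g / np.maximum(counts, 1)

def joint_inpaint(content, patches, w_c=1.0, steps=200, lr=0.1):
    """Toy joint optimization: texture constraint plus weighted content
    constraint, initialized from the content network's output."""
    x = content.copy()                  # content output as initialization
    for _ in range(steps):
        g = texture_grad(x, patches)    # texture (patch) constraint
        g += w_c * (x - content)        # content constraint; w_c=0 drops it
        x -= lr * g
    return x
```

With the content term active, the solution is anchored to the content network's prediction while the texture term refines local detail; without it, the result drifts entirely toward the patch statistics, mirroring the corrupted structure seen in the ablation.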


Figure 3: Arbitrary object removal. From left to right: input image, object mask, PatchMatch result, our result.
Conclusion

We advanced the state of the art in semantic inpainting using neural patch synthesis. The insight is that the texture network is very powerful in generating high-frequency details, while the content network provides a strong prior on the semantics and global structure. Our approach can still introduce discontinuities and artifacts when the scene is complex. In addition, speed remains a bottleneck of our algorithm. We aim to address these issues in future work.
