Abstract

Face age editing has become a crucial task in film post-production, and is also becoming popular for general-purpose photography. Recently, adversarial training has produced some of the most visually impressive results for image manipulation, including the face aging/de-aging task. In spite of considerable progress, current methods often present visual artifacts and can only deal with low-resolution images. In order to achieve aging/de-aging with the high quality and robustness necessary for wider use, these problems need to be addressed. This is the goal of the present work. We present an encoder-decoder architecture for face age editing. The core idea of our network is to create both a latent space containing the face identity, and a feature modulation layer corresponding to the age of the individual. We then combine these two elements to produce an output image of the person with a desired target age. Our architecture is greatly simplified with respect to other approaches, and allows for continuous age editing on high-resolution images in a single unified model. Source code is available at https://github.com/InterDigitalInc/HRFAE.

1. Introduction

Learning to manipulate face age is an important topic both in industry and academia. In the movie post-production industry, many actors are retouched in some way, either for beautification or texture editing. More specifically, synthetic aging or de-aging effects are usually generated by makeup or special visual effects. Although impressive results can be obtained digitally, as in Martin Scorsese's recent movie The Irishman, the underlying processes are extremely time-consuming. Thus, robust, high-quality algorithms for performing automatic age modification are highly desirable. Nevertheless, editing faces is an intrinsically difficult task. Indeed, the human brain is particularly good at perceiving facial attributes in order to detect, recognize or analyze faces, for instance to infer identity or emotions. Consequently, even small artifacts are immediately perceived and ruin the perception of results. For this reason, our goal is to produce artifact-free, sharp and photorealistic results on high-resolution face images.

With the success of Generative Adversarial Networks (GANs) [7] in high-quality image generation, GAN-based models have been widely used for image-to-image translation [35,40]. Despite having set new standards for natural image synthesis, GANs are known to suffer from two major flaws: an abundance of small artifacts and strong instability of the training process. The latest face aging studies [9,20,33,36,39] also adopt GAN-based models. Specifically, they divide face datasets into different age groups, feed young images into the generator, and rely on the discriminator to map output images to older age distributions. There are multiple limitations to this approach. Firstly, as can be expected, these approaches inherit the drawbacks of GAN-based methods: blurry backgrounds, small parasitic structures, and unstable training. Secondly, as the aging effect is generated by matching the output image distribution to the target group, these methods are limited to coarse aging/de-aging. To achieve fine-grained transformation, a separate model needs to be trained between each pair of ages.
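
The coarseness of group-based aging can be illustrated with a minimal Python sketch. The group boundaries, labels, and function names below are hypothetical and chosen only for illustration; they are not taken from any of the cited methods. The sketch shows that two different target ages falling in the same group give a group-conditioned model the same target, and, under the assumption that each ordered (source age, target age) pair requires its own model, how quickly the number of pairwise models grows.

```python
from bisect import bisect_right

# Hypothetical group boundaries (exclusive upper bounds); illustration only.
AGE_GROUP_BOUNDS = [31, 41, 51]
AGE_GROUP_LABELS = ["30 and under", "31-40", "41-50", "51 and over"]


def age_to_group(age: int) -> int:
    """Return the index of the (hypothetical) age group containing `age`."""
    return bisect_right(AGE_GROUP_BOUNDS, age)


def num_pairwise_models(num_ages: int) -> int:
    """Count models needed if every ordered (source, target) age pair
    had its own model. This counting rule is an assumption."""
    return num_ages * (num_ages - 1)


if __name__ == "__main__":
    # Target ages 32 and 39 fall in the same group, so a group-conditioned
    # generator cannot distinguish between them.
    for age in (32, 39, 45):
        print(age, "->", AGE_GROUP_LABELS[age_to_group(age)])

    # Pairwise models for 50 distinct ages under the assumption above.
    print(num_pairwise_models(50))
```

With these illustrative boundaries, requests for ages 32 and 39 are indistinguishable to a group-conditioned generator, whereas a single model conditioned on a continuous age value avoids both the quantization and the growth in the number of models.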