High Resolution Face Age Editing

Abstract

Face age editing has become a crucial task in film post-production, and is also becoming popular for general purpose photography. Recently, adversarial training has produced some of the most visually impressive results for image manipulation, including the face aging/de-aging task. In spite of considerable progress, current methods often present visual artifacts and can only deal with low-resolution images. In order to achieve aging/de-aging with the high quality and robustness necessary for wider use, these problems need to be addressed. This is the goal of the present work. We present an encoder-decoder architecture for face age editing. The core idea of our network is to create both a latent space containing the face identity, and a feature modulation layer corresponding to the age of the individual. We then combine these two elements to produce an output image of the person with a desired target age. Our architecture is greatly simplified with respect to other approaches, and allows for continuous age editing on high resolution images in a single unified model. Source codes are available at https://github.com/InterDigitalInc/HRFAE.

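Face age editing is becoming widespread, both in film production and in general-purpose photography.
GAN-based methods have produced results, but existing solutions only handle low resolutions. High-resolution face age editing requires revisiting the earlier approaches.
The two main ideas of this paper are:
1. Creating a latent space that contains the face identity
2. Creating a feature modulation layer (accounting for the individual's age)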
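The two ideas above can be illustrated with a toy sketch. It is not the authors' implementation: the sigmoid gate, the division of the age by 100, the 4-channel 2x2 feature map and the names `age_to_modulation` and `modulate` are all assumptions made for illustration. It only shows how identity features and an age-dependent, per-channel modulation could be combined.

```python
import math
import random

def age_to_modulation(age, weights, bias):
    """Map a scalar age to one gate in (0, 1) per feature channel."""
    a = age / 100.0  # assumed normalization, not taken from the paper
    return [1.0 / (1.0 + math.exp(-(w * a + b))) for w, b in zip(weights, bias)]

def modulate(features, gates):
    """Scale every channel of the identity features by its age gate."""
    return [[[v * g for v in row] for row in channel]
            for channel, g in zip(features, gates)]

# Toy identity features: 4 channels of 2x2, a stand-in for an encoder output.
rng = random.Random(0)
features = [[[rng.random() for _ in range(2)] for _ in range(2)] for _ in range(4)]

# Hypothetical parameters; in a real model these would be learned.
weights = [rng.uniform(-3, 3) for _ in range(4)]
bias = [rng.uniform(-1, 1) for _ in range(4)]

# Same identity features, two target ages: only the gates differ.
young = modulate(features, age_to_modulation(25, weights, bias))
old = modulate(features, age_to_modulation(70, weights, bias))
```

Because the age enters as a continuous number rather than a group label, any target age yields its own set of gates. This is one way to read the abstract's claim of continuous age editing in a single model.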
Reference: de-aging VFX with the Flux software (https://www.awn.com/vfxworld/irishman-solving-age-old-problem)

1. Introduction

Learning to manipulate face age is an important topic both in industry and academia. In the movie post-production industry, many actors are retouched in some way, either for beautification or texture editing. More specifically, synthetic aging or de-aging effects are usually generated by makeup or special visual effects. Although impressive results can be obtained digitally, as in Martin Scorsese’s recent movie The Irishman, the underlying processes are extremely time-consuming. Thus, robust, high-quality algorithms for performing automatic age modification are highly desirable. Nevertheless, editing faces is an intrinsically difficult task. Indeed, the human brain is particularly good at perceiving faces’ attributes in order to detect, recognize or analyze them, for instance to infer identity or emotions. Consequently, even small artifacts are immediately perceived and ruin the perception of results. For this reason, our goal is to produce artifact-free, sharp and photorealistic results on high-resolution face images.

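The digital face aging in the recent movie The Irishman gives impressive results, but the work behind it is very time-consuming.
This creates a need for automated, high-quality face age editing.
Face editing is hard, and people readily notice even small flaws.
The goal of this paper is to produce artifact-free, sharp, photorealistic results.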

With the success of Generative Adversarial Networks (GANs) [7] in high quality image generation, GAN-based models have been widely used for image-to-image translation [35,40]. Despite having set new standards for natural image synthesis, GANs are known to suffer from two major flaws: an abundance of small artifacts and strong instability of the training process. The latest face aging studies [9,20,33,36,39] also adopt GAN-based models. Specifically, they divide face datasets into different age groups, feed young images into the generator, and rely on the discriminator to map output images to older age distributions. There are multiple limitations to this approach. Firstly, as can be expected, these approaches inherit the drawbacks of GAN-based methods - blurry background, small parasite structures, instability of training. Secondly, as the aging effect is generated by matching the output image distribution to the target group, these methods are limited to coarse aging/de-aging. To achieve fine-grained transformation, a separate model needs to be trained between each pair of ages.

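GANs have set a new standard for image synthesis,
but they have two flaws:
1. An abundance of small artifacts
2. Unstable training
Recent face aging studies use GAN-based methods: they split the dataset into age groups (e.g. young/old), feed young images into the generator, and train it so that the discriminator pushes the output images toward the older age distribution.
This approach has two drawbacks:
1. It inherits the flaws of GAN-based methods as they are
2. It is limited to coarse aging/de-aging, because the output is only matched to a target age group; fine-grained changes need a separate model for each pair of ages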
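The second drawback can be made concrete with a small sketch. The group boundaries (30, 40, 50) and the function names below are hypothetical and do not come from the paper. The pair count assumes one model per ordered pair of ages, which is one reading of "between each pair of ages".

```python
def age_group(age, boundaries=(30, 40, 50)):
    """Return the index of the age group a face would be assigned to.

    The boundaries are an assumed example split, not the paper's.
    """
    return sum(age > b for b in boundaries)

def pairwise_models(num_ages):
    """Count the models needed if every ordered (source, target) age pair gets its own."""
    return num_ages * (num_ages - 1)

# Two different target ages collapse into the same group label,
# so a group-conditioned model cannot tell them apart.
same_group = age_group(31) == age_group(39)

# Covering every age from 20 to 69 pairwise (50 distinct ages).
models_for_50_ages = pairwise_models(50)
```

With this split, targets of 31 and 39 are indistinguishable to the model. Covering 50 distinct ages pairwise would take 50 * 49 = 2450 separate models, which is why a single model conditioned on a continuous age is attractive.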