FA-GANs: Facial Attractiveness Enhancement with Generative Adversarial Networks on Frontal Faces
Abstract
Facial attractiveness enhancement has been an interesting application of Computer Graphics and Vision in recent years. It aims to generate a more attractive face via manipulations of a facial image while preserving the face identity. In this paper, we propose the first Generative Adversarial Networks (GANs) for enhancing facial attractiveness in both the appearance and geometry aspects, which we call “FA-GANs”. FA-GANs contain two parallel branches, both GANs, which enhance facial attractiveness from two perspectives: facial appearance and facial geometry. With an embedded ranking module, the proposed FA-GANs are able to evaluate facial attractiveness and extract attractiveness features, which are then imposed on the enhancement results. This also enables us to train the networks without the paired faces that most previous methods require. Benefiting from our parallel structure, the appearance and geometry networks are able to adjust face appearance and face geometry independently. The consistency of the outcomes of the two branches is enforced by a geometry consistency module which links the two branches naturally. To the best of our knowledge, we are the first to enhance facial attractiveness by considering both the appearance and geometry aspects with GANs under a unified framework. Experimental results show that our FA-GANs generate compelling results and outperform the state-of-the-arts.
Index Terms:
Facial attractiveness enhancement, generative adversarial networks, geometry and appearance adjustment.

I Introduction
Related research has shown that human faces tend to make a powerful first impression and thus may influence later social behaviors. The advantages brought by attractive faces are proven in many scientific studies [1]. Consequently, growing numbers of celebrities enhance their facial attractiveness in daily life.

Enhancing facial attractiveness is the process of adjusting a given face in the sense of visual aesthetics. Besides its academic purpose, it also has a wide market in the entertainment industry as a way to beautify portraits. Facial attractiveness enhancement has thus received considerable attention in the Graphics and Vision communities.
The facial manipulation methods proposed during the last decade can be roughly divided into two categories. The first category resorts to traditional Computer Graphics techniques which warp the input face to alter the facial appearance. To enhance 2D facial image attractiveness, Leyvand et al. [2] introduce a method that learns the distances among facial key points and adjusts the facial image with multilevel free-form deformation. Li et al. [3] simulate makeup through adaptations of physically-based reflectance models. For 3D face models with both geometry and texture information, Liao et al. [4] propose a sequence of manipulations concerning symmetry, frontal face proportion, and facial angular profile proportion. Methods in the other category generate the desired face by building generative models with deep learning. Representative among these are deep generative networks, which have exhibited a remarkable capability in facial image generation. BeautyGAN [5] adjusts facial appearance by transferring the makeup style from a given reference face to another non-makeup one. Though promising results are generated, this method mainly focuses on style transfer and cannot easily be generalized to automatic facial attractiveness enhancement. It is widely recognized that the geometry structure of the face plays a critical role in facial attractiveness. However, to the best of our knowledge, no unified framework yet exists for improving attractiveness in both the appearance and geometry aspects.
The purpose of this paper is to automate facial attractiveness enhancement in a unified deep learning framework which can not only alter the facial appearance but also improve the geometry structure. A key challenge in our task is the lack of paired data for training: it is obviously very expensive to collect a large quantity of paired faces, each pair showing the same individual before and after enhancement, for supervised learning. Another challenge is that people may define facial attractiveness in different ways; their answers vary a lot when asked to explain the concept. However, a general criterion does exist in some respects, in that most actresses and actors are recognized by the majority for their beauty or handsomeness. Therefore, it is possible to explore the relationship between ordinary and attractive faces to make the enhancement task tractable, even though it might be difficult to build a set of explicit, explainable, and well-acknowledged rules defining facial attractiveness.
In this paper, we propose the facial attractiveness enhancement Generative Adversarial Networks, called FA-GANs, to achieve attractiveness enhancement by altering the input face in both the appearance and geometry aspects. Given the difficulty of collecting paired data, FA-GANs are trained on a dataset we collected which consists of attractive faces and ordinary ones. The faces in the two subsets are, however, uncorrelated. To learn with unpaired faces, we specifically design a novel attractiveness ranking module that extracts attractiveness features. Facial attractiveness can thus be evaluated with implicit rules in a data-driven manner.
FA-GANs consist of two parallel branches used for the enhancement of facial geometry and appearance, respectively, and both adopt the structure of GANs. For the geometry branch, we employ facial landmarks to depict the geometry structure of a face. For the appearance branch, the ranking module with the structure of VGG-16 is pre-trained, and the extracted feature maps are utilized to depict the attractiveness rules. The ranking module determines the attractiveness domain toward which the ordinary faces should be enhanced. Then, the networks of the deep face descriptor [6] are utilized by minimizing the distance between the facial images before and after enhancement, which guarantees that the enhanced face and the original input can be recognized as the same individual. A newly designed geometry enhancement consistency module combines these two branches, ensuring that FA-GANs enhance facial attractiveness in the two aspects consistently.
To the best of our knowledge, we are the first to enhance the facial attractiveness with GANs by considering not only the appearance aspect but also the facial geometry. Our comprehensive experiments show that FA-GANs generate compelling perceptual results and outperform the state-of-the-arts.
In addition to the general framework, our paper also makes the following technical contributions.
• We propose FA-GANs for facial attractiveness enhancement with a newly designed geometry enhancement consistency module to automatically enhance the input face in both the geometry and appearance aspects.

• The pre-trained attractiveness ranking module is embedded in FA-GANs to learn the features of attractive faces via unsupervised adversarial learning, which does not require the training data to be organized as paired faces.

• FA-GANs consist of two branches, each of which is able to work independently to adjust either the facial appearance or the geometry.
II Related Work
Facial attractiveness has long been investigated in many research areas, such as computer vision and cognitive psychology. Numerous empirical experiments suggest that the cognition of beauty is consistent across education background, age, and gender. Facial attractiveness enhancement is closely related to facial attractiveness analysis and face generation. In this section, we first review studies on altering facial appearance and geometry, and then place special emphasis on recent facial image generation methods with generative adversarial networks.
II-A Facial Appearance Enhancement
Arakawa et al. propose a system using interactive evolutionary computing which removes undesirable skin components from human facial images to make the face look beautiful [7]. Other works achieve the enhancement via face makeup. Liu et al. propose a fully automatic makeover recommendation and synthesis system named Beauty e-Expert [8]. Deep learning has also been used and has shown remarkable performance. Liu et al. propose an end-to-end deep localized makeup transfer network to automatically synthesize makeup for female faces, achieving natural-looking results [9]. Li et al. achieve high-quality makeup transfer from a given reference makeup face to a non-makeup one with their proposed BeautyGAN [5]. Chang et al. [10] propose PairedCycleGAN, which can quickly transfer the style from an arbitrary reference makeup photo to an arbitrary source photo. Most of these deep-learning-based methods focus on style transfer. By contrast, our aim is to automatically enhance a facial image in terms of not only facial appearance but also face geometry, and given a target face, we do not require an additional image as the reference.
II-B Facial Geometry Enhancement
Different groups of people define the shapes of beautiful faces in different ways, but beautiful faces can always be recognized as attractive by individuals from other groups [11].
For 2D facial images, Leyvand et al. propose a data-driven approach for facial attractiveness enhancement [2], which adjusts the geometry structure by learning from the distance vectors of the landmark points. Li et al. propose a deep face beautification framework which is able to automatically modify the geometry structure of a face to boost its attractiveness [12]. For 3D face models, Liao et al. propose an enhancing approach that adjusts geometry symmetry, frontal face proportion, and facial angular profile proportion [4]. They apply proportion optimization to the frontal face restricted to the Neoclassical Canons and golden ratios. The combination of asymmetry correction and adjustment of frontal or profile proportions can achieve better results; however, the research also suggests that asymmetry correction is more effective than adjusting the frontal or profile proportions [11]. Qian et al. propose the additive focal variational auto-encoder (AF-VAE) to arbitrarily manipulate high-resolution face images [13]. Different from these methods, which mainly alter face geometry based on empirical rules, we explore facial attractiveness by learning from unpaired training data in a data-driven manner, considering that unanimous and measurable rules about attractiveness are difficult to define explicitly.
II-C Generative Adversarial Networks
Our facial enhancement architecture is also a kind of deep generative model derived from GANs [14]. GANs work in a simple and efficient way, training both the generator and the discriminator via a min-max two-player game and achieving remarkable results on unsupervised learning tasks. Recently, GANs and their variants have shown great success in realistic image generation, including super-resolution [15], image translation [16], and image synthesis [17]. Here we briefly review recent applications of GANs to image generation.
Plenty of modified architectures of GANs [18, 19, 20, 21, 22, 23, 24, 25] have been proposed for face generation. For the face age progression problem, Yang et al. propose a pyramid architecture of GANs to learn face age progression, achieving remarkable results [26]. Besides, identity-preserved conditional generative adversarial networks have been proposed to generate a face of a target age while preserving the identity [18].
In the research area of face synthesis, Song et al. propose geometry-guided GANs to achieve facial expression removal and synthesis [27]. Shen et al. synthesize faces while preserving identity with their proposed three-player GAN, FaceID-GAN [28]. Bao et al. propose a GAN-based framework to disentangle the identity and attributes of faces [24], recombining different identities and attributes for identity-preserving face synthesis in an open domain. Li et al. propose an effective object completion algorithm using a deep generative model [21]. Cao et al. propose CariGANs for unpaired photo-to-caricature translation, which generates the caricature image in two steps of geometry and appearance adjustment [29].
Inspired by these works, we propose our facial attractiveness enhancement framework based on GANs to automatically generate enhanced and identity-preserved facial images. To the best of our knowledge, no existing work based on GANs has been specifically devised for this task.
III Our Method

The original GANs consist of a generator $G$ and a discriminator $D$. $G$ and $D$ are trained alternately and iteratively via an adversarial process, competing with each other in a min-max game formulated as:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big] \quad (1)$$

where $z$ represents the noise sampled from a prior probability distribution $p_z(z)$, and $x$ denotes a data item sampled from the real data distribution $p_{data}(x)$. We aim to train $G$ until $D$ cannot distinguish real samples from the generated ones.
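To make this alternating optimization concrete, here is a minimal PyTorch sketch of one training step for the vanilla objective in Equation (1). The networks, optimizers, and data are placeholders rather than the FA-GANs models, which adopt the least-squares losses introduced later.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # D outputs raw logits in this sketch

def gan_step(G, D, opt_G, opt_D, real, z):
    # 1) Update D: real samples -> label 1, generated samples -> label 0.
    fake = G(z).detach()                      # block gradients into G
    out_real, out_fake = D(real), D(fake)
    loss_D = bce(out_real, torch.ones_like(out_real)) + \
             bce(out_fake, torch.zeros_like(out_fake))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) Update G: push D's score on generated samples toward "real".
    out_fake = D(G(z))
    loss_G = bce(out_fake, torch.ones_like(out_fake))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```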
Let $X$ and $Y$ denote the domains of unattractive and attractive faces, respectively. No paired faces exist between these two domains. Given an unattractive sample $x \in X$, our goal is to learn a mapping $\phi\colon X \to Y$ which can transfer $x$ to an attractive sample $\phi(x) \in Y$. Unlike previous works enhancing facial attractiveness in either the geometry structure or the appearance texture, FA-GANs make the enhancement in both of these two aspects. Therefore, FA-GANs consist of two parallel branches: the geometry adjustment network and the appearance adjustment network.

The proposed framework of FA-GANs with its two branches is shown in Fig. 2. Each branch basically adopts the architecture of GANs. To enhance facial attractiveness in both the geometry and appearance aspects, the geometry branch is trained to learn the geometry-to-geometry translation from $X$ to $Y$, while the appearance branch is trained with the help of the geometry translation to learn the adjustment in both the appearance and geometry aspects. Similar to the traditional beautification engine [2] for 2D facial images, facial landmarks are used to represent the geometry structure. We denote by $X_{geo}$ and $Y_{geo}$ the domains of geometry structures of unattractive and attractive faces, respectively. The geometry branch learns the mapping $\phi_{geo}\colon X_{geo} \to Y_{geo}$ to convert the geometry structure $v_x \in X_{geo}$ of an unattractive face to an attractive one. The appearance branch learns to convert an instance $x \in X$ to an intermediate instance $\hat{y} \in \hat{Y}$, where $\hat{y}$ is an appearance-adjusted face with the same geometry structure as $x$, and $\hat{Y}$ is the intermediate domain with the geometry structure of $X$ and the appearance texture of $Y$. The mapping learned by the appearance branch is thus defined as $\phi_{app}\colon X \to \hat{Y}$.
To combine these two branches and enhance facial attractiveness in both aspects, we further minimize an energy function ensuring consistency between the geometry structure of the intermediate face and the geometry structure generated by the geometry branch. The consistency energy is defined as:

$$E_{con} = \big\| v_{\hat{y}} - \phi_{geo}(v_x) \big\|_2^2 \quad (2)$$

where $v_{\hat{y}}$ is the geometry structure of the intermediate face $\hat{y}$. To compute it, we introduce pre-trained geometry analysis networks which extract the geometry structure from the face generated by the appearance branch. FA-GANs combine the geometry mapping $\phi_{geo}$ and the appearance mapping $\phi_{app}$, and achieve the overall mapping $\phi\colon X \to Y$ by enforcing the consistency between the geometry and appearance branches. The geometry branch, the appearance branch, and their combination are described in detail in the following.
III-A Geometry Enhancement Branch

III-A1 Geometry data
We extract 2D facial landmarks for each of the training face images. For each face, the detected landmarks are located on the outlines of the left eyebrow, right eyebrow, left eye, right eye, nose bridge, nose tip, top lip, bottom lip, and chin. Fig. 3 shows an example of extracted landmarks. To normalize the scale, all faces are cropped and resized to a fixed resolution. Instead of learning from facial landmarks directly, we prefer to enhance facial attractiveness by adjusting the relative distances among important facial components, such as the eyes, nose, lip, and chin, since landmarks are more sensitive to small errors than relative distances. Therefore, the detected landmarks are used to construct a distance-embedded mesh through Delaunay triangulation, whose edge lengths form a distance vector. We further reduce the dimension of the distance vector to 32 by applying principal component analysis (PCA), retaining most of the total variance.
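The following sketch illustrates this preprocessing pipeline. The array `all_landmarks` is an assumed input, the choice of triangulating the mean landmark configuration (so that every face shares the same edge set) is an assumption of the sketch, and the 32-dimensional PCA code matches the extractor output in Table II.

```python
import numpy as np
from scipy.spatial import Delaunay
from sklearn.decomposition import PCA

# Build one fixed edge set by triangulating the mean landmark
# configuration, so every face yields a distance vector of the
# same length and ordering (an assumption of this sketch).
def mesh_edges(mean_landmarks):
    tri = Delaunay(mean_landmarks)
    edges = set()
    for s in tri.simplices:
        edges.update(tuple(sorted((s[a], s[b])))
                     for a, b in ((0, 1), (1, 2), (0, 2)))
    return sorted(edges)

def distance_vector(landmarks, edges):
    # Length of each mesh edge between a pair of 2D landmarks.
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in edges])

# `all_landmarks`: assumed (num_faces, num_points, 2) array.
edges = mesh_edges(all_landmarks.mean(axis=0))
vectors = np.stack([distance_vector(lm, edges) for lm in all_landmarks])
pca = PCA(n_components=32).fit(vectors)     # 32-D, matching Table II
codes = pca.transform(vectors)              # inputs to the geometry branch
```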
III-A2 Geometry enhancement networks
The geometry branch aims to learn the mapping $\phi_{geo}\colon X_{geo} \to Y_{geo}$, while no samples in $X_{geo}$ and $Y_{geo}$ are paired. We achieve this by constructing the geometry adjustment GANs from only fully connected (FC) and ReLU [30] layers. Unlike traditional GANs, which train the generator only until it fools the discriminator, the feature maps of the discriminator $D_g$ are also utilized to help the generator $G_g$ learn the geometry mapping. The generator takes $v_x$ as input and outputs the adjusted distance vector $G_g(v_x)$. Fig. 3 shows the architecture of our geometry adjustment branch. Apart from the last layer, the discriminator has the same architecture as the generator. The details of the architectures are shown in Table I.
Table I: Architecture of the geometry adjustment networks.

| Module | Layer | Activation size |
|---|---|---|
| | Input | |
| | FC-ReLU | |
| | FC-ReLU | |
| | FC-ReLU | |
| | FC-ReLU | |
| Generator | FC | |
| Discriminator | FC | 1 |
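As a concrete reading of Table I, a minimal PyTorch sketch of the two FC-ReLU networks follows. The hidden widths are illustrative placeholders, since the original activation sizes are not reproduced above, and the 32-dimensional input matches the PCA code.

```python
import torch.nn as nn

# Generator and discriminator share an FC-ReLU stack and differ only
# in the final FC layer (Table I). Hidden widths are placeholders.
class GeoNet(nn.Module):
    def __init__(self, in_dim=32, hidden=(128, 128, 128, 128), out_dim=32):
        super().__init__()
        dims, blocks = in_dim, []
        for h in hidden:
            blocks.append(nn.Sequential(nn.Linear(dims, h), nn.ReLU()))
            dims = h
        self.blocks = nn.ModuleList(blocks)
        self.head = nn.Linear(dims, out_dim)  # out_dim=1 for the discriminator

    def feats(self, v):
        outs = []
        for blk in self.blocks:               # per-FC-layer feature maps
            v = blk(v)
            outs.append(v)
        return outs

    def forward(self, v):
        return self.head(self.feats(v)[-1])

geo_G = GeoNet(out_dim=32)                    # adjusted distance vector
geo_D = GeoNet(out_dim=1)                     # realness score
```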
Following the classic GANs [14, 31], training the discriminator amounts to minimizing the loss:

$$L_{D_g} = \mathbb{E}_{v_y \sim Y_{geo}}\big[(D_g(v_y) - 1)^2\big] + \mathbb{E}_{v_x \sim X_{geo}}\big[D_g(G_g(v_x))^2\big] \quad (3)$$

where $D_g$ and $G_g$ are the discriminator and the generator, respectively. $D_g$ takes both the unattractive samples $v_x$ and the generated samples $G_g(v_x)$ as negative samples and regards the attractive samples $v_y$ as positive samples. The loss of the discriminator can thus be rewritten as:

$$L_{D_g} = \mathbb{E}_{v_y \sim Y_{geo}}\big[(D_g(v_y) - 1)^2\big] + \mathbb{E}_{v_x \sim X_{geo}}\big[D_g(G_g(v_x))^2 + D_g(v_x)^2\big] \quad (4)$$

Considering that samples coming from the same category have similar feature maps, a feature loss is introduced, and the loss of the generator is defined as:

$$L_{G_g} = \mathbb{E}_{v_x \sim X_{geo}}\Big[\big(D_g(G_g(v_x)) - 1\big)^2\Big] + \sum_i \Big\| D_g^{(i)}\big(G_g(v_x)\big) - D_g^{(i)}(v_y) \Big\|_2^2 \quad (5)$$

where $D_g^{(i)}$ denotes the feature map produced by the $i$-th FC layer of the discriminator $D_g$.
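A hedged sketch of these geometry losses in least-squares form, reusing the `GeoNet.feats` helper from the architecture sketch above:

```python
# Sketch of Eqs. (4)-(5); v_x, v_y are batches of unattractive and
# attractive PCA distance vectors.
def geo_d_loss(geo_D, geo_G, v_x, v_y):
    fake = geo_G(v_x).detach()
    return ((geo_D(v_y) - 1) ** 2).mean() + \
           (geo_D(fake) ** 2).mean() + \
           (geo_D(v_x) ** 2).mean()           # unattractive is also negative

def geo_g_loss(geo_D, geo_G, v_x, v_y):
    fake = geo_G(v_x)
    adv = ((geo_D(fake) - 1) ** 2).mean()
    feat = sum(((f - r) ** 2).mean()          # match D_g's FC feature maps
               for f, r in zip(geo_D.feats(fake), geo_D.feats(v_y)))
    return adv + feat
```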
III-A3 Geometry adjustment on faces
The geometry branch converts $v_x$ to $G_g(v_x)$, with $v_x$ being the distance vector derived from the landmark points $P$. To adjust the facial geometry, the enhanced landmark points $P'$ corresponding to $G_g(v_x)$ are estimated by minimizing the following energy function for the best fit,

$$E(P') = \sum_{i,j} w_{ij} \big( \| p'_i - p'_j \| - d_{ij} \big)^2 \quad (6)$$

where $w_{ij}$ is the element of the connectivity matrix of our constructed facial mesh, and the distance term $d_{ij}$ is the entry, corresponding to edge $(i, j)$, of the distance vector recovered from $G_g(v_x)$. Minimization of the energy function is performed with the Levenberg-Marquardt algorithm [32]. The geometry-enhanced face is then generated by mapping the original face texture from the mesh constructed by $P$ to the mesh of $P'$.
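A minimal sketch of this fitting step with SciPy's Levenberg-Marquardt solver; `edges`, `weights`, and the target distances are assumed to come from the mesh construction and the inverse PCA step described above.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_landmarks(p0, edges, target_d, weights):
    """Minimize Eq. (6): find points P' whose mesh edge lengths match
    the enhanced distances, starting from the original landmarks p0."""
    def residuals(flat):
        p = flat.reshape(-1, 2)
        return np.array([w * (np.linalg.norm(p[i] - p[j]) - d)
                         for (i, j), d, w in zip(edges, target_d, weights)])
    sol = least_squares(residuals, p0.ravel(), method='lm')  # LM solver
    return sol.x.reshape(-1, 2)                              # enhanced P'
```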
III-B Appearance Enhancement Branch
The outstanding performance of GANs in fitting data distributions has significantly promoted many computer graphics and vision applications, such as image-to-image translation [33, 16, 29, 5]. Inspired by these studies, we employ GANs to perform facial appearance enhancement while preserving identity. The appearance adjustment networks and their losses are introduced in the following.
III-B1 Appearance adjustment networks
The process of appearance enhancement only requires a forward pass through the generator $G_a$. The generator is designed with the U-Net [34]. The discriminator $D_a$ outputs indicators suggesting the probability that the generated face $G_a(x)$ comes from the attractive category. Different from the classic discriminator of GANs, we implement our discriminator with the pyramid architecture [26] to estimate high-level attractiveness-related cues in a fine-grained way.

Specifically, a facial attractiveness classifier is pre-trained with the architecture of VGG-16 [35] to classify attractive and unattractive faces. The hierarchical layers of VGG-16 endow our network with the ability to capture image features from the pixel level to the semantic level. The generator is optimized until the discriminator is confused about $x$ and $G_a(x)$ at all the pixel- and semantic-level features. Consequently, the feature maps of the 2nd, 4th, 7th, and 10th convolutional layers in VGG-16 are integrated into the discriminator for adversarial learning. The generator should not only transfer $x$ toward the attractive domain but also preserve the identity. We achieve this by leveraging the network of the deep face descriptor [6] to measure the identity similarity between $x$ and $G_a(x)$. The deep face descriptor is trained on a large face dataset containing millions of facial images by recognizing unique individuals. We remove the classification layer and take the last FC layer as the identity-preserving output, forming an identity descriptor $\psi$. Both the original face $x$ and the enhanced face $G_a(x)$ are fed into $\psi$, and a small margin between $\psi(x)$ and $\psi(G_a(x))$ is enforced.
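To illustrate how such multi-level feature maps can be tapped, here is a sketch using torchvision's VGG-16. In the paper the weights would come from the pre-trained ranking module, whereas here the network is left uninitialized, and the indices assume torchvision's `vgg16().features` layout, where the 2nd, 4th, 7th, and 10th conv layers sit at positions 2, 7, 14, and 21.

```python
import torchvision.models as models

# Tap pixel-to-semantic level feature maps from a VGG-16 backbone
# for the pyramid discriminator. Weights are uninitialized here.
vgg = models.vgg16(weights=None).features.eval()
TAPS = {2, 7, 14, 21}  # 2nd, 4th, 7th, 10th conv layers

def pyramid_features(img):
    feats, h = [], img
    for idx, layer in enumerate(vgg):
        h = layer(h)
        if idx in TAPS:
            feats.append(h)  # multi-scale attractiveness cues
    return feats
```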
Table II: Architecture of the geometry extractor.

| Layer | Activation size |
|---|---|
| Input | 3×224×224 |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| Conv (stride, padding) | |
| FC-ReLU | |
| FC-ReLU | |
| FC | 32 |
III-B2 Loss
The appearance-improved face should be recognized as the same individual as the original one. To this end, the loss for appearance enhancement should not only help improve facial attractiveness, but also facilitate the preservation of identity. In addition, the difference between the improved face and the input should be constrained. Considering all these aspects, four types of loss are defined for training the appearance branch: the adversarial loss, identity loss, pixel loss, and total variation loss.
a) Adversarial loss. Similar to the geometry adjustment branch, both the unattractive faces and the generated faces are deemed negative samples and the attractive faces positive ones. We adopt the adversarial loss of LSGAN [31] to train the appearance adjustment branch. The adversarial losses of the discriminator and the generator are defined as:

$$L_{D_a} = \mathbb{E}_{y \sim Y}\big[(D_a(y) - 1)^2\big] + \mathbb{E}_{x \sim X}\big[D_a(G_a(x))^2 + D_a(x)^2\big] \quad (7)$$

$$L_{adv} = \mathbb{E}_{x \sim X}\big[(D_a(G_a(x)) - 1)^2\big] \quad (8)$$
b) Identity loss. To ensure that the appearance-enhanced face and the original face are recognized as the same individual, the identity loss is introduced and defined as:

$$L_{id} = \big\| \psi(G_a(x)) - \psi(x) \big\|_2^2 \quad (9)$$

where $\psi$ is the identity descriptor derived from the deep face descriptor [6].
c) Pixel loss. The generated face $G_a(x)$ should be more attractive than the input $x$; however, the gap between $G_a(x)$ and $x$ in pixels should be constrained. The pixel loss enforces the enhanced face to differ little from the original face in the raw-pixel space. In addition, experience from other image generation tasks shows that $\ell_1$ regularization produces less blurring than $\ell_2$ regularization. The pixel loss is thus formulated as:

$$L_{pix} = \frac{1}{C \times W \times H} \big\| G_a(x) - x \big\|_1 \quad (10)$$

where $W$ and $H$ are the width and height of the image, respectively, and $C$ is the number of channels.
d) Total variation loss. The total variation loss $L_{tv}$ is defined as the total variation regularizer [36] in order to encourage spatial smoothness of the enhanced face $G_a(x)$.
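A compact PyTorch sketch of these four losses under the notation above; `psi` stands for the identity descriptor network and `fake` for $G_a(x)$:

```python
import torch.nn.functional as F

def appearance_losses(D_a, psi, x, fake):
    adv = ((D_a(fake) - 1) ** 2).mean()       # Eq. (8), LSGAN form
    idt = F.mse_loss(psi(fake), psi(x))       # Eq. (9), identity
    pix = F.l1_loss(fake, x)                  # Eq. (10), mean over C*W*H
    tv = (fake[:, :, 1:, :] - fake[:, :, :-1, :]).abs().mean() + \
         (fake[:, :, :, 1:] - fake[:, :, :, :-1]).abs().mean()  # smoothness
    return adv, idt, pix, tv
```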
III-C Geometry Enhancement Consistency
Given a face $x$ as well as its distance vector $v_x$, the geometry enhancement branch converts $v_x$ to $G_g(v_x)$, which lies in the domain $Y_{geo}$. At the same time, the appearance branch converts $x$ to $G_a(x)$, which lies in the domain $\hat{Y}$. As illustrated in Fig. 2, FA-GANs combine these two branches and output the enhanced face with the generator of the appearance enhancement branch. Geometry consistency should hold between $G_g(v_x)$ and $v_{G_a(x)}$, the distance vector of $G_a(x)$. To extract this distance vector, an extractor $E$ is pre-trained on our dataset. The extractor is constructed with convolutional and FC layers, and each convolutional layer is followed by a Batch Normalization [37] layer and a ReLU [30] layer. The detailed architecture of the extractor is shown in Table II.

The extracted vector is the PCA representation of the distance vector derived from the landmarks. The geometry enhancement consistency is thus defined with the following loss:

$$L_{con} = \big\| E(G_a(x)) - G_g(v_x) \big\|_2^2 \quad (11)$$
In summary, FA-GANs contain the losses of the appearance branch and the loss enforcing geometry enhancement consistency. The overall training losses are therefore expressed as,

$$L_G = \lambda_{adv} L_{adv} + \lambda_{id} L_{id} + \lambda_{pix} L_{pix} + \lambda_{tv} L_{tv} + \lambda_{con} L_{con} \quad (12)$$

$$L_D = L_{D_a} \quad (13)$$

We train $G_a$ and $D_a$ alternately until $G_a$ learns the desired facial attractiveness transformation and $D_a$ becomes a reliable estimator.
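Combining the pieces above, a sketch of the full generator objective in Equation (12), reusing `appearance_losses` from the previous sketch; `E` is the pre-trained geometry extractor, `geo_G` the frozen geometry generator, and the lambda weights stand in for the paper's trade-off settings:

```python
def fa_gan_g_loss(D_a, psi, E, geo_G, x, v_x, fake, lam):
    adv, idt, pix, tv = appearance_losses(D_a, psi, x, fake)
    con = ((E(fake) - geo_G(v_x)) ** 2).mean()   # Eq. (11), consistency
    return (lam['adv'] * adv + lam['id'] * idt + lam['pix'] * pix +
            lam['tv'] * tv + lam['con'] * con)   # Eq. (12)
```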

IV Experiments
In this section, we first describe the face dataset used in our experiments. We then analyze the performance of the appearance and geometry branches of our FA-GANs, which corresponds to an ablation study of the geometry enhancement consistency module. We finally conduct comprehensive experiments to compare FA-GANs with the state-of-the-arts, including the automatic facial enhancement function provided by prevalent mobile apps such as Meitu [38]. In addition to the results generated by FA-GANs, we also show the results obtained by applying our geometry branch and appearance branch successively to the testing faces and compare them with relevant methods.
IV-A Datasets
Facial attractiveness analysis is fundamental to facial image enhancement, and the attractiveness criterion is explored in a data-driven way in our paper. However, few datasets designed specifically for facial attractiveness analysis are publicly available. Fortunately, Liang et al. propose a diverse benchmark dataset, called SCUT-FBP5500, for the prediction of multi-paradigm facial beauty [39]. The SCUT-FBP5500 dataset contains 5500 frontal faces in total, with diverse properties such as male/female, Asian/Caucasian, and a range of ages. Specifically, it includes 2000 Asian females, 2000 Asian males, 750 Caucasian females, and 750 Caucasian males, mostly aged from 15 to 60 and with neutral expressions. The attractiveness score of each facial image is labeled by 60 different raters on a five-point holistic scale, where 1 indicates the most unattractive and 5 the most attractive. Our attractiveness learner classifies the faces into the two categories of attractive and unattractive. In order to obtain a dataset with unanimous labels, the faces that have conflicting scores are ruled out. Moreover, we enlarge the dataset by further collecting portraits of famous beautiful actresses as attractive faces and portraits of selected ordinary people as unattractive faces. In total, the female facial images collected for FA-GANs are divided between the attractive and unattractive categories.



IV-B Implementation Details
All faces are detected and their landmark points extracted; the extracted geometric information is fed to the geometry enhancement branch. All faces are cropped and scaled to 224×224 pixels as the training data of our appearance enhancement branch. To obtain a more robust attractiveness ranking module, we augment the training data of the facial attractiveness network with color jitter, random affine transformations, random rotations, and random horizontal flips. The trade-off parameters $\lambda_{adv}$, $\lambda_{id}$, $\lambda_{pix}$, $\lambda_{tv}$, and $\lambda_{con}$ in Equation (12) are set empirically. The Adam [41] algorithm is used to optimize FA-GANs. FA-GANs are trained on a computer with a 4.20GHz 4-core Intel i7-7700K CPU and a GTX 1080Ti GPU, and training takes a number of hours to generate the desired results.
IV-C Branch Analysis of FA-GANs
To investigate the performance of the appearance branch and the geometry branch of our FA-GANs, we perform ablation studies and analyze the effect of each branch. These two branches are variations of GANs. In our implementation, the geometry branch is pre-trained before training FA-GANs. Hence, we explore the performance of the appearance branch by training it without the geometry consistency constraint. The adjustment results of the geometry and appearance branches are demonstrated in Figs. 4 and 5. As seen from Fig. 4, the geometry branch enhances facial attractiveness by generating faces with smaller cheeks and bigger eyes, suggesting that people nowadays prefer smaller faces and bigger eyes. The appearance branch, as shown in Fig. 5, adjusts faces mainly in texture rather than in geometry structure; it tends to generate faces with clean cheeks, red lips, and dark eyes, which reveals the popular cognition of beauty. We further analyze the attractiveness results and the running time in the following subsections.

IV-D Comparison with State-of-the-Arts
We verify the effectiveness of FA-GANs by comparing our generated results with those of existing methods in geometry and appearance adjustment. Specifically, we make comparisons with the geometry adjustment method proposed in [2] and the appearance adjustment method based on photorealistic image style transfer [40]. We implement [2] with its KNN-based beautification engine: the $K$ nearest beautiful faces are searched in the domain $Y_{geo}$, with $K$ set to 3. The geometry structure is then adjusted as described in subsection III-A3. Photorealistic image style transfer [40] needs a reference face $y_r$ in order to transfer its style to the input face $x$. We choose $y_r$ by selecting the nearest face in the domain $Y$, which can be obtained from the implementation of [2]. With the help of the facial landmarks we detected, the input face and the reference face are further parsed for [40] with a public face parser (https://github.com/zllrunning/face-parsing.PyTorch), and exemplar parsed faces are shown in Fig. 6. Furthermore, the attractiveness enhancement results involving both aspects are also analyzed: given a face, we first adjust its geometry structure with [2] or our geometry branch, and then adjust its appearance with our appearance branch or [40]. Moreover, we also compare FA-GANs to BeautyGAN [5], using the reference face selected for [40].
Figure 7: Qualitative comparison. Columns: original, FA-GANs, geo, [2], app, [40], geo→app, geo→[40], [2]→app, [2]→[40], Meitu, BeautyGAN.
Table IV: Running time of the geometry adjustment steps and totals for our geometry branch and [2].
Table V: Running time of appearance adjustment for our appearance branch and [40], including face parsing and total time.
Table VI: Running time of the two-step adjustments (geo→app, geo→[40], [2]→app, [2]→[40]) and FA-GANs.
Some mobile apps are specifically designed for beautifying facial images and support automatic beautification. We therefore also compare with Meitu, an app famous for beautifying faces. Each face is adjusted by the “One-click beauty” function in the face beautification module of Meitu with its default settings.
IV-D1 Qualitative evaluation
The comparison results are shown in Fig. 7 and are arranged into six groups: original, geometry adjustment, appearance adjustment, two-step adjustment, the state-of-the-arts, and FA-GANs.
In Fig. 7, Leyvand et al. [2] adjust the geometry structure effectively compared to the original faces, also tending to generate portraits with smaller cheeks and bigger eyes. Li et al. [40] adjust the appearance mainly depending on the reference image, and the generated faces look homogeneous over the face area. BeautyGAN [5] adjusts the facial image only in the appearance aspect and has no effect on the geometry; moreover, it depends on a reference image, transferring its makeup style to the non-makeup image. Compared to BeautyGAN, the “One-click beauty” of Meitu adjusts the face in both aspects: all the instances generated by Meitu shrink the cheeks slightly and whiten the skin while diluting the colors of the mouth and eyebrows. Our appearance branch improves the original faces considerably, also generating whiter skin and a makeup style on the mouth, eyes, and eyebrows. The geometry branch adjusts the geometry structure and tends to generate smaller cheeks and bigger eyes in a data-driven manner. FA-GANs adjust the face in both the appearance and geometry aspects: the enhanced faces have the whiter skin and makeup style of the appearance branch and the adjusted geometry structure of the geometry branch.
IV-D2 Quantitative evaluation
To assess the results of the different methods quantitatively, facial attractiveness is scored with the Face++ API (https://www.faceplusplus.com/beauty/), which provides accurate, efficient, and reliable face-based services. It gives two scores, indicating that males and females, respectively, would generally consider the face more attractive than a certain percentage of people. Groups of the original faces and their enhanced results are evaluated, and the averaged attractiveness scores for each method are shown in Fig. 8.
All these methods obtain higher attractiveness scores than the original faces. The statistical results suggest that FA-GANs achieve the best performance among all these methods, indicating that FA-GANs enhance attractiveness effectively. The two-step adjustment method of applying our geometry branch followed by the appearance branch achieves the second best, outperforming both BeautyGAN and Meitu. Comparing the geometry adjustment results of our geometry branch and [2], the geometry branch outperforms [2] by a small margin in both the female and male assessments. On the other hand, the appearance branch performs better than [40], whether the appearance adjusting methods are applied directly to the original faces or to the intermediate results obtained by the geometry adjusting methods. Comparing the results of appearance and geometry adjustment, appearance always achieves higher scores than geometry, indicating that people can enhance their attractiveness more by paying attention to makeup than to facelifts.
For identity preservation, we measure the cosine similarity between $\psi(x)$ and $\psi(x')$, where $x$ and $x'$ are the original face and the adjusted face, respectively, and $\psi$ is the deep face descriptor. The evaluated similarity scores are shown in Table III with respect to the methods listed in Fig. 8. As can be seen, all of the methods preserve the identity with high similarity: Meitu preserves the identity best, our geometry branch achieves the second best, and FA-GANs also perform well.
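This identity check is straightforward to compute; a sketch with the descriptor network `psi` as above:

```python
import torch.nn.functional as F

def identity_similarity(psi, x, x_adj):
    """Cosine similarity between deep face descriptors of the original
    and adjusted faces (higher = identity better preserved)."""
    a, b = psi(x).flatten(1), psi(x_adj).flatten(1)
    return F.cosine_similarity(a, b, dim=1).mean().item()
```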
IV-D3 Running time analysis
To build an effective and efficient method for enhancing facial attractiveness, we further compare the running time of these methods. All timing experiments are performed on the aforementioned computer. Geometry adjustment can be divided into three steps: extracting the distance vector $v_x$ from the original face $x$, enhancing $v_x$ to $G_g(v_x)$, and mapping $G_g(v_x)$ back to the enhanced face. The timing of these steps is shown in Table IV. As can be seen there, Leyvand et al. [2] perform faster than our geometry branch on our geometry adjustment dataset, and the geometry adjustment methods spend most of their time on mapping the enhanced distance vector back to a face. The appearance branch enhances attractiveness with only a forward pass through the generator, whereas Li et al. [40] adjust the appearance with an extra reference image and require image parsing to obtain a better result; the corresponding running times are shown in Table V, which suggests that the appearance branch completes an adjustment within seconds on average. Finally, we compare the running time of the two-step adjustment methods and FA-GANs in Table VI, which demonstrates that FA-GANs are the fastest. Furthermore, FA-GANs also adjust an input facial image faster than BeautyGAN.
V Conclusions
We have presented FA-GANs, a deep end-to-end framework for automating facial attractiveness enhancement in both the geometry and appearance aspects. FA-GANs learn the implicit attractiveness rules via the pre-trained facial attractiveness ranking module and avoid training on the paired faces for which a large dataset is extremely difficult to obtain. In this way, FA-GANs enhance facial attractiveness in a data-driven manner. FA-GANs contain two branches of geometry adjustment and appearance adjustment, and both of them can enhance the attractiveness independently. FA-GANs generate compelling perceptual results and enhance facial attractiveness both effectively and efficiently. These are verified by our comprehensive experiments and thorough analysis, which also demonstrate that FA-GANs achieve superior performance over existing geometry and appearance enhancement methods.
Although we have shown the superiority of FA-GANs, several aspects still need to be improved in the future. FA-GANs are limited to frontal faces, because few faces in other poses are collected in our dataset; such faces are much more difficult to collect than frontal ones. One possible solution is to estimate the facial pose and adjust the facial appearance and geometry in 3D face space, so that the attractiveness of adjusted faces in other poses can also be evaluated. Another is to enlarge the facial attractiveness dataset and train the enhancement networks on a large number of faces with varying poses.
References
- [1] D. I. Perrett, K. J. Lee, I. Penton-Voak, D. Rowland, S. Yoshikawa, D. M. Burt, S. Henzi, D. L. Castles, and S. Akamatsu, “Effects of sexual dimorphism on facial attractiveness,” Nature, vol. 394, no. 6696, p. 884, 1998.
- [2] T. Leyvand, D. Cohen-Or, G. Dror, and D. Lischinski, “Data-driven enhancement of facial attractiveness,” in SIGGRAPH, 2008, pp. 38:1–38:9.
- [3] C. Li, K. Zhou, and S. Lin, “Simulating makeup through physics-based manipulation of intrinsic image layers,” in CVPR, June 2015.
- [4] Q. Liao, X. Jin, and W. Zeng, “Enhancing the symmetry and proportion of 3d face geometry,” TVCG, vol. 18, no. 10, pp. 1704–1716, Oct 2012.
- [5] T. Li, R. Qian, C. Dong, S. Liu, Q. Yan, W. Zhu, and L. Lin, “Beautygan: Instance-level facial makeup transfer with deep generative adversarial network,” in Proceedings of the 26th ACM International Conference on Multimedia, ser. MM ’18. New York, NY, USA: ACM, 2018, pp. 645–653.
- [6] O. M. Parkhi, A. Vedaldi, and A. Zisserman, “Deep face recognition,” in BMVC, X. Xie, M. W. Jones, and G. K. L. Tam, Eds., September 2015, pp. 41.1–41.12.
- [7] K. Arakawa and K. Nomoto, “A system for beautifying face images using interactive evolutionary computing,” in ISPACS, Dec 2005, pp. 9–12.
- [8] L. Liu, H. Xu, J. Xing, S. Liu, X. Zhou, and S. Yan, “Wow! You are so beautiful today!,” in Proceedings of the 21st ACM International Conference on Multimedia, ser. MM ’13. New York, NY, USA: ACM, 2013, pp. 3–12.
- [9] S. Liu, X. Ou, R. Qian, W. Wang, and X. Cao, “Makeup like a superstar: Deep localized makeup transfer network,” in IJCAI, 2016, pp. 2568–2575.
- [10] H. Chang, J. Lu, F. Yu, and A. Finkelstein, “Pairedcyclegan: Asymmetric style transfer for applying and removing makeup,” in CVPR, June 2018.
- [11] A. Laurentini and A. Bottino, “Computer analysis of face beauty: A survey,” CVIU, vol. 125, pp. 184 – 199, 2014.
- [12] J. Li, C. Xiong, L. Liu, X. Shu, and S. Yan, “Deep face beautification,” in MM, 2015, pp. 793–794.
- [13] S. Qian, K.-Y. Lin, W. Wu, Y. Liu, Q. Wang, F. Shen, C. Qian, and R. He, “Make a face: Towards arbitrary high fidelity face manipulation,” in The IEEE International Conference on Computer Vision (ICCV), October 2019.
- [14] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in NIPS, 2014, pp. 2672–2680.
- [15] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi, “Photo-realistic single image super-resolution using a generative adversarial network,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [16] J. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 2242–2251.
- [17] H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, and D. Metaxas, “Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks,” in 2017 IEEE International Conference on Computer Vision (ICCV), Oct 2017, pp. 5908–5916.
- [18] Z. Wang, X. Tang, W. Luo, and S. Gao, “Face aging with identity-preserved conditional generative adversarial networks,” in CVPR, June 2018.
- [19] A. Bulat and G. Tzimiropoulos, “Super-fan: Integrated facial landmark localization and super-resolution of real-world low resolution faces in arbitrary poses with gans,” in CVPR, June 2018.
- [20] J. Kossaifi, L. Tran, Y. Panagakis, and M. Pantic, “Gagan: Geometry-aware generative adversarial networks,” in CVPR, June 2018.
- [21] Y. Li, S. Liu, J. Yang, and M.-H. Yang, “Generative face completion,” in CVPR, July 2017.
- [22] R. Yi, Y.-J. Liu, Y.-K. Lai, and P. L. Rosin, “Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans,” in CVPR, June 2019.
- [23] T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” in CVPR, June 2019.
- [24] J. Bao, D. Chen, F. Wen, H. Li, and G. Hua, “Towards open-set identity preserving face synthesis,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
- [25] X. Yin, X. Yu, K. Sohn, X. Liu, and M. Chandraker, “Towards large-pose face frontalization in the wild,” in The IEEE International Conference on Computer Vision (ICCV), Oct 2017.
- [26] H. Yang, D. Huang, Y. Wang, and A. K. Jain, “Learning face age progression: A pyramid architecture of gans,” in CVPR, June 2018.
- [27] L. Song, Z. Lu, R. He, Z. Sun, and T. Tan, “Geometry guided adversarial facial expression synthesis,” in MM, 2018, pp. 627–635.
- [28] Y. Shen, P. Luo, J. Yan, X. Wang, and X. Tang, “Faceid-gan: Learning a symmetry three-player gan for identity-preserving face synthesis,” in CVPR, June 2018.
- [29] K. Cao, J. Liao, and L. Yuan, “Carigans: Unpaired photo-to-caricature translation,” TOG, vol. 37, no. 6, pp. 244:1–244:14, Dec. 2018.
- [30] V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in ICML, 2010, pp. 807–814.
- [31] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in ICCV, Oct 2017.
- [32] J. J. Moré, “The levenberg-marquardt algorithm: implementation and theory,” in Numerical analysis, 1978, pp. 105–116.
- [33] P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.
- [34] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab, J. Hornegger, W. M. Wells, and A. F. Frangi, Eds. Cham: Springer International Publishing, 2015, pp. 234–241.
- [35] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, 2014.
- [36] H. A. Aly and E. Dubois, “Image up-sampling using total-variation regularization with a new observation model,” TIP, vol. 14, no. 10, pp. 1647–1659, Oct 2005.
- [37] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in ICML, ser. ICML’15, 2015, pp. 448–456.
- [38] “Meitu,” https://corp.meitu.com/en/.
- [39] L. Liang, L. Lin, L. Jin, D. Xie, and M. Li, “Scut-fbp5500: A diverse benchmark dataset for multi-paradigm facial beauty prediction,” 2018.
- [40] Y. Li, M.-Y. Liu, X. Li, M.-H. Yang, and J. Kautz, “A closed-form solution to photorealistic image stylization,” in ECCV, September 2018.
- [41] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.