This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Effect of Input Noise Dimension in GANs

Padala Manisha [email protected] International Institute of Information Technology, Hyderabad Debojit Das [email protected] International Institute of Information Technology, Hyderabad Sujit Gujar [email protected] International Institute of Information Technology, Hyderabad
Abstract

Generative Adversarial Networks (GANs) are by far the most successful generative models. Learning the transformation which maps a low dimensional input noise to the data distribution forms the foundation for GANs. Although they have been applied in various domains, they are prone to certain challenges like mode collapse and unstable training. To overcome the challenges, researchers have proposed novel loss functions, architectures, and optimization methods. In our work here, unlike the previous approaches, we focus on the input noise and its role in the generation.

We aim to quantitatively and qualitatively study the effect of the dimension of the input noise on the performance of GANs. For quantitative measures, typically Fréchet Inception Distance (FID) and Inception Score (IS) are used as performance measure on image data-sets. We compare the FID and IS values for DCGAN and WGAN-GP. We use three different image data-sets – each consisting of different levels of complexity. Through our experiments, we show that the right dimension of input noise for optimal results depends on the data-set and architecture used. We also observe that the state of the art performance measures does not provide enough useful insights. Hence we conclude that we need further theoretical analysis for understanding the relationship between the low dimensional distribution and the generated images. We also require better performance measures.

Refer to caption
Figure 1: Input Noise to Generated Images [18]

1 Introduction

Generative Adversarial Networks (GANs) are generative models, which learn data distribution and generate samples similar to the real data. These models are useful in learning representations of data without supervision. Deep neural networks are extensively being used for discriminative, retrieval, and clustering tasks. These networks perform well only with a large amount of labeled data. To avoid manually labeling the data, we can transfer the learned representations from the generative models to improve performance for the above tasks. Besides, we can use generative models for simple data augmentation. GANs have enormous applications in a variety of fields. In vision, GANs are used for super-resolution of images [18, 24], transferring domain knowledge from images of one domain to another [41, 19, 38], object detection [25], image editing [36], medical images [10]. In reinforcement learning, GANs are used for generating an artificial environment. One can perform semi-supervised learning using GANs when labels are missing. Other applications include generation of music [9], paintings [26] and text [39, 35]. These are but a few of the applications that developed recently.

In a typical GAN set-up, we provide a low dimensional random noise vector as an input, which it maps to a specific data sample having a much higher dimension. Hence given a trained GAN model and simple noise distribution, we can sample from the data distribution without knowing the actual distribution itself. GANs can be considered as an experimental success. The novelty and simplicity in its set-up contribute to its popularity. GANs lack of a satisfying theoretical explanation has piqued the interest of the research community. We can consider that GAN learns a lower-dimensional representation of the high dimensional data input and its mapping from lower-dimensional to real-like data sample; whenever we provide a random noise as input. According to the manifold hypothesis, the real-world high dimensional data lie on low dimensional manifolds embedded in high dimensional space [23]. Hence, we believe that the data can have an efficient lower-dimensional representation. Given access to high dimensional distribution of the data, we could efficiently perform dimensionality reduction using existing methods such as PCA [11], JL transform [17], or deep auto-encoders. However, the main challenge is that we do not know the higher dimensional distribution of the data, and hence mapping becomes difficult. The GAN set-up provides us with an innovative way of mapping the low dimensional noise to the data.

In the vanilla GAN [12], the model primarily has two networks, the generator and the discriminator. The generator, as its name suggests, is responsible for generating samples that look like real data from a lower-dimensional input noise. The discriminator is used to classify the images generated by the generator and the actual images. While training, the discriminator is trained to improve its ability to discriminate real vs. fake images while the generator is trained to fool the discriminator. The set-up is similar to a two-player zero-sum game, and at equilibrium, the generator samples data points from the data distribution. That is, it learns the data distribution.

Despite the success, GANs face major issues such as mode collapse . As a result of mode collapse, the generator maps multiple noise variables to just one data point. Although the images generated are very sharp and realistic, they do not have large diversity, which we expect in real images. The other issue is that the training is not smooth and may not converge sometimes. There has been quite a lot of research to overcome mode collapse and stabilize the training. Researchers have proposed new losses that guarantee better convergence, new architectures, and new optimizers for overcoming the above challenges. WGAN [2], is the most popular modification over the vanilla GAN which tries to address the challenges faced by changing the loss function. Another major challenge for generative models, in general, is qualitatively estimating their performance. We need to evaluate the quality as well as the diversity of the samples generated. For our analysis we use two measures that are widely used namely, Fréchet Inception Distance (FID) [14] and Inception Score (IS) [34]. The survey [5] provides a detailed analysis of the existing measures.

There have also been papers that try to develop a theoretical framework to analyze the models better. In [28], there are further details about the issues and novel approaches. To best of our knowledge, there is no prior work that discusses the effect of the dimension of the input noise on the GAN performance. Considering that the model aims at mapping the input noise to the data, it is crucial to find the effect of changing the input noise on the performance of the model.

Our contribution: In this work, we focus on studying the effect of noise on the performance of the model. We vary the dimension of the input noise and its distribution and study its effect on the samples generated for three different data-sets i) Gaussian Data ii) MNIST digits data [22] iii) CelebA face data [27] (a) 32x32 and (b) 64x64. We use two different kinds of GANs i) DCGAN [33] ii) WGAN-GP [13]. We provide the following results,

  1. 1.

    Quantitative estimation by comparing the FID and IS for the generated samples.

  2. 2.

    Qualitative estimation: by comparing the samples of images generated after the training converges.

We believe that such an analysis would lead not only to the best set of parameters but also to provide useful insights into the working of the model.

Organization: We discuss the related work in Section 2, which is followed by the preliminaries in Section 3, where we discuss the GAN models and performance measures used in detail. Next, we describe the experimental set-up and compare the result in Section 4. Finally, we discuss the insights derived from the experiments in Section 5 before concluding in Section 6.

2 Related Work

Restricted Boltzmann Machines (RBMs) [16] , Deep Belief Networks (DBNs) [15], Variational Autoencoders (VAEs) [20] are generative models popular before GANs. They have an elegant theory, but they fail to produce complex images when trained on other data-sets such as CIFAR, SVHN, etc. GANs generate images that are sharper and realistic. DCGAN [33] introduced changes in the architecture and hyperparameters that stabilized the training of vanilla GANs on images to a large extent. Vanilla GANs suffer from the issues of vanishing gradients, as discussed in [1]. In [2], the authors propose Wasserstein GAN (WGAN) to overcome the issue. In [13], the authors propose WGAN with gradient penalty (WGAN-GP) for further stability.

Researchers have followed various approaches to resolve the challenges in GANS. Some approaches propose new loss functions[29, 32, 4], reforms in architecture [40, 31], and changes in the optimizer [30, 7]. There are other papers [14, 21, 3] which provide rigorous theoretical analysis. For further details, one can refer to the survey [28], which briefly discusses the reforms introduced by papers to solve the issues discussed.

In this work, our focus is primarily on the latent input noise, and we study its effect on the generated samples. Few papers modify the input noise and introduce a set of latent variables to control the generated images. The authors in [6] made the latent variable represent some visual concepts in an unsupervised manner, and with supervision in [8, 37]. To the best of our knowledge, no paper rigorously explores this aspect. In the next section, we describe the models and performance measures.

3 Preliminaries

In this section, we introduce the basic notations required. We also describe the two models and performance measures that we use. The main components of a GAN set-up include,

  1. a.

    architecture which includes the discriminator DD parameterized by θ\theta and generator GG parameterized by ϕ\phi,

  2. b.

    loss function denoted by V(Dθ,Gϕ)V(D_{\theta},G_{\phi}),

  3. c.

    optimizer.

We assume that the data distribution is pdp_{d}, and the noise is sampled from another distribution pz(z)p_{z}(z). The generated samples follow the distribution pgp_{g}, which is referred to as the model distribution. In a typical architecture model, we sample zpzz\sim p_{z} and feed it to the GG. zz is usually low dimensional and follows a simple distribution. When dealing with images, GG is a convolutional multi-layered network that takes the zz and generates a vector x^\hat{x}, which has the same dimension as the data xx. x^pg\hat{x}\sim p_{g} is the distribution learnt by GG. The weights/parameters of GG are denoted by ϕ\phi. The other network is the DD, parameterized by θ\theta. It takes either x^\hat{x} or xx as input and outputs a single value. The value is a score for the input; a higher score indicates that the image is likely to be sampled from pdp_{d} and not pgp_{g}. The main challenge is to construct a suitable loss and optimize over the loss for achieving our objective of generating realistic images. In this paper, we consider two different kinds of loss i) DCGAN ii) WGAN-GP as further elaborated below,

3.1 DCGAN

DCGAN is a modification of vanilla GAN as introduced in [12]; the authors set the problem as a two-player zero-sum game. DD minimizes the classification loss such that it can classify the samples from pdp_{d} and pgp_{g} differently. While GG tries to maximize the same loss or generate samples to fool the DD. The objective is a simple binary cross-entropy loss used for 2 class classification given below,

minϕmax𝜃VG(Dθ,Gϕ)=𝔼xpd(x)[logDθ(x)]+𝔼zpz(z)[log(1Dθ(Gϕ(z)))]\begin{split}\underset{\phi}{min}\ \underset{\theta}{max}\ V_{G}(D_{\theta},G_{\phi})=\ &\mathbb{E}_{x\sim p_{d}(x)}[logD_{\theta}(x)]+\\ &\mathbb{E}_{z\sim p_{z}(z)}[log(1-D_{\theta}(G_{\phi}(z)))]\end{split}{} (1)

Samples from pdp_{d} are given label 1 and x^pg\hat{x}\sim p_{g} are given 0. The above objective is equivalent to minimizing the Jenson Shannon Divergence (JSD) between pdp_{d} and pgp_{g}.

Early in training, when the samples generated by GG are very noisy, the discriminator can classify with high confidence, causing the gradients w.r.t. ϕ\phi to be very small. Hence, there is no learning signal for GG to improve; hence the authors propose to use the following loss for GG,

maxϕlog(Dθ(Gϕ(z)))\underset{{\phi}}{max}\ \log(D_{\theta}(G_{\phi}(z))) (2)

Given the loss, it is challenging to balance the optimization of both GG and DD. The optimal solution lies at the saddle point denoted by (ϕ,θ)(\phi^{*},\theta^{*}). Using simultaneous gradient descent, the authors prove the convergence of the loss under specific assumptions. In this method, ϕ\phi is fixed and one step gradient descent is performed over θ\theta for Equation 1. Then θ\theta is fixed and one step gradient descent is performed over ϕ\phi for Equation 2. The authors make the assumptions of the infinite capacity of GG and DD. They also assume that DD is trained to convergence at every iteration.

3.2 WGAN-GP

In [1], the authors introduce the issue of vanishing gradient. If both pgp_{g} and pdp_{d} lie on different manifolds, the discriminator is easily able to achieve zero loss and the gradients w.r.t. GG vanish, leading to the vanishing gradient problem. The authors also prove that using Equation 2 leads to mode collapse and unstable updates. Hence, in [2], the authors propose to use Wasserstein distance between the distributions. The loss is given as follows,

maxθminϕVW(Dθ,Gϕ)=𝔼xpd(x)[Dθ(x)]𝔼zpz(z)[Dθ(Gϕ(z))]\begin{split}\max_{\theta}\min_{\phi}\ V_{W}(D_{\theta},G_{\phi})=&\mathbb{E}_{x\sim p_{d}(x)}[D_{\theta}(x)]-\\ &\mathbb{E}_{z\sim p_{z}(z)}[D_{\theta}(G_{\phi}(z))]\end{split}{} (3)

The DD has to be 11-Lipschitz w.r.t. θ\theta. In order to enforce that, the authors clamp θ\theta to be within a specified range. Using this objective, the training is more stable and less sensitive to the optimization.

In [13], the authors show that clamping weights like in WGAN leads to the problem of exploding and vanishing gradients; hence they introduce a more elegant way for enforcing Lipschitz constraint on DD. They introduce a gradient penalty term (GP) in Equation 3 to form the WGAN-GP loss as follows,

VW(Dθ,Gϕ)=𝔼xpd(x)[Dθ(x)]\displaystyle V_{W}(D_{\theta},G_{\phi})=\mathbb{E}_{x\sim p_{d}(x)}[D_{\theta}(x)] 𝔼zpz(z)[Dθ(Gϕ(z))]\displaystyle-\mathbb{E}_{z\sim p_{z}(z)}[D_{\theta}(G_{\phi}(z))] (4)
+λ𝔼x~px~[(x~D(x~)21)2]\displaystyle+\lambda\mathbb{E}_{\tilde{x}\sim p_{\tilde{x}}}[(\parallel\nabla_{\tilde{x}}D(\tilde{x})\parallel_{2}-1)^{2}]

We obtain x~px~\tilde{x}\sim p_{\tilde{x}} by sampling uniformly along straight lines between pairs of points sampled from the data distribution pdp_{d} and the generator distribution pg.p_{g}. The optimization method used is similar to GAN. Given the two different models used, we next state the measures that we use to evaluate quantitatively the performance of the samples generated.

3.3 Measures to Evaluate GANs

In general, it is not possible to compute how close pgp_{g} is to pdp_{d} quantitatively, given that GANs do not provide the distribution explicitly. For the synthetic Gaussian data, we compute the empirical distance between pgp_{g} and pdp_{d} using the three measures, i) JSD ii) Fréchet Distance (FD). The definitions of these measures are as follows,

Definition 1 (JSD)

The Jenson Shannon Divergence is a symmetric distance metric between the two distribution pd(x),pg(x)p_{d}(x),p_{g}(x) given by,

JSD(pdpg)=12KL(pdpd+pg2)+12KL(pgpd+pg2)JSD(p_{d}\parallel p_{g})=\frac{1}{2}KL\left(p_{d}\parallel\frac{p_{d}+p_{g}}{2}\right)+\frac{1}{2}KL\left(p_{g}\parallel\frac{p_{d}+p_{g}}{2}\right) (5)

where KLKL is the Kl-divergence.

Definition 2 (FD)

Given pgp_{g} and pdp_{d} both multivariate continuous Gaussian distributions. The mean and variance of pgp_{g} is μg,Σg\mu_{g},\Sigma_{g} and pdp_{d} is μd,Σd\mu_{d},\Sigma_{d} respectively. The FD is then,

μgμd22+Tr(Σg+Σd2(ΣgΣd)12)\parallel\mu_{g}-\mu_{d}\parallel^{2}_{2}+Tr(\Sigma_{g}+\Sigma_{d}-2(\Sigma_{g}\Sigma_{d})^{\frac{1}{2}}) (6)

In our experiments on MNIST and CelebA datasets, it is not possible to use the above measures; hence, we use the following standard performance measures,

3.3.1 Inception Score (IS)

This is the most widely used score and was proposed by [34]. The pre-trained neural network named Inception Net is used. The score function defined measures the average distance between the label distribution p(y|x)p(y|x) and marginal distribution p(y)p(y). Here the xx is the image generated by GG, and yy is the label given by the Inception Net. The distribution p(y|x)p(y|x) needs to have less entropy, which indicates that the network can classify xx with high confidence hence more likely to be a realistic image. At the same time, p(y)p(y) needs to have high entropy to indicate diversity in the samples generated. Hence, the higher the inception score, the better is the generator.

3.3.2 Fréchect Inception Distance (FID)

Proposed by [14] also uses a pre-trained Inception Net. The activations of the intermediate pooling layer serve as our feature embeddings. It is assumed that these embeddings follow a multivariate continuous Gaussian distribution. We pass multiple samples of xpdx\sim p_{d} and calculate the empirical mean and variance of their embeddings. Similarly, we sample pgp_{g} and calculate the empirical mean and variance. The FD, given by Equation 6, is applied over the mean and variance of the two Gaussian embeddings. If the pgp_{g} is close to pdp_{d}, the FD will be low. Hence lower the score, the better is the generator.

Apart from quantitative evaluation, we also provide the samples of images generated for each type of input noise, for visual comparison.

4 Experiments with Input Noise for GANs

In this section, we empirically study the effect of input noise on GANs.

4.1 Experimental Set-Up

We used two different GANs in our experiments: (a) DCGAN and (b) WGAN-GP. We considered the following three data-sets for training and evaluation:

  • Synthetic data-set: data generated with Gaussian distribution and/or mixture of Gaussian distribution

  • The MNIST data-set

  • CelebA (Celeb Faces Attribute A)

We studied the effect of dimension of input noise vector zz, on FID and IS for all three data-sets and the both GANs. For quantitative comparison, we plot measures FID and IS against the dimension of the noise. For qualitative comparison, we looked at the original distributions (or images) and generated distributions (or images).

Below we describe the data-sets and the corresponding architectures used.

4.2 Data-Sets and Architectures

For a specific data-set, we have used a specific architecture. We keep the same architecture for both GAN and WGAN-GP losses as described by Equation 1 and Equation 3 respectively.

Synthetic Data-set

We show results on 1-dimensional synthetic Gaussian data. This enables us to compare the plots for the distribution of the real data samples and generated samples. We study the effect of varying the variance of the Gaussian. We also vary the modes in the Gaussian (i.e., have two peaks) and visualize the problem of mode collapse).

Architecture: (vanilla GAN architecture) We use a simple feed-forward multi-layered perceptron having two hidden layers each. Both the generator and discriminator have ReLU activation.

MNIST Data-set

[22] the data-set consists of 28×2828\times 28 dimensional black and white handwritten digits.

Architecture: The discriminator has two convolution layers with 64 and 128 filters and stride 2 and leaky ReLU activation. There is a final dense layer to return a single value of probability. Generator has an architecture reverse to that of discriminator’s but includes batch normalization layers after every transpose convolution layer. The activation used is ReLU.

CelebA Data-set

[27] this data-set consists of more than 200K celebrity images, each of 128×128128\times 128 resolution. We re-scale the images to 32×3232\times 32 and 64×6464\times 64, and train two different models for each of the resolutions.

Architecture: Here, the architecture is similar to that of MNIST, although the networks for 32×3232\times 32 have 3 convolutional layers with 64, 128 and 256 filters and batch normalization layers even in the discriminator. For generating images of 64×6464\times 64, we include one extra convolutional layer which has 512 filters.

4.3 Results for Synthetic Data-set

In Figure 2 (a) and (b), we present the FD values (Equation 6) between the real and generated Gaussian data, i) the data has single mode and ii) bimodal data. It is observed that the variance in the data does not effect the trend in performance. The FD values stay low till the dimension of input noise is 10. Increasing the dimension only worsens the performance. And increasing it further from 20, for a fixed generator and discriminator architecture blows up the distance. The problem of mode collapse is also evident as we see the FD values for bimodal is greater than the values in the unimodal case. From Figure 3, we can visually observe that the real data and generated data distributions are nearby with input noise dimension is 10. The distributions are far apart when the dimension is increased to around 20.

In Figure 2 (c) and (d), we compare various performance measures, FD and JSD for WGAN-GP. We find that, there is no particular trend although there is small difference in the measures as the dimension increases and for all the variance and modes. We also observe that, the values are smaller than GAN. Hence, we conclude that WGAN performs better than the normal GAN on this data-set. This is also visible from the distribution plots in Figure 4. The problem of mode collapse is still not completely overcome even in WGAN-GP.

Refer to caption
(a) FD for Unimodal Gaussian Distribution in vanilla GAN
Refer to caption
(b) FD for Bimodal Gaussian Distribution in vanilla GAN
Refer to caption
(c) FD Plot for WGAN
Refer to caption
(d) JSD Plot for WGAN
Figure 2: Performance measure Plots
Refer to caption
(a) Unimodal Gaussian Distribution with Variance = 4 and dimension of noise = 10
Refer to caption
(b) Unimodal Gaussian Distribution with Variance = 1 and dimension of noise = 18
Refer to caption
(c) Bimodal Gaussian Distribution with dimension of noise = 10
Refer to caption
(d) Bimodal Gaussian Distribution with dimension of noise = 19
Figure 3: Distributions generated using vanilla GAN
Refer to caption
(a) Unimodal Gaussian Distribution with Variance = 2 and dimension of noise = 10
Refer to caption
(b) Unimodal Gaussian Distribution with Variance = 8 and dimension of noise = 20
Refer to caption
(c) Unimodal Gaussian Distribution with Variance = 10 and dimension of noise = 6
Refer to caption
(d) Bimodal Gaussian Distribution with Variance = 4 and dimension of noise = 10
Figure 4: Distributions generated using WGAN-GP
Refer to caption
(a) FID Plot for DCGAN
Refer to caption
(b) IS Plot for DCGAN
Refer to caption
(c) FID Plot for WGAN-GP
Refer to caption
(d) IS Plot for WGAN-GP
Figure 5: Performance Measure Plots
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 1000
Figure 6: Images Generated by DCGAN
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 1000
Figure 7: Images Generated by WGAN-GP

4.4 Results for MNIST

In Figure 5 (a) and (c), we compare the FID values for different dimensions of input noise for DCGAN and WGAN-GP. We fix the architecture of the generator and the discriminator and we find that the FID scores are very high for noise dimension 2 but do not change much for higher dimensions till 1000. At the same time, we find the FID values for WGAN-GP are much better than DCGAN. In Figure 5 (b) and (d), we plot the IS values which are evaluated for batches of samples and the mean and the variance of the IS values across the batches is plotted for both DCGAN and WGAN-GP. We observe a similar trend as the FID.

We visually compare the results in Figures 6, 7. We find that results are bad when noise dimension is 2 compared to the other dimensions. WGAN-GP performs worse compared to DCGAN at lower-dimensional input noise. Hence we conclude that having dimension of noise as 10 is sufficient for good performance.

4.5 Results for CelebA 32

In Figure 8 (a) and (c), we compare the FID and IS values for generating CelebA images for different dimensions of input noise for DCGAN and WGAN-GP. We fix the architecture of the generator and the discriminator and we find that the FID scores are very high noise dimension 2 and then reduce drastically. Further increasing the dimension does not effect the FID values much. Both WGAN and DCGAN perform almost equally. Although WGAN is slightly better. In Figure 8 (b) and (d), we plot the IS values which are evaluated for batches of samples and the mean and the variance of the IS values across the batches is plotted for both DCGAN and WGAN-GP. Figures 9, 10. We find that results when noise dimension is 2 is bad compared to the other dimensions. For dimension 2 and 10 the images generated using WGAN lack clarity while that of DCGAN lack variety. For dimension 100 and 900, visually their performance is similar.

Refer to caption
(a) FID Plot
Refer to caption
(b) IS Plot
Refer to caption
(c) FID Plot
Refer to caption
(d) IS Plot
Figure 8: Performance Measure Plots for 32x32 CelebA
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 900
Figure 9: Images Generated by DCGAN for 32x32 CelebA
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 900
Figure 10: Images Generated by WGAN-GP for 32x32 CelebA

4.6 Results for CelebA 64

In Figure 11 (a) and (c), we compare the FID and IS scores for generating CelebA images for different dimensions of input noise for DCGAN and WGAN-GP. We fix the architecture of the generator and the discriminator and we find that the FID scores are very high for very low noise dimension but decreases considerably after a threshold and then do not change much for higher dimensions. We find the FID values for WGAN-GP are way better as also indicated by the figure. In Figure 11 (b) and (d), we plot the IS values which are evaluated for batches of samples and the mean and the variance of the IS values across the batches is plotted for both DCGAN and WGAN-GP. The IS values do not seem to be indicative of the results as much as the FID values. WGAN-GP performs better at celebaA 64 than celebA 32 while the opposite is true for DCGAN.

We visually compare the results in Figures 12, 13. We find that results when noise dimension is 2 is bad compared to the other dimensions. WGAN-GP performs better compared to DCGAN. The clarity as well as variety of images generated by WGAN-GP is better compared to DCGAN unlike the case for images of size 32.

Refer to caption
(a) FID Plot
Refer to caption
(b) IS Plot
Refer to caption
(c) FID Plot
Refer to caption
(d) IS Plot
Figure 11: Performance Measure Plots for 64x64 CelebA
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 500

fo

Figure 12: Images Generated by DCGAN for 64x64 CelebA
Refer to caption
(a) Dimension of noise = 2
Refer to caption
(b) Dimension of noise = 10
Refer to caption
(c) Dimension of noise = 100
Refer to caption
(d) Dimension of noise = 500
Figure 13: Images Generated by WGAN-GP for 64x64 CelebA

5 Discussion and Future Work

In the various recent analysis on GANs, there has been hardly any focus on the input noise. The transformation of input noise to the generated data is a crucial part of the generation process. We think it calls for more attention and further analysis. From the results above, we can see that, there is a significant effect on the results when the input noise dimension is changed. It is also observed that the optimal noise dimension depends on the data-set and loss function used.

Given that we intend to map the high dimensional data to a low dimensional distribution, we would like the dimension of noise to be as small as possible. Although very small values do not give good results, hence we find an optimal dimension after which the model does not perform better in case of CelebA and MNIST or performs worse in case of Gaussian data.

FID value is not so indicative for analyzing the effect of change in input noise for the CelebA and MNIST data-set. We believe a more theoretical study and further analysis will help in training hand faster and better results than starting with a random size of z. We may also need to come up with performance measures which are more indicative of the quality of generated images.

6 Conclusion

We studied the effect of changing input noise for GANs quantitatively and qualitatively. We conclude that the input noise dimension has a significant effect on the generation of images. To obtain useful and quality data generation, the input noise dimension needs to be set based on data-set and which GAN architecture used. We leave the theoretical analysis of the relation between the low dimensional distribution and the high dimensional data for the future work.

References

  • [1] Arjovsky, M., and Bottou, L. Towards Principled Methods for Training Generative Adversarial Networks. ArXiv e-prints (Jan. 2017).
  • [2] Arjovsky, M., Chintala, S., and Bottou, L. Wasserstein GAN. ArXiv e-prints (Jan. 2017).
  • [3] Arora, S., Ge, R., Liang, Y., Ma, T., and Zhang, Y. Generalization and Equilibrium in Generative Adversarial Nets (GANs). ArXiv e-prints (Mar. 2017).
  • [4] Bińkowski, M., Sutherland, D. J., Arbel, M., and Gretton, A. Demystifying MMD GANs. In International Conference on Learning Representations (2018).
  • [5] Borji, A. Pros and cons of gan evaluation measures. Computer Vision and Image Understanding 179 (2018), 41–65.
  • [6] Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., and Abbeel, P. Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, Eds. Curran Associates, Inc., 2016, pp. 2172–2180.
  • [7] Daskalakis, C., Ilyas, A., Syrgkanis, V., and Zeng, H. Training GANs with optimism. In International Conference on Learning Representations (2018).
  • [8] Donahue, J., and Simonyan, K. Large scale adversarial representation learning. arXiv preprint arXiv:1907.02544 (2019).
  • [9] Dong, H.-W., Hsiao, W.-Y., Yang, L.-C., and Yang, Y.-H. Musegan: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. In Thirty-Second AAAI Conference on Artificial Intelligence (2018).
  • [10] Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., and Greenspan, H. Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification. Neurocomputing 321 (2018), 321–331.
  • [11] F.R.S., K. P. Liii. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2, 11 (1901), 559–572.
  • [12] Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. Generative Adversarial Networks. ArXiv e-prints (June 2014).
  • [13] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., and Courville, A. C. Improved training of wasserstein gans. CoRR abs/1704.00028 (2017).
  • [14] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems 30, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, Eds. Curran Associates, Inc., 2017, pp. 6626–6637.
  • [15] Hinton, G. Deep Belief Nets. Springer US, Boston, MA, 2010, pp. 267–269.
  • [16] Hinton, G. E., Osindero, S., and Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural computation 18, 7 (2006), 1527–1554.
  • [17] Johnson, W. B., and Lindenstrauss, J. Extensions of lipschitz mappings into a hilbert space. Contemporary mathematics 26, 189-206 (1984), 1.
  • [18] Karras, T., Aila, T., Laine, S., and Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196 (2017).
  • [19] Kim, T., Cha, M., Kim, H., Lee, J. K., and Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning (International Convention Centre, Sydney, Australia, 06–11 Aug 2017), D. Precup and Y. W. Teh, Eds., vol. 70 of Proceedings of Machine Learning Research, PMLR, pp. 1857–1865.
  • [20] Kingma, D. P., and Welling, M. Stochastic gradient vb and the variational auto-encoder. In Second International Conference on Learning Representations, ICLR (2014).
  • [21] Kodali, N., Abernethy, J. D., Hays, J., and Kira, Z. How to train your DRAGAN. CoRR abs/1705.07215 (2017).
  • [22] Lecun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 11 (Nov 1998), 2278–2324.
  • [23] LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., and Huang, F. A tutorial on energy-based learning. Predicting structured data 1, 0 (2006).
  • [24] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition (2017), pp. 4681–4690.
  • [25] Li, J., Liang, X., Wei, Y., Xu, T., Feng, J., and Yan, S. Perceptual generative adversarial networks for small object detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (July 2017), pp. 1951–1959.
  • [26] Liu, Y., Qin, Z., Luo, Z., and Wang, H. Auto-painter: Cartoon image generation from sketch by using conditional generative adversarial networks. arXiv preprint arXiv:1705.01908 (2017).
  • [27] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV) (December 2015).
  • [28] Manisha, P., and Gujar, S. Generative adversarial networks (gans): The progress so far in image generation. arXiv preprint arXiv:1804.00140 (2018).
  • [29] Mao, X., Li, Q., Xie, H., Lau, R. Y. K., and Wang, Z. Multi-class generative adversarial networks with the L2 loss function. CoRR abs/1611.04076 (2016).
  • [30] Mescheder, L. M., Nowozin, S., and Geiger, A. The numerics of gans. CoRR abs/1705.10461 (2017).
  • [31] Nguyen, T. D., Le, T., Vu, H., and Phung, D. Dual discriminator generative adversarial nets. In Proceedings of the 31st International Conference on Neural Information Processing Systems (USA, 2017), NIPS’17, Curran Associates Inc., pp. 2667–2677.
  • [32] Qi, G. Loss-sensitive generative adversarial networks on lipschitz densities. CoRR abs/1701.06264 (2017).
  • [33] Radford, A., Metz, L., and Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
  • [34] Salimans, T., Goodfellow, I. J., Zaremba, W., Cheung, V., Radford, A., and Chen, X. Improved techniques for training gans. CoRR abs/1606.03498 (2016).
  • [35] Subramanian, S., Rajeswar, S., Dutil, F., Pal, C. J., and Courville, A. C. Adversarial generation of natural language. In Rep4NLP@ACL (2017).
  • [36] Wu, H., Zheng, S., Zhang, J., and Huang, K. Gp-gan: Towards realistic high-resolution image blending.
  • [37] Wu, J., Zhang, C., Xue, T., Freeman, B., and Tenenbaum, J. Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling. In Advances in neural information processing systems (2016), pp. 82–90.
  • [38] Yoo, D., Kim, N., Park, S., Paek, A. S., and Kweon, I. S. Pixel-level domain transfer. In Computer Vision – ECCV 2016 (Cham, 2016), B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds., Springer International Publishing, pp. 517–532.
  • [39] Yu, L., Zhang, W., Wang, J., and Yu, Y. Seqgan: Sequence generative adversarial nets with policy gradient. CoRR abs/1609.05473 (2016).
  • [40] Zhao, J. J., Mathieu, M., and LeCun, Y. Energy-based generative adversarial network. CoRR abs/1609.03126 (2016).
  • [41] Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on (2017).