
Conditional GANs

Background on Conditional GANs

Conditional GANs were an early improvement on the original GAN architecture. The authors posited that conditioning the generator and discriminator on external, structured data could improve the quality of generated samples and ease the training of a GAN. Since this modification, conditional GANs have become extremely common in high-quality image generation [2, 3, 4]. The basic architecture not only feeds the noise (or a sample) into the generator (or the discriminator, respectively), but also feeds both networks a conditional vector, y, containing the conditioning arguments.
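Concretely, the objective in the original paper [1] is the familiar two-player minimax game, with both the discriminator and the generator conditioned on y:

$$\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x \mid y)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]$$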

This conditional argument can be any external structured data, such as image labels or metadata. The hope is that it imposes some structure on the generator's latent space while also improving the fidelity of the generated samples. This architecture is shown in Figure 1 of the original paper [1].

The authors originally tested this on MLP-based networks using the original GAN loss function. They evaluated the generated samples by computing Parzen window-based log-likelihood estimates with respect to the test and validation data, and found that under this metric the conditional GAN framework performs better than the original GAN. The conditioning information used for the MNIST data was the digit label, encoded as a one-hot vector. They conjecture that conditioning helps encode semantic information into the generator with some structure, and therefore propose that this approach can also be extended to leverage image semantic information to conditionally generate other images.

They also proposed that, since GANs are naturally suited to approximating one-to-many mappings, it may be possible to train on poorly labelled data sets, such as those with labels supplied by anonymous individuals. Furthermore, they proposed jointly training a natural language model, such as a skip-gram model, with a conditional GAN, using the learned semantic word representations to condition the GAN.

Code Implementation

Below is a conditional GAN trained on the MNIST data set to learn how to write the digits 0–9. In this example, I use a DCGAN generator and discriminator [5] and train with MSELoss in the style of LSGANs. I will go into DCGANs in greater depth in my next post, so for now I want to focus on the aspects that are unique to conditional GANs.

In a conditional GAN, we need to feed the generator and discriminator a conditional argument, y, in addition to the arguments they are normally passed. In this case, we pass both networks the class labels as integers 0–9. To make use of this information in the generator, we embed the labels into the latent space and create additional feature maps for the network to train on. We do this using the built-in PyTorch module, nn.Embedding, which learns a mapping from our class labels into the latent space, Z. We then map both the prior latent variable, z, and the embedding of the conditional, y, into 4x4 feature maps using dense linear layers. The latent code and the conditional code are then concatenated along their channel dimension and fed into the generator's convolutional stack. This is carried out in the DCGAN_Generator class, sketched below.
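Here is a minimal sketch of what the DCGAN_Generator class can look like, assuming z_dim = 100, 4x4 starting feature maps, and MNIST images resized to 32x32 so the upsampling works out cleanly; the channel counts are illustrative choices, not the exact values from the post:

```python
import torch
import torch.nn as nn

class DCGAN_Generator(nn.Module):
    # Minimal conditional DCGAN generator sketch (channel sizes are illustrative).
    def __init__(self, z_dim=100, n_classes=10):
        super().__init__()
        # Learn an embedding of the integer class label into the latent space Z
        self.label_emb = nn.Embedding(n_classes, z_dim)
        # Dense layers mapping z and the label embedding to 4x4 feature maps
        self.z_map = nn.Linear(z_dim, 256 * 4 * 4)
        self.y_map = nn.Linear(z_dim, 64 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(256 + 64, 128, 4, 2, 1),  # 4x4 -> 8x8
            nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1),        # 8x8 -> 16x16
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, 2, 1),          # 16x16 -> 32x32
            nn.Tanh(),
        )

    def forward(self, z, y):
        # Concatenate the latent code and the conditional code channel-wise
        h_z = self.z_map(z).view(-1, 256, 4, 4)
        h_y = self.y_map(self.label_emb(y)).view(-1, 64, 4, 4)
        return self.net(torch.cat([h_z, h_y], dim=1))
```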

We need to perform the same embedding and dense mapping on y in the discriminator so that it can be concatenated with the generator output (and with real data). This is done in the analogous discriminator code, sketched below.
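Again as an illustrative sketch (the embedding dimension and layer sizes are assumptions), the discriminator embeds y, expands it to a full-resolution feature map with a dense layer, and stacks it with the image as an extra input channel:

```python
class DCGAN_Discriminator(nn.Module):
    # Minimal conditional DCGAN discriminator sketch for LSGAN-style training.
    def __init__(self, n_classes=10, img_size=32, emb_dim=50):
        super().__init__()
        self.img_size = img_size
        # Embed the label, then expand it into one full-resolution feature map
        self.label_emb = nn.Embedding(n_classes, emb_dim)
        self.y_map = nn.Linear(emb_dim, img_size * img_size)
        self.net = nn.Sequential(
            nn.Conv2d(1 + 1, 64, 4, 2, 1),   # image channel + label channel, 32x32 -> 16x16
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, 2, 1),     # 16x16 -> 8x8
            nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, 4, 2, 1),    # 8x8 -> 4x4
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 1, 4, 1, 0),      # 4x4 -> 1x1 raw score (no sigmoid under LSGAN)
        )

    def forward(self, img, y):
        # Stack the label feature map with the image as an extra channel
        h_y = self.y_map(self.label_emb(y)).view(-1, 1, self.img_size, self.img_size)
        return self.net(torch.cat([img, h_y], dim=1)).view(-1)
```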

Finally, the last addition we must make to train the conditional GAN is to pass the generator and discriminator (data, label) pairs during training. When we generate random samples with the generator, we also create a set of uniformly distributed random integer labels and feed them to both the generator and the discriminator as the conditioning labels for those samples, as in the training sketch below.
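A condensed training step might look like the following, assuming the two sketch classes above, LSGAN-style MSE targets (1 for real pairs, 0 for fake pairs), and torch.randint for the uniform random labels; the hyperparameters are typical DCGAN defaults rather than the post's exact values:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
G, D = DCGAN_Generator().to(device), DCGAN_Discriminator().to(device)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
mse = nn.MSELoss()

# Resize MNIST to 32x32 and scale to [-1, 1] to match the Tanh output
tfm = transforms.Compose([transforms.Resize(32), transforms.ToTensor(),
                          transforms.Normalize([0.5], [0.5])])
loader = DataLoader(datasets.MNIST("data", train=True, download=True, transform=tfm),
                    batch_size=128, shuffle=True)

for epoch in range(25):
    for real, labels in loader:
        real, labels = real.to(device), labels.to(device)
        b = real.size(0)
        z = torch.randn(b, 100, device=device)
        # Uniform random integer labels to condition the generated samples
        fake_labels = torch.randint(0, 10, (b,), device=device)
        fake = G(z, fake_labels)

        # Discriminator step: real (data, label) pairs -> 1, fake pairs -> 0
        loss_d = mse(D(real, labels), torch.ones(b, device=device)) + \
                 mse(D(fake.detach(), fake_labels), torch.zeros(b, device=device))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # Generator step: fool the discriminator on the same (fake, label) pairs
        loss_g = mse(D(fake, fake_labels), torch.ones(b, device=device))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```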

Results:

When we train this network, we observe that the generator learns to produce outputs which correspond to the given semantic labels. In this case, we train it on the labels 0–9, and after training, we can feed the generator a random latent code, z, and the conditional label, y. When we do this, we see that the generator produces an output that has the semantic characteristics of the conditional input.
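For example, using the sketch classes and device from above, generating one of each digit after training is a one-liner per step:

```python
# One random latent code per digit, paired with the labels 0-9
z = torch.randn(10, 100, device=device)
y = torch.arange(10, device=device)
with torch.no_grad():
    samples = G(z, y)  # (10, 1, 32, 32) images in [-1, 1]
```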

Below are sample training outputs at various epochs of training as well as samples generated after training the network.

Resources:

[1] Mehdi Mirza, Simon Osindero: "Conditional Generative Adversarial Nets", 2014; arXiv:1411.1784, https://arxiv.org/abs/1411.1784.

[2] Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, Bryan Catanzaro: "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs", 2018; arXiv:1711.11585, https://arxiv.org/abs/1711.11585.

[3] Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi: "Unsupervised Learning of Object Landmarks through Conditional Image Generation", 2018; arXiv:1806.07823, https://arxiv.org/abs/1806.07823.

[4] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, Alexei A. Efros: "Image-to-Image Translation with Conditional Adversarial Networks", 2016; arXiv:1611.07004, https://arxiv.org/abs/1611.07004.

[5] Alec Radford, Luke Metz, Soumith Chintala: "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", 2015; arXiv:1511.06434, https://arxiv.org/abs/1511.06434.

- Brian Loos
