DCGANs

xNeurals · Sep 2, 2021

Background on DCGANs

Prior to the advent of GANs, convolutional neural networks were one of the most widely researched areas in deep learning. Their widespread success in image classification, segmentation, and localization tasks inspired a wealth of research. I went into more depth in a previous blog post, but to reiterate briefly: convolutional neural networks paired with large amounts of data have given us a way to build machines with near-human perceptive capabilities. They manage this through a purely differentiable structure that captures fine detail while localizing information. This structure allows end-to-end training on large data sets and yields very high-level semantic understanding, as seen in [1, 2]. For these reasons, convnets are the natural choice as building blocks for a GAN whose goal is to generate images.

However, it wasn’t until Radford and Metz published DCGANs [3] that convnets were successfully trained in an adversarial framework. They found that particular restrictions must be placed on the convnets for the training regime to be stable, and they called the family of convnets that can be trained easily in a GAN architecture deep convolutional generative adversarial networks, or DCGANs. They defined the DCGAN architecture with the following five guidelines for designing a generator and discriminator for use in a GAN. These guidelines are:

  • Replace pooling and upsampling layers with strided convolutions and strided transposed convolutions
  • Use 2d batch normalization between the convolutional layers in both the generator and the discriminator
  • Remove fully connected hidden layers for deeper architectures
  • Use rectified linear units in the generator and leaky rectified linear units in the discriminator
  • Use a Tanh activation in the output layer of the generator and a sigmoid activation in the output layer of the discriminator

*These guidelines are taken directly from the original paper [3].

These guidelines were informed by state-of-the-art methods that have become commonplace for accelerating and stabilizing convnet training. These methods are explored in more depth here: batch normalization [4], strided convolutions [1], ReLU activations [5], and leaky ReLU activations [8]. These techniques are key to allowing a convnet to be trained easily in a GAN framework, where generator training is widely known to be unstable and difficult.
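To make the guidelines concrete, here is a minimal PyTorch sketch of a generator and discriminator that follow them. The layer widths, the 100-dimensional latent vector, and the assumption of 64×64 single-channel images are illustrative choices, not the exact architecture from the paper or from my repository.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps a latent vector z of shape (N, nz, 1, 1) to an image of shape (N, nc, 64, 64)."""
    def __init__(self, nz=100, ngf=64, nc=1):
        super().__init__()
        self.net = nn.Sequential(
            # Strided transposed convolutions upsample instead of pooling/upsampling layers.
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh(),  # Tanh output keeps generated pixel values in [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Maps an image of shape (N, nc, 64, 64) to a real/fake probability of shape (N,)."""
    def __init__(self, ndf=64, nc=1):
        super().__init__()
        self.net = nn.Sequential(
            # Strided convolutions downsample instead of pooling layers.
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),  # sigmoid output gives the probability the input is real
        )

    def forward(self, x):
        return self.net(x).view(-1)
```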

The most interesting result presented in the original DCGANs paper is that the generator’s latent space appears to have a distinguishable semantic structure, similar to the semantic structure of the skip-gram model’s latent space discussed in [9]. It is well understood that the generator maps a prior latent distribution onto a distribution close to the data distribution, but it is not intuitively clear that it would do so in a way that preserves semantic structure. InfoGAN, published shortly afterwards, shows that a specific semantic structure can be imposed on the latent distribution by maximizing the mutual information between a subset of the latent codes and the generated samples [6]. VEEGAN, in turn, attaches a loss to the latent space that pushes the prior distribution towards an approximation of the data distribution mapped into the latent space by an encoder [7]. It seems as if the generator’s latent space organizes itself in a way that captures a semantic understanding of its representations. When DCGANs are combined with another architecture that aims to structure the latent space, we can achieve quite powerful results, as in my previous post on Conditional GANs.
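One simple way to probe this structure is to walk through the latent space, as Radford et al. do in the paper. The sketch below linearly interpolates between two random latent vectors and decodes each intermediate point with a trained generator (the `Generator` sketched above is assumed); smooth, semantically coherent transitions between the endpoints are the behavior that suggests the latent space is well organized.

```python
import torch

@torch.no_grad()
def interpolate(generator, steps=8, nz=100, device="cpu"):
    # Draw two random latent codes to serve as the endpoints of the walk.
    z0 = torch.randn(1, nz, 1, 1, device=device)
    z1 = torch.randn(1, nz, 1, 1, device=device)
    # Linearly blend the two codes and decode the whole batch at once.
    alphas = torch.linspace(0.0, 1.0, steps, device=device).view(steps, 1, 1, 1)
    z = (1 - alphas) * z0 + alphas * z1
    return generator(z)  # (steps, C, H, W) tensor of interpolated samples
```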

Implementation:

As with my implementations of ResNet and VGG-19, I am deferring my implementation of DCGANs to my GitHub repository here. It is a very standard DCGAN implementation; I am certainly neither the first nor the last person to publish a repository with this code. A very detailed walkthrough of a DCGAN in PyTorch is given in the torchvision documentation here. It would be hard to cover a DCGAN implementation at the same level of detail without copying their work entirely, so I would rather point to it as a reference walkthrough of a DCGAN implementation.
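For completeness, here is a rough outline of the training step that implementations like mine follow, assuming the Generator and Discriminator sketched earlier, a `dataloader` over real images scaled to [-1, 1], and the hyperparameters recommended in the paper (Adam with a learning rate of 2e-4 and beta1 = 0.5). It is a sketch of the standard recipe, not a copy of my repository’s code.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
netG, netD = Generator().to(device), Discriminator().to(device)
criterion = nn.BCELoss()
optG = torch.optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optD = torch.optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))

num_epochs = 5  # illustrative; dataloader is assumed to be defined elsewhere
for epoch in range(num_epochs):
    for real, _ in dataloader:
        real = real.to(device)
        b = real.size(0)
        noise = torch.randn(b, 100, 1, 1, device=device)
        fake = netG(noise)

        # Discriminator step: real images labeled 1, generated images labeled 0.
        optD.zero_grad()
        loss_d = criterion(netD(real), torch.ones(b, device=device)) + \
                 criterion(netD(fake.detach()), torch.zeros(b, device=device))
        loss_d.backward()
        optD.step()

        # Generator step: try to make the discriminator output 1 on fakes.
        optG.zero_grad()
        loss_g = criterion(netD(fake), torch.ones(b, device=device))
        loss_g.backward()
        optG.step()
```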

In my implementation of DCGANs, I ran into a single line of code that cost me several hours of debugging. It also seems to be an issue that is difficult to reproduce. The line in question was originally placed inside my training loop, so it was executed every time a new sample image was drawn. For a reason I have not yet figured out, when this line sat inside the training loop, the generated samples were extremely poor and all identical. Once I moved it outside the training loop, the GAN learning could be observed and everything worked as expected. I cannot explain what causes this problem, but moving that single line of code resolved it. I suspect it may be an issue with my versions of PyTorch and the CUDA toolkit.

Finally, I trained my DCGAN on the MNIST data set. The quality of the samples produced by the generator can be observed throughout the training cycle, and the entire training cycle is visualized in the GIF included in the GitHub repository.
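The frames for that kind of visualization can be produced by periodically decoding a fixed noise vector during training and writing the resulting grid to disk. The sketch below shows one way to do this (the output directory and the 64-sample grid size are arbitrary choices; `netG` and `device` follow the training sketch above).

```python
import os
import torch
from torchvision.utils import save_image

os.makedirs("samples", exist_ok=True)
# Created once, so every saved frame decodes the same latent codes.
fixed_noise = torch.randn(64, 100, 1, 1, device=device)

@torch.no_grad()
def save_samples(netG, step):
    samples = (netG(fixed_noise) + 1) / 2  # rescale Tanh output from [-1, 1] to [0, 1]
    save_image(samples, f"samples/step_{step:06d}.png")
```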

Resources:

[1] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net, 2015.

[2] Matthew D Zeiler and Rob Fergus. Visualizing and understanding convolutional networks, 2013.

[3] Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks, 2016.

[4] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015.

[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.

[6] Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, and Pieter Abbeel. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, 2016.

[7] Akash Srivastava, Lazar Valkov, Chris Russell, Michael U. Gutmann, and Charles Sutton. VEEGAN: Reducing mode collapse in GANs using implicit variational learning, 2017.

[8] Bing Xu, Naiyan Wang, Tianqi Chen, and Mu Li. Empirical evaluation of rectified activations in convolutional network, 2015.

[9] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed representations of words and phrases and their compositionality, 2013.

— Brian Loos
