Here we briefly present our work Multi-Agent Diverse Generative Adversarial Networks which we call as MAD-GAN. This work got accepted in CVPR-2018 as spotlight. In this work we focus on addressing the mode collapse problem in GANs.
We started analyzing GANs on a simple scenario of 1D mixture of gaussians with overlapping modes. Given enough training data, ideally a GAN should be able to generate similar distribution as the training data but as we know, GANs suffer from mode collapse because the true equilibrium is hard to reach. It couldn’t capture even this simple scenario. However, our model MAD-GAN was able to capture all the modes. Briefly, our solution MAD-GAN mainly focuses on increasing the modeling capacity of GANs by increasing the number of generators.
Our formulation works even on the challenging task with real world application of image-to-image translation. Please note that on this task, conditional gans are known to learn delta distribution. Still, MAD-GAN is able to generate diverse & realistic images.
Our main idea is built on the intuition that if a dataset has images of artworks by three different real artists, and a single painter, as in the case of a vanilla GAN with single generator, is given the task of painting all the different kinds, then he would be overwhelmed and not be able to do any of the styles perfectly. To produce better results, we design a model with multiple generators, where the discriminator pushes each generator to master an individual style. We modify the objective function such that the discriminator’s task now is to identify correctly which generator produced a particular generation. It can do this best if it distributes the task of modeling the artworks to the different generators. Now each of the generators has a reduced task of producing images of a particular kind. For example, once the discriminator sees a fake Mona Lisa, it has to tell exactly which fake artist is responsible for making art similar to Da Vinci’s style.
When we trained our model on a mixed dataset consisting of forests, ice bergs and indoor scenes, we found that the three natural clusterings emerged. Each generator learnt to generate images from a different dataset, showing its ability to implicitly disentangle not only inter-class variations but also intra-class variations. This empirical evidence supports our theoretical claim that MAD-GAN learns a mixture model where each generator focuses on one mixture component.
When our model is trained on a very diverse sketch-to-bags dataset, then given a single sketch, we could generate diverse images of bags. The generators produced bags of visually different colors and textures. Similarly, for the night-to-day task, when given a night image, MAD-GAN could produce three plausible day light images. Each generator consistently produced visually different realistic daylight image.
As explained earlier, even though the mixture of gaussians seems a much simpler dataset to learn, it turns out to be quite challenging for GANs. Only MAD-GAN was able to capture all the modes, with only InfoGAN as the other contender. Moreover, to create MAD-GAN from DCGAN, one only needs to update very few lines of code.
Copyright 2015 Viveka Kulharia. Last update (ET): 2020-11-03 11:24:14 +0000. Atom Feed