Gentle Introduction to Cycle GANs
1) Introduction to GAN concept :
GANs (Generative Adversarial Networks) are AI models that consist of two separate neural networks competing against each other to produce increasingly realistic outputs. They run unsupervised and learn through a zero-sum game framework.
The two neural networks are called the generator and the discriminator. The objective of the generator is to produce outputs that are similar to the real data provided. On the other hand, the goal of the discriminator is to distinguish between original and generated data. The model improves until the discriminator can no longer differentiate between real and generated data, which corresponds to a dip in the generator's loss.
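To make the zero-sum game concrete, here is a minimal plain-Python sketch (hypothetical helper names, scalar scores instead of image tensors) of the standard non-saturating GAN losses computed from the discriminator's probability scores:

```python
import math

def gan_losses(d_real, d_fake):
    """Standard GAN losses for one batch.
    d_real: discriminator scores on real samples, each in (0, 1).
    d_fake: discriminator scores on generated samples, each in (0, 1)."""
    # Discriminator wants real scores near 1 and fake scores near 0.
    d_loss = (-sum(math.log(p) for p in d_real) / len(d_real)
              - sum(math.log(1 - p) for p in d_fake) / len(d_fake))
    # Generator wants the discriminator to score its fakes near 1
    # (non-saturating form of the minimax objective).
    g_loss = -sum(math.log(p) for p in d_fake) / len(d_fake)
    return d_loss, g_loss
```

When the discriminator can no longer tell real from fake, its scores hover around 0.5 and the generator's loss settles near log 2, which is the equilibrium described above.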
2) Need for GANs :
Most existing neural nets can misclassify inputs after the addition of only a small amount of noise to the original data. Ironically, the model is often more confident in the wrong prediction on the noisy input than it was in the correct one. The reason for this is that most machine learning models learn from a limited amount of data, which is a huge drawback, as it makes them prone to overfitting. Also, the mapping they learn between input and output is almost linear. In reality, the boundaries separating classes are highly non-linear, and even a small change to a point in the feature space might lead to misclassification of data.
GANs are popularly used to fill in images from an outline, generate images from text, produce images similar to the sample data provided, and much more.
3) Cycle GAN Concept :
Cycle GANs are a variant of contemporary GANs, created specifically for the task of unpaired image-to-image translation. The idea is to capture the special characteristics of one image collection and learn how these characteristics could be translated into another image collection, all in the absence of any paired training examples. The models are trained in an unsupervised manner using collections of images from the source and target domains that do not need to be related in any way.
The need for such models arose from the fact that conventional image translation requires paired data in ample amounts. Since data scarcity is one of the major problems in the field of Artificial Intelligence, the conventional supervised approach was out of the question for achieving this goal.
4) Methodology :
The authors of the paper assume there is some underlying relationship between the domains (for example, that they are two different renderings of the same underlying scene) and seek to learn that relationship.
- The objective is to exploit supervision at the level of sets: we are given one set of images in domain X and a different set in domain Y .
- Train a mapping G : X → Y such that the output ŷ = G(x), x ∈ X, is indistinguishable from images y ∈ Y by an adversary trained to classify ŷ apart from y.
- Given a translator G : X → Y and another translator F : Y → X, G and F should be inverses of each other, and both mappings should be bijections. This is the essence of the concept, since it emphasizes the cyclic nature of the architecture.
- Apply this structural assumption by training both the mapping G and F simultaneously, and adding a cycle consistency loss that encourages F(G(x)) ≈ x and G(F(y)) ≈ y. Combining this loss with adversarial losses on domains X and Y yields a full objective for unpaired image-to-image translation.
- Two mapping functions G : X → Y and F : Y → X, and associated adversarial discriminators DY and DX.
- DY encourages G to translate X into outputs indistinguishable from domain Y , and vice versa for DX and F
- Two cycle consistency losses capture the intuition that if we translate from one domain to the other and back again we should arrive at where we started:
- Forward cycle-consistency loss: x → G(x) → F(G(x)) ≈ x, and
- Backward cycle-consistency loss: y → F(y) → G(F(y)) ≈ y
- The primary focus is learning the mapping between two image collections, rather than between two specific images, by trying to capture correspondences between higher-level appearance structures. The approach is therefore applicable to other tasks, such as painting→photo and object transfiguration, where single-sample transfer methods do not perform well.
- Adversarial Loss : For the mapping function G : X → Y and its discriminator DY, the objective (with expectations taken over the data distributions of Y and X) is expressed as
LGAN(G, DY, X, Y) = E_y[log DY(y)] + E_x[log(1 − DY(G(x)))]
- G tries to generate images G(x) that look similar to images from domain Y, while DY aims to distinguish between translated samples G(x) and real samples y.
- G aims to minimize this objective against an adversary DY that tries to maximize it, i.e., min_G max_DY LGAN(G, DY, X, Y). An analogous adversarial loss is used for F : Y → X and DX.
- Cycle Consistency Loss :
- With large enough capacity, a network can map the same set of input images to any random permutation of images in the target domain, where any of the learned mappings can induce an output distribution that matches the target distribution. Adversarial losses alone therefore cannot guarantee that a particular input x is mapped to a desired output.
- Learned mapping functions should be cycle-consistent: for each image x from domain X, the image translation cycle should be able to bring x back to the original image, i.e., x → G(x) → F(G(x)) ≈ x. This is called forward cycle consistency; the analogous backward cycle consistency requires y → F(y) → G(F(y)) ≈ y. Both are penalized with an L1 loss:
Lcyc(G, F) = E_x[‖F(G(x)) − x‖₁] + E_y[‖G(F(y)) − y‖₁]
- Full Objective :
- The complete loss is given by
L(G, F, DX, DY) = LGAN(G, DY, X, Y) + LGAN(F, DX, Y, X) + λ Lcyc(G, F)
where λ controls the relative importance of the two objectives.
- The aim is to solve
G*, F* = arg min_{G,F} max_{DX,DY} L(G, F, DX, DY)
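The losses above can be sketched numerically. Below is a minimal plain-Python illustration (hypothetical helper names; real implementations operate on image tensors with learned networks) of the cycle-consistency term and of how λ weighs it in the full objective:

```python
def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(xs, ys, G, F):
    """Forward cycle x -> G(x) -> F(G(x)) ~ x plus
    backward cycle y -> F(y) -> G(F(y)) ~ y, both as L1 penalties."""
    forward = sum(l1(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

def full_objective(adv_G, adv_F, cyc, lam=10.0):
    """L = L_GAN(G, DY) + L_GAN(F, DX) + lambda * L_cyc(G, F)."""
    return adv_G + adv_F + lam * cyc
```

If G and F are exact inverses of each other, the cycle term vanishes and only the adversarial terms remain, which is precisely the structural assumption the paper encourages.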
5) Training Details :
The following details are with respect to the model repository provided by the authors of the Cycle GAN paper. We will also discuss using the repo for a sample image collection.
- Two techniques are used to stabilize the model training procedure.
- For the adversarial loss, the negative log likelihood objective is replaced by a least-squares loss. This loss is more stable during training and generates higher quality results.
- For a GAN loss LGAN(G, D, X, Y), we train G to minimize E_x[(D(G(x)) − 1)²] and train D to minimize the sum E_y[(D(y) − 1)²] + E_x[D(G(x))²].
- To reduce model oscillation, update the discriminator using a history of generated images rather than the ones produced by the latest generators. An image buffer stores the 50 previously created images.
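The image-history trick can be sketched as a small buffer class (a simplified, hypothetical version of the repo's image pool; the real one operates on tensors and swaps an old image in with probability 0.5):

```python
import random

class ImagePool:
    """Keeps up to `size` previously generated images. Once full,
    each new image either replaces a randomly chosen stored one
    (and the older image is shown to the discriminator instead)
    or simply passes through."""
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        if len(self.images) < self.size:
            self.images.append(image)   # still filling the buffer
            return image
        if random.random() < 0.5:
            i = random.randrange(self.size)
            old, self.images[i] = self.images[i], image
            return old                  # discriminator sees an older fake
        return image
```

Feeding the discriminator a mix of current and historical fakes keeps it from chasing only the latest generator outputs, which is what reduces oscillation.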
- Other Details :
- For all the experiments, λ = 10 in the full objective.
- Use of the Adam solver with a batch size of 1.
- All networks were trained from scratch with a learning rate of 0.0002.
- The same learning rate is kept for the first 100 epochs, and the rate is then linearly decayed to zero over the next 100 epochs.
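The learning-rate schedule just described can be written out as a small helper (a sketch; the repo implements the equivalent via a PyTorch scheduler):

```python
def lr_at_epoch(epoch, base_lr=0.0002, n_constant=100, n_decay=100):
    """Constant learning rate for the first `n_constant` epochs,
    then a linear decay to zero over the next `n_decay` epochs."""
    if epoch < n_constant:
        return base_lr
    return base_lr * max(0.0, 1.0 - (epoch - n_constant) / n_decay)
```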
- Performance Detail :
- On translation tasks that involve color and texture changes, the method often succeeds.
- Tasks that require geometric changes show little success. For example, on the task of dog→cat transfiguration, the learned translation degenerates into making minimal changes to the input.
- Some failure cases are caused by the distribution characteristics of the training datasets.
6) Training demonstration for sample image collection :
Repo Details :
- Parameters are referred from the ‘Options’ folder of the repo. The base and train option scripts can be referred to for training, while the test option script can be used with the base option script for testing.
- The loss plot of the training procedure is produced via a ‘visdom’ server, which is to be initiated before starting the training procedure. However, there are two caveats :
- The server can be initialized only before starting the training procedure. Once the training is done, the server cannot be used again.
- As per the git issues, the training procedure sometimes stops randomly in the middle of the operation. The nohup operation then essentially becomes a zombie process and has to be killed before resuming the training procedure (the code repo provides an option to continue the training procedure after it converges, or if training is to be extended).
This can be countered by setting the --display_id parsing argument to 0, which disables the visdom server output.
- The training loss and other values are plotted using a script from the repo, as the authors do not provide an option to re-plot the values. An alternate script was written to convert the checkpoint log values into a dataframe and check the optimum ones.
- Changing dimensions : The default --load_size value is 286 and the --crop_size value is 256. These values can be changed according to the requirement, in combination with some other parsing arguments like --preprocess, --scale_width, etc. (refer to tips.md, the options folder, or the issues section of the repo for more details).
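As a sketch of the “alternate script” idea above: the repo logs losses to checkpoints/<name>/loss_log.txt in lines like `(epoch: 1, iters: 100, time: 0.065, data: 0.002) D_A: 0.250 G_A: 0.339 ...`. Assuming that format, a small parser (a hypothetical helper, not part of the repo) could look like this:

```python
import re

def parse_loss_log(lines):
    """Turn loss_log.txt lines into a list of dicts, one per logged
    iteration, which can be fed straight into pandas.DataFrame(rows)."""
    rows = []
    for line in lines:
        m = re.match(r"\(epoch: *(\d+), iters: *(\d+).*?\) *(.*)", line)
        if not m:  # skip headers / blank lines
            continue
        row = {"epoch": int(m.group(1)), "iters": int(m.group(2))}
        # Remaining "name: value" pairs are the individual loss terms.
        for name, value in re.findall(r"(\w+): *([-\d.]+)", m.group(3)):
            row[name] = float(value)
        rows.append(row)
    return rows
```

With the values in a dataframe, finding the checkpoint with the lowest cycle or generator loss becomes a one-line query.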
** A little heads up : nohup (No Hang Up) is a bash command to execute a task in the background, so that the process keeps running even after logging out from the shell/terminal.
Repo Demo for a sample folder :
- General arguments :
- Using --continue_train as an argument while training: --epoch_count should be given as a supporting argument along with it.
- The nohup command here includes a few other arguments like --load_size, --crop_size, --save_epoch_freq, etc., which can be tweaked according to the need.
- The testing can be done on Colab, as it does not require much GPU consumption.
- If testing is performed in only one direction, then --dataroot should be given the location of the test folder, and the --model argument should be given the keyword ‘test’. Otherwise, the keyword for the argument should be ‘cycle_gan’.
- The result directory can be changed according to the requirement as well. All of this can be referred from the options folder.
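Putting the pieces together, a plausible end-to-end run might look like the following (a sketch: the dataset name, epoch number, and folder layout are placeholders; the flag names come from the repo's options scripts):

```shell
# Assumed layout: ./datasets/horse2zebra/{trainA,trainB,testA,testB}

# Train in the background with nohup; --display_id 0 disables visdom.
nohup python train.py --dataroot ./datasets/horse2zebra \
    --name horse2zebra_cyclegan --model cycle_gan \
    --load_size 286 --crop_size 256 --save_epoch_freq 5 \
    --display_id 0 > train.log 2>&1 &

# Resume an interrupted run (the epoch number is an example):
python train.py --dataroot ./datasets/horse2zebra \
    --name horse2zebra_cyclegan --model cycle_gan \
    --continue_train --epoch_count 60 --display_id 0

# One-direction test: point --dataroot at the test folder, use --model test.
python test.py --dataroot ./datasets/horse2zebra/testA \
    --name horse2zebra_cyclegan --model test --no_dropout
```

The --no_dropout flag is the one the repo documentation suggests when testing a trained CycleGAN generator in a single direction.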
Thanks for reading this blog. See you around over some other interesting topics. Till then, grow more, and stay safe : )