Image_to_Image_Translation

Project work done for the course CS536: Pattern Recognition and Machine Learning at Rutgers University.

Vanilla Generative Adversarial Networks (GANs):

Generative Adversarial Networks (GANs) are deep-learning-based generative models. They perform implicit density estimation, belong to the family of unsupervised learning methods, and use two competing neural networks; hence the terms “generative” and “networks” in “generative adversarial networks”. They can be used for a wide variety of purposes such as style transfer, photo blending, image-to-image translation, etc. In this task, we train GANs on real-pizza and synthetic-pizza datasets so that we can generate our own synthetic pizza images.
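As a rough illustration of the adversarial setup described above, here is a minimal sketch of one training step, assuming PyTorch; the networks `G` and `D`, the optimizers, and the latent dimension `z_dim` are placeholders, not the exact models used in this project.

```python
import torch
import torch.nn as nn

def train_step(G, D, opt_G, opt_D, real_images, z_dim=100):
    """One adversarial update: D learns to separate real from fake,
    while G learns to fool D. G, D, and real_images are placeholders."""
    bce = nn.BCEWithLogitsLoss()
    batch = real_images.size(0)
    ones = torch.ones(batch, 1, device=real_images.device)
    zeros = torch.zeros(batch, 1, device=real_images.device)

    # --- Discriminator update ---
    z = torch.randn(batch, z_dim, device=real_images.device)
    fake_images = G(z).detach()                      # stop gradients into G
    loss_D = bce(D(real_images), ones) + bce(D(fake_images), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Generator update ---
    z = torch.randn(batch, z_dim, device=real_images.device)
    loss_G = bce(D(G(z)), ones)                      # G wants D to say "real"
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```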

Conclusions from this step: In this project step we tackled the problem of generating fake images using Generative Adversarial Networks. We trained two GANs, one on the real pizza images and one on the synthetic pizza images. The images generated by the real-pizza GAN were visually appealing and looked very realistic. Better performance can be achieved with a smaller batch size and more training epochs. We obtained good results after 85 epochs of training; training for 100-150 epochs and augmenting the data (e.g. by blurring, rotating images, or adding jitter) should yield better results. Another avenue for future work is fine-tuning the architecture.

Paired Image-to-Image Translation using Pix2Pix Architecture:

Image-to-image translation has applications in areas such as colourizing black-and-white photos and videos, style transfer, season transfer, etc. In paired image-to-image translation, which is a supervised approach, each image in the source domain is mapped to a desired image in the target domain, and the model is trained to learn this mapping. The architecture used for this technique is Pix2Pix, a conditional GAN architecture. In a traditional GAN or DCGAN, we cannot control the class of the image produced by the generator; conditional GANs overcome this drawback by conditioning the generator and discriminator on specific class labels. Pix2Pix extends conditional GANs: instead of feeding a random noise vector to the generator, the image from the source domain is given as input, and the generator's output is the translated image, i.e. the desired image in the target domain. The discriminator is a conditional discriminator fed a pair of images: the input image together with either the real output image (the one from the dataset) or the fake output image (the one produced by the generator). The discriminator learns to classify whether the output image is real or fake.

We performed paired image-to-image translation on the Dayton dataset, a dataset of street views and overhead views of roads in the US, and quantitatively evaluated the model using the Frechet Inception Distance (FID) and Inception Score (IS) metrics.
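The pairing logic described above can be sketched as follows, assuming PyTorch; `G`, `D`, `src`, and `target` are hypothetical stand-ins for the project's actual generator, conditional discriminator, and image tensors.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(D, G, src, target):
    """The conditional discriminator sees the source image concatenated
    channel-wise with either the real target or the generated image."""
    fake = G(src).detach()                         # no gradient into G here
    real_pair = torch.cat([src, target], dim=1)    # (src, real target)
    fake_pair = torch.cat([src, fake], dim=1)      # (src, generated target)
    real_logits, fake_logits = D(real_pair), D(fake_pair)
    return (bce(real_logits, torch.ones_like(real_logits)) +
            bce(fake_logits, torch.zeros_like(fake_logits))) * 0.5
```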

Conclusions from this step: In this project step we tackled the paired image-to-image translation problem, which is challenging by nature because it often requires specialized models and loss functions for the translation task or dataset at hand. To solve this problem we explored the Pix2Pix GAN, which models the loss function as a combination of L1 distance and adversarial loss, with additional novelties in the design of the generator and discriminator that let us generate images that are both plausible in the target domain and plausible translations of the input image. We were able to reproduce the results of the Pix2Pix architecture, and the results agreed with our evaluation metrics, FID and IS: the images with lower FID and higher IS were also the ones that performed well in qualitative evaluation. For both transformations, Street-to-Aerial and Aerial-to-Street, FID decreased and the Inception Score increased as training epochs increased. Future work would be to train both Pix2Pix networks with a batch size of 1 and observe the results; to optimize runtime, one could also explore learning-rate schedules.
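A minimal sketch of the combined generator objective mentioned above, assuming PyTorch; the weight `lambda_l1 = 100` follows the original Pix2Pix paper, and the exact value used in this project is an assumption.

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()
l1_criterion = nn.L1Loss()
lambda_l1 = 100.0   # weight on the L1 term; 100 follows the Pix2Pix paper

def generator_loss(D, src, fake, target):
    fake_pair = torch.cat([src, fake], dim=1)             # conditional input to D
    logits = D(fake_pair)
    adv = adv_criterion(logits, torch.ones_like(logits))  # try to fool D
    l1 = l1_criterion(fake, target)                       # stay close to ground truth
    return adv + lambda_l1 * l1
```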

Unpaired Image-to-Image Translation using CycleGAN Architecture:

In the unsupervised domain, the Cycle-Consistent Generative Adversarial Network (CycleGAN) is prominent and has achieved impressive results in many applications using the concept of cycle consistency. Cycle consistency means that if an image is translated from the source domain to the target domain, and the translated image is then translated back to the source domain, the original source image should be recovered. In this step we implement CycleGAN to translate images from the synthetic (pre-recorded) pizza domain to the real (live) pizza domain and vice versa.
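The cycle-consistency idea can be sketched as follows, assuming PyTorch; `G_AB` and `G_BA` are hypothetical generators for the two translation directions, and the weight `lambda_cyc` is an illustrative choice.

```python
import torch.nn as nn

cycle_criterion = nn.L1Loss()

def cycle_loss(G_AB, G_BA, real_A, real_B, lambda_cyc=10.0):
    """Translate each image to the other domain and back, then penalize
    deviation from the original image in both directions."""
    recon_A = G_BA(G_AB(real_A))   # A -> B -> A
    recon_B = G_AB(G_BA(real_B))   # B -> A -> B
    return lambda_cyc * (cycle_criterion(recon_A, real_A) +
                         cycle_criterion(recon_B, real_B))
```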

Enhanced CycleGAN Architecture:

Although the CycleGAN framework works well, its cycle-consistency constraint operates at the pixel level and is therefore tied to the shapes in the image; it cannot remove large objects or irrelevant textures, which leads to unrealistic artifacts. To avoid this drawback, we propose a new loss function for CycleGAN in which the cycle-consistency loss is a linear combination of the VGG perceptual (feature-level) loss and the pixel-level consistency loss. The performance of this new framework is evaluated qualitatively on the basis of the generated images, and quantitatively using the Frechet Inception Distance (FID) and Inception Score (IS).
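A minimal sketch of the proposed combined loss, assuming PyTorch and torchvision; the VGG16 layer cut-off and the weights `alpha` and `beta` are illustrative assumptions, not the exact configuration used in this project.

```python
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

# Frozen VGG16 feature extractor; inputs assumed ImageNet-normalized.
vgg_features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg_features.parameters():
    p.requires_grad = False

l1 = nn.L1Loss()

def enhanced_cycle_loss(recon, original, alpha=1.0, beta=1.0):
    pixel = l1(recon, original)                              # pixel-level consistency
    feat = l1(vgg_features(recon), vgg_features(original))   # VGG perceptual term
    return alpha * pixel + beta * feat
```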
