Image-to-Image and Video-to-Video Translation using Pix2Pix and CycleGAN
https://www.youtube.com/shorts/hspKupwXvsM
Our project explores image-to-image and video-to-video translation using Pix2Pix and CycleGAN. We focused on three main areas of application: satellite-to-map translation, facade segmentation, and artistic style transfer on videos.
We re-implemented Pix2Pix and CycleGAN and used non-ML image processing methods to transform and stitch large images and to perform video-to-video translation.

For the map section, we broke large satellite images into smaller 512x512-pixel patches with a 50-pixel overlap. After running the patches through the model, we stitched them back together and blended the overlapping regions using Gaussian-Laplacian pyramids.

For the video section, two original video clips are stored in the 'video inputs' folders within the data directory. We split each clip into individual frames, saved the frames in the corresponding subfolders "city_dark_frames" and "vangogh_frames", and resized each frame to 512x512 before feeding it to the models. To improve video quality, we applied histogram matching, Gaussian-Laplacian pyramid blending, and non-local means denoising. The results of each processing stage are saved in the subfolders 'hist_output', 'denoise_output', and 'laplacian_output'. The final frames, after passing through all three methods, are saved in the 'hist_denoise_laplacian_output' folder and can then be assembled into a video.
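As a rough illustration of the per-frame color correction and denoising described above, here is a minimal sketch using scikit-image's histogram matching and OpenCV's non-local means filter. The helper name postprocess_frame, the file paths, and the filter strengths are our own illustrative assumptions, not the notebook's actual code.

```python
import cv2
import numpy as np
from skimage import exposure

def postprocess_frame(generated_bgr, reference_bgr):
    """Hypothetical helper: histogram matching followed by non-local means denoising."""
    # Match the color distribution of the generated frame to the original frame.
    matched = exposure.match_histograms(generated_bgr, reference_bgr, channel_axis=-1)
    matched = np.clip(matched, 0, 255).astype(np.uint8)
    # Suppress GAN artifacts with a non-local means filter (strength values are guesses).
    return cv2.fastNlMeansDenoisingColored(matched, None, 10, 10, 7, 21)

# Illustrative filenames only; the real frames live in the folders listed below.
fake = cv2.imread("fake_frame.png")
real = cv2.imread("original_frame.png")
cv2.imwrite("hist_denoise_frame.png", postprocess_frame(fake, real))
```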
How to:
Pix2Pix:
CycleGAN:
Processing Large Image Files:
- break_up_panorama(image_path, folder_name, frame_size, overlap_size, x_offset=0, y_offset=0) --> Used to break up large images into smaller patches.
- Use the model of your choice and feed it the "folder_name" directory used above.
- stitch_panorama(image_path, folder_name, reconstructed_path, frame_size, overlap_size) --> Used to stitch the processed patches back together. "folder_name" should contain the output from the model used. (A usage sketch follows this list.)
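A minimal usage sketch of the two helpers above, assuming the notebook cells defining them have been run. The patch directory names are placeholders, and we assume frame_size and overlap_size are the pixel counts quoted earlier (512 and 50).

```python
# Hypothetical directory names; 512/50 are the patch and overlap sizes described above.
break_up_panorama("Data/map inputs/original.png", "map_patches",
                  frame_size=512, overlap_size=50)

# ... run Pix2Pix or CycleGAN over "map_patches", writing results to "map_patches_out" ...

stitch_panorama("Data/map inputs/original.png", "map_patches_out",
                "Data/map outputs/final_image.png", frame_size=512, overlap_size=50)
```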
Processing Videos:
- video2imageFolder(input_file, output_path) --> Used to break up a video into frames.
- Use the model of your choice and feed it the "output_path" directory used above.
- hist_match_handler(frames_folder_path, output_folder_path) --> Used to apply histogram matching to all frames.
- imageFolder2video(input_path, output_path, video_name, fps) --> Used to reassemble the processed frames into a video. (A usage sketch follows this list.)
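Likewise, a sketch of the full video round trip, assuming the helpers are defined by the notebook. The clip filename, the intermediate folder names, and the 30 fps value are placeholders we chose for illustration.

```python
# 1. Split the input clip into frames (clip path is a placeholder).
video2imageFolder("Data/video inputs/some_clip.mp4", "input_frames")

# 2. Run the chosen model over "input_frames", writing its output to "fake_frames".

# 3. Apply histogram matching to every generated frame.
hist_match_handler("fake_frames", "hist_output")

# 4. Reassemble the processed frames into a video (fps value is a guess).
imageFolder2video("hist_output", "Data/video output Pix2Pix", "best_video.mp4", fps=30)
```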
Our results showed that CycleGAN outperformed Pix2Pix in both the map and artistic video sections. For the map section, both models generated maps that closely resembled Google Maps. For artistic style transfer, CycleGAN preserved the structure of objects in the frames better and produced less noise than Pix2Pix; Pix2Pix required additional processing to reach acceptable video quality, while CycleGAN did not. Within the data folder, the 'fake_video' subfolder contains a video assembled from the raw, unprocessed frames generated by our models, and the 'final_video' folder contains the result after applying the full image processing pipeline to those frames. A side-by-side comparison with the unprocessed video shows the improvement gained from our image-processing strategies.
Our project highlights the effectiveness of image-to-image and video-to-video translation using Pix2Pix and CycleGAN. By incorporating non-ML image processing methods, we were able to achieve better results with fewer resources. Our findings suggest that CycleGAN is a better choice for artistic style transfer and that Pix2Pix requires additional processing techniques for video output.
Azaan Barlas
Jue Hou
Nikhil Khandekar
Santiago Valencia Sanchez
Project
│ README.md
│ CS445_final_project.ipynb
│
└───Data
│
│
└───video inputs
│
│
└───city_dark_frames
│ │ dark_images
│ │ hist_output
│ │ denoise_output
│ │ laplacian_output
│ │ hist_denoise_output
│ │ hist_denoise_laplacian_output
│ │ fake_video
│ │ final_video
│
│
└───vangogh_frames
│ │ vangogh_output
│ │ hist_output
│ │ denoise_output
│ │ laplacian_output
│ │ hist_denoise_output
│ │ hist_denoise_laplacian_output
│ │ vangogh_synthesized_video
│ │ final_video
│
│
└───map inputs
│ │ original.png
│
│
└───map outputs
│ │ output.png
│ │ final_image.png
│
│
└───video output Pix2Pix
│ best_video.mp4
│
│
└───video output cycleGANs
best_vangogh_video.mp4