pix2pix
pix2pix is a framework for image-to-image translation based on conditional generative adversarial networks (cGANs). It was introduced by Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros in the 2016 paper “Image-to-Image Translation with Conditional Adversarial Networks.” The model learns a mapping G that, conditioned on an input image x, produces an output image y; training requires paired examples (x, y) and jointly optimizes a generator and a discriminator. The generator is commonly implemented as a U-Net with skip connections, while the discriminator is a PatchGAN that classifies overlapping image patches as real or fake rather than judging the whole image at once. The loss combines an adversarial term with a reconstruction term, typically an L1 distance between the generated image and the ground truth.
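The PatchGAN discriminator judges realism at the scale of its receptive field; the paper's default variant is often called the "70x70 PatchGAN". As a sketch, the receptive field of such a stack of convolutions can be computed with the standard backward recurrence rf_in = (rf_out - 1) * stride + kernel. The layer configuration below (five 4x4 convolutions with strides 2, 2, 2, 1, 1) is the commonly cited PatchGAN setup; treat it as illustrative rather than a definitive reimplementation.

```python
def receptive_field(layers):
    """Receptive field of a stack of convolutional layers.

    layers: list of (kernel_size, stride) pairs, first layer first.
    Walks backward through the stack, applying
    rf_in = (rf_out - 1) * stride + kernel at each layer.
    """
    rf = 1
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


# Commonly cited 70x70 PatchGAN configuration: five 4x4 convs
# with strides 2, 2, 2, 1, 1 (an assumption for illustration).
patchgan_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan_layers))  # -> 70
```

Each output unit of the discriminator therefore scores one 70x70 patch of the input, and the per-patch scores are averaged into a single loss.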
During training, the generator and discriminator are optimized in a minimax game: the full objective is G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G), where the adversarial term pushes outputs toward the distribution of real images and the L1 term keeps them close to the ground truth (the paper uses λ = 100).
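From the generator's side, the objective above reduces to an adversarial term (it wants the discriminator to label its outputs as real) plus a weighted L1 reconstruction term. A minimal NumPy sketch of that combined generator loss, assuming the discriminator emits per-patch probabilities, might look like:

```python
import numpy as np


def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Generator loss: adversarial BCE term + lambda * L1 term.

    d_fake : discriminator probabilities on generated patches
    fake   : generator output image (array)
    target : ground-truth image (array)
    lam    : reconstruction weight (100 in the paper)
    """
    eps = 1e-12  # numerical guard for log(0)
    # Adversarial term: generator wants d_fake -> 1 (labeled "real").
    adv = -np.mean(np.log(d_fake + eps))
    # L1 term: pixel-wise distance to the ground truth.
    l1 = np.mean(np.abs(fake - target))
    return adv + lam * l1
```

This is a sketch under stated assumptions, not the reference implementation; in practice the official code uses framework-level GAN and L1 losses, and later variants replace the sigmoid/BCE adversarial term with a least-squares one.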
Applications of pix2pix include tasks such as edges to photographs, maps to aerial photos, black-and-white to color photographs, semantic label maps to street scenes or building facades, and day to night conversion.
Limitations include reliance on accurately paired data and alignment between input and output images; performance can degrade when training pairs are scarce or misaligned, and because the L1 term favors a single average answer, outputs tend to be blurry or mode-collapsed when the mapping from input to output is ambiguous.