Initial

affinelayer · Jan 25, 2017 · 27c45eb
1 parent 0e3c3ee

Showing 34 changed files with 1,179 additions and 1 deletion.
diff --git a/README.md b/README.md
@@ -1,2 +1,182 @@
 # pix2pix-tensorflow
-Tensorflow Port of Image-to-image translation using conditional adversarial nets https://phillipi.github.io/pix2pix/
+
+Based on [pix2pix](https://phillipi.github.io/pix2pix/) by Isola et al.
+
+[Article about this implementation](https://affinelayer.com/pix2pix/)
+
+Tensorflow implementation of pix2pix.  Learns a mapping from input images to output images, like these examples from the original paper:
+
+<img src="docs/examples.jpg" width="900px"/>
+
+This port is based directly on the torch implementation, and not on an existing Tensorflow implementation.  It is meant to be a faithful implementation of the original work and so does not add anything.  The processing speed on a GPU with cuDNN was equivalent to the Torch implementation in testing.
+
+## Setup
+
+### Prerequisites
+- Tensorflow 0.12.1
+
+### Recommended
+- Linux with Tensorflow GPU edition + cuDNN
+
+### Getting Started
+
+```sh
+# Clone this repo
+git clone https://github.com/affinelayer/pix2pix-tensorflow.git
+cd pix2pix-tensorflow
+# Download the CMP Facades dataset http://cmp.felk.cvut.cz/~tylecr1/facade/
+python tools/download-dataset.py facades
+# Train the model (this may take 1-8 hours depending on GPU, on CPU you will be waiting for a bit)
+python pix2pix.py --mode train --output_dir facades_train --max_epochs 200 --input_dir facades/train --which_direction BtoA
+# Test the model
+python pix2pix.py --mode test --output_dir facades_test --input_dir facades/val --checkpoint facades_train
+```
+
+The test run will output an HTML file at `facades_test/index.html` that shows input/output/target image sets.
+
+## Datasets
+
+The data format used by this program is the same as the original pix2pix format, which consists of images of input and desired output side by side like:
+
+<img src="docs/ab.png" width="256px"/>
+
+For example:
+
+<img src="docs/418.png" width="256px"/>
+
+Some datasets have been made available by the authors of the pix2pix paper.  To download those datasets, use the included script `tools/download-dataset.py`.
+
+| dataset | image |
+| --- | --- |
+| `python tools/download-dataset.py facades` <br> 400 images from [CMP Facades dataset](http://cmp.felk.cvut.cz/~tylecr1/facade/). (31MB)  | <img src="docs/facades.jpg" width="256px"/> |
+| `python tools/download-dataset.py cityscapes` <br> 2975 images from the [Cityscapes training set](https://www.cityscapes-dataset.com/). (113MB) | <img src="docs/cityscapes.jpg" width="256px"/> |
+| `python tools/download-dataset.py maps` <br> 1096 training images scraped from Google Maps. (246MB) | <img src="docs/maps.jpg" width="256px"/> |
+| `python tools/download-dataset.py edges2shoes` <br> 50k training images from [UT Zappos50K dataset](http://vision.cs.utexas.edu/projects/finegrained/utzap50k/). Edges are computed by [HED](https://github.com/s9xie/hed) edge detector + post-processing. (2.2GB) | <img src="docs/edges2shoes.jpg" width="256px"/>  |
+| `python tools/download-dataset.py edges2handbags` <br> 137K Amazon Handbag images from [iGAN project](https://github.com/junyanz/iGAN). Edges are computed by [HED](https://github.com/s9xie/hed) edge detector + post-processing. (8.6GB) | <img src="docs/edges2handbags.jpg" width="256px"/> |
+
+The `facades` dataset is the smallest and easiest to get started with.
+
+### Creating your own dataset
+
+#### Example: creating images with blank centers for [inpainting](https://people.eecs.berkeley.edu/~pathak/context_encoder/)
+
+<img src="docs/combine.png" width="900px"/>
+
+```sh
+# Resize source images
+python tools/process.py --input_dir photos/original --operation resize --output_dir photos/resized
+# Create images with blank centers
+python tools/process.py --input_dir photos/resized --operation blank --output_dir photos/blank
+# Combine resized images with blanked images
+python tools/process.py --input_dir photos/resized --b_dir photos/blank --operation combine --output_dir photos/combined
+# Split into train/val set
+python tools/split.py --dir photos/combined
+```
+
+The folder `photos/combined` will now have `train` and `val` subfolders that you can use for training and testing.
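
How `tools/split.py` decides which images go into which folder is not described here. As a rough sketch only, assuming a shuffled 80/20 split with a fixed seed (the fraction, the seed, and the `split_names` helper are all hypothetical and not taken from the script), the idea looks like this:

```python
import random


def split_names(names, train_fraction=0.8, seed=0):
    """Shuffle file names reproducibly and split them into train/val lists.

    The 80/20 fraction and the seed are illustrative assumptions.
    """
    shuffled = sorted(names)               # sort first so input order does not matter
    random.Random(seed).shuffle(shuffled)  # private RNG; global random state is untouched
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


names = ["%d.png" % i for i in range(10)]
train, val = split_names(names)
print(len(train), len(val))  # 8 2
```

Using a fixed seed keeps the split stable between runs, so the `val` images never leak into a later training run.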
+
+#### Creating image pairs from existing images
+
+If you have two directories `a` and `b` with corresponding images (same name, same dimensions, different data), you can combine them with `process.py`:
+
+```sh
+python tools/process.py --input_dir a --b_dir b --operation combine --output_dir c
+```
+
+This puts the images in a side-by-side combined image that `pix2pix.py` expects.
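
To make the side-by-side layout concrete, here is a minimal sketch of the combine step. It assumes image A goes on the left and image B on the right, and it represents images as plain nested lists of pixels instead of image files. The `combine_pair` helper is hypothetical and is not the code in `process.py`.

```python
def combine_pair(rows_a, rows_b):
    """Place image B to the right of image A, row by row.

    Each image is a list of rows; each row is a list of pixels.
    """
    same_height = len(rows_a) == len(rows_b)
    same_width = all(len(ra) == len(rb) for ra, rb in zip(rows_a, rows_b))
    if not (same_height and same_width):
        raise ValueError("images a and b must have the same dimensions")
    return [ra + rb for ra, rb in zip(rows_a, rows_b)]


image_a = [[1, 2], [3, 4]]
image_b = [[5, 6], [7, 8]]
print(combine_pair(image_a, image_b))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

The dimension check mirrors the "same dimensions" requirement above: a mismatched pair cannot be laid out as two equal halves.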
+
+#### Colorization
+
+For colorization, your images should ideally all be the same aspect ratio.  You can resize and crop them with the resize command:
+```sh
+python tools/process.py --input_dir photos/original --operation resize --output_dir photos/resized
+```
+
+No other processing is required; the colorization mode (see the Training section below) uses single images instead of image pairs.
+
+## Training
+
+### Image Pairs
+
+For normal training with image pairs, you need to specify which directory contains the training images and which direction to train in.  The direction options are `AtoB` and `BtoA`.
+```sh
+python pix2pix.py --mode train --output_dir facades_train --max_epochs 200 --input_dir facades/train --which_direction BtoA
+```
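
The direction option can be pictured as choosing which half of each combined image is the network input and which is the target. The sketch below assumes A is the left half and B is the right half, and uses nested lists in place of real images. `split_pair` and `select_direction` are hypothetical helpers written for illustration, not functions from `pix2pix.py`.

```python
def split_pair(rows):
    """Split a side-by-side image into its left (A) and right (B) halves."""
    width = len(rows[0])
    if width % 2 != 0:
        raise ValueError("combined image width must be even")
    half = width // 2
    return [row[:half] for row in rows], [row[half:] for row in rows]


def select_direction(rows, which_direction):
    """Return (inputs, targets) for the requested training direction."""
    image_a, image_b = split_pair(rows)
    if which_direction == "AtoB":
        return image_a, image_b
    if which_direction == "BtoA":
        return image_b, image_a
    raise ValueError("which_direction must be 'AtoB' or 'BtoA'")


# Tiny 2x4 example: the A half holds 0s, the B half holds 9s.
combined = [[0, 0, 9, 9], [0, 0, 9, 9]]
inputs, targets = select_direction(combined, "BtoA")
print(inputs)   # [[9, 9], [9, 9]]
print(targets)  # [[0, 0], [0, 0]]
```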
+
+### Colorization
+
+`pix2pix.py` includes special code to handle colorization with single images instead of pairs.  Usage looks like this:
+
+```sh
+python pix2pix.py --mode train --output_dir photos_train --max_epochs 200 --input_dir photos/train --lab_colorization
+```
+
+In this mode, image A is the black and white image (lightness only), and image B contains the color channels of that image (no lightness information).
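
The split into lightness and color can be illustrated with the standard sRGB to CIE L\*a\*b\* conversion for a single pixel. This is a sketch under assumptions: it uses the common D65 constants, and the conversion and scaling inside `pix2pix.py` may differ. The `srgb_to_lab` helper is hypothetical.

```python
def srgb_to_lab(red, green, blue):
    """Convert one sRGB pixel (components in [0, 1]) to CIE L*a*b* (D65)."""

    def to_linear(c):
        # Undo the sRGB gamma curve.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = to_linear(red), to_linear(green), to_linear(blue)

    # Linear RGB -> XYZ (sRGB primaries, D65 white point).
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    lightness = 116.0 * fy - 16.0
    a_chan = 500.0 * (fx - fy)
    b_chan = 200.0 * (fy - fz)
    return lightness, a_chan, b_chan


lightness, a_chan, b_chan = srgb_to_lab(1.0, 0.0, 0.0)  # pure red
image_a = lightness          # network input: lightness only
image_b = (a_chan, b_chan)   # network target: color channels only
print(image_a, image_b)
```

Because the input carries only lightness, the network has to infer both color channels; a neutral gray pixel has `a` and `b` close to zero.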
+
+### Tips
+
+You can look at the loss and computation graph using tensorboard:
+```sh
+tensorboard --logdir=facades_train
+```
+
+<img src="docs/tensorboard-scalar.png" width="250px"/> <img src="docs/tensorboard-image.png" width="250px"/> <img src="docs/tensorboard-graph.png" width="250px"/>
+
+If you wish to write in-progress pictures as the network is training, use `--display_freq 50`.  This will update `facades_train/index.html` every 50 steps with the current training inputs and outputs.
+
+## Testing
+
+Testing is done with `--mode test`.  Specify the checkpoint to use with `--checkpoint`, which should point to the `output_dir` that you created previously with `--mode train`:
+
+```sh
+python pix2pix.py --mode test --output_dir facades_test --input_dir facades/val --checkpoint facades_train
+```
+
+The testing mode will load some of the configuration options from the checkpoint provided, so you do not need to specify `which_direction`, for instance.
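
One way this kind of behavior can work is to save the training options next to the checkpoint and read a subset of them back at test time. The following is a sketch under that assumption. The `options.json` file name, the set of restored keys, and both helper functions are hypothetical and are not taken from this README.

```python
import json
import os
import tempfile

# Hypothetical subset of options that the test run inherits from training.
RESTORED_KEYS = ("which_direction",)


def save_options(output_dir, options):
    """Write the training options alongside the checkpoint files."""
    with open(os.path.join(output_dir, "options.json"), "w") as f:
        json.dump(options, f, sort_keys=True, indent=4)


def load_options(checkpoint_dir, options):
    """Return a copy of `options` with the restored keys taken from training."""
    with open(os.path.join(checkpoint_dir, "options.json")) as f:
        saved = json.load(f)
    merged = dict(options)
    for key in RESTORED_KEYS:
        if key in saved:
            merged[key] = saved[key]
    return merged


with tempfile.TemporaryDirectory() as train_dir:
    save_options(train_dir, {"which_direction": "BtoA", "max_epochs": 200})
    test_options = load_options(train_dir, {"mode": "test", "which_direction": "AtoB"})

print(test_options["which_direction"])  # BtoA
```

Restoring only selected keys keeps settings such as the mode under the control of the test command while the model-defining options stay consistent with training.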
+
+The test run will output an HTML file at `facades_test/index.html` that shows input/output/target image sets:
+
+<img src="docs/test-html.png" width="300px"/>
+
+## Implementation Validation
+
+Validation of the code was performed on a Linux machine with a ~1.3 TFLOPS Nvidia GTX 750 Ti GPU.  Due to a lack of compute power, validation is not extensive and only the `facades` dataset at 200 epochs was tested.
+
+```sh
+git clone https://github.com/affinelayer/pix2pix-tensorflow.git
+cd pix2pix-tensorflow
+python tools/download-dataset.py facades
+time nvidia-docker run --volume $PWD:/prj --workdir /prj --env PYTHONUNBUFFERED=x affinelayer/tensorflow:pix2pix python pix2pix.py --mode train --output_dir facades_train --max_epochs 200 --input_dir facades/train --which_direction BtoA
+nvidia-docker run --volume $PWD:/prj --workdir /prj --env PYTHONUNBUFFERED=x affinelayer/tensorflow:pix2pix python pix2pix.py --mode test --output_dir facades_test --input_dir facades/val --checkpoint facades_train
+```
+
+Comparison on facades dataset:
+
+| Input | Tensorflow | Torch | Target |
+| --- | --- | --- | --- |
+| <img src="docs/1-inputs.png" width="256px"> | <img src="docs/1-tensorflow.png" width="256px"> | <img src="docs/1-torch.jpg" width="256px"> | <img src="docs/1-targets.png" width="256px"> |
+| <img src="docs/5-inputs.png" width="256px"> | <img src="docs/5-tensorflow.png" width="256px"> | <img src="docs/5-torch.jpg" width="256px"> | <img src="docs/5-targets.png" width="256px"> |
+| <img src="docs/51-inputs.png" width="256px"> | <img src="docs/51-tensorflow.png" width="256px"> | <img src="docs/51-torch.jpg" width="256px"> | <img src="docs/51-targets.png" width="256px"> |
+| <img src="docs/95-inputs.png" width="256px"> | <img src="docs/95-tensorflow.png" width="256px"> | <img src="docs/95-torch.jpg" width="256px"> | <img src="docs/95-targets.png" width="256px"> |
+
+## Unimplemented Features
+
+The following models have not been implemented:
+- defineG_encoder_decoder
+- defineG_unet_128
+- defineD_pixelGAN
+
+## Citation
+If you use this code for your research, please cite the paper this code is based on: <a href="https://arxiv.org/pdf/1611.07004v1.pdf">Image-to-Image Translation with Conditional Adversarial Networks</a>:
+
+```
+@article{pix2pix2016,
+  title={Image-to-Image Translation with Conditional Adversarial Networks},
+  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
+  journal={arxiv},
+  year={2016}
+}
+```
+
+## Acknowledgments
+This is a port of [pix2pix](https://github.com/phillipi/pix2pix) from Torch to Tensorflow.  It also contains colorspace conversion code ported from Torch.
diff --git a/docs/1-inputs.png b/docs/1-inputs.png
diff --git a/docs/1-targets.png b/docs/1-targets.png
diff --git a/docs/1-tensorflow.png b/docs/1-tensorflow.png
diff --git a/docs/1-torch.jpg b/docs/1-torch.jpg
diff --git a/docs/418.png b/docs/418.png
diff --git a/docs/5-inputs.png b/docs/5-inputs.png
diff --git a/docs/5-targets.png b/docs/5-targets.png
diff --git a/docs/5-tensorflow.png b/docs/5-tensorflow.png
diff --git a/docs/5-torch.jpg b/docs/5-torch.jpg
diff --git a/docs/51-inputs.png b/docs/51-inputs.png
diff --git a/docs/51-targets.png b/docs/51-targets.png
diff --git a/docs/51-tensorflow.png b/docs/51-tensorflow.png
diff --git a/docs/51-torch.jpg b/docs/51-torch.jpg
diff --git a/docs/95-inputs.png b/docs/95-inputs.png
diff --git a/docs/95-targets.png b/docs/95-targets.png
diff --git a/docs/95-tensorflow.png b/docs/95-tensorflow.png
diff --git a/docs/95-torch.jpg b/docs/95-torch.jpg
diff --git a/docs/ab.png b/docs/ab.png
diff --git a/docs/cityscapes.jpg b/docs/cityscapes.jpg
diff --git a/docs/combine.png b/docs/combine.png
diff --git a/docs/edges2handbags.jpg b/docs/edges2handbags.jpg
diff --git a/docs/edges2shoes.jpg b/docs/edges2shoes.jpg
diff --git a/docs/examples.jpg b/docs/examples.jpg
diff --git a/docs/facades.jpg b/docs/facades.jpg
diff --git a/docs/maps.jpg b/docs/maps.jpg
diff --git a/docs/tensorboard-graph.png b/docs/tensorboard-graph.png
diff --git a/docs/tensorboard-image.png b/docs/tensorboard-image.png
diff --git a/docs/tensorboard-scalar.png b/docs/tensorboard-scalar.png
diff --git a/docs/test-html.png b/docs/test-html.png