This repository is the official implementation of Compressing Images by Encoding Their Latent Representations with Relative Entropy Coding.
The project is structured as a Python package and should be installed as such. Concretely, once the repository has been cloned, navigate to its root directory and, using the appropriate Python environment, run
> pip install -r requirements.txt
We used the Cifar10, ImageNet32, CLIC 2018 and Kodak datasets for our experiments. Cifar10 and ImageNet32 are readily available on tensorflow/datasets and will be downloaded automatically once you run the training or evaluation code. The other two datasets are available here:

- CLIC 2018: https://data.vision.ee.ethz.ch/cvl/clic/professional_train.zip
- Kodak: http://r0k.us/graphics/kodak/
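For the datasets hosted on tensorflow/datasets, the automatic download goes through the standard `tfds` API. A minimal sketch (the data directory below is a placeholder; the training scripts take the path via their own config option instead):

```python
import tensorflow_datasets as tfds

# Minimal sketch: load Cifar10 through tensorflow/datasets. On first use the
# dataset is downloaded to data_dir. The path here is a placeholder; the
# training scripts receive it via dataset_info.dataset_base_path instead.
ds = tfds.load("cifar10", split="train", data_dir="<datasets/base/path>")

for example in ds.take(1):
    print(example["image"].shape)  # (32, 32, 3) uint8 image tensors
```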
The code was written and tested using Python 3.6. Experiments are run through the Sacred framework, so the configuration options for any of the example files presented below can be displayed by running
python <experiment_file.py> print_config
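The `with key=value` overrides used in the commands below are standard Sacred behavior. A minimal sketch of how a Sacred experiment file is structured (the config names here are illustrative, not the repository's actual options):

```python
from sacred import Experiment

ex = Experiment("demo")

@ex.config
def config():
    # Defaults; any of these can be overridden on the command line,
    # e.g. `python demo.py with iters=1000`.
    model = "resnet_vae"
    iters = 500000

@ex.automain
def main(model, iters):
    print(f"Training {model} for {iters} iterations")
```

Running `python demo.py print_config` then prints these defaults together with any `with` overrides, which is how the example files below expose their options.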
The project files can be found in the `rec` directory. In particular, the implementation of iREC can be found in `rec/coding`, and the implementation of the VAE models can be found in `rec/models`.
The experiments can be found in the `examples` directory. Assuming you wish to save the model to `<model/save/path>` and download the datasets to `<datasets/base/path>`, you can train the RVAE used in the experiments presented in the paper by running
python examples/lossless/train_generative_model.py with model=resnet_vae num_res_blocks=24 \
model_save_base_dir=<model/save/path> dataset_info.dataset_name=<cifar10/imagenet32> \
dataset_info.dataset_base_path=<datasets/base/path> iters=500000 lamb=0.1
For more information on RVAEs, see Kingma et al. (2016).
Using similar assumptions about model save and dataset paths as above, the two-level VAE used in the paper can be trained by running
python examples/lossy/train_lossy_model.py with model=large_level_2_vae \
model_save_base_dir=<model/save/path> dataset.dataset_name=<cifar10/imagenet32> \
dataset.dataset_base_path=<datasets/base/path> loss_fn=mse iters=200000 beta=0.01
Note: the parameter `beta` corresponds to lambda = {0.001, 0.003, 0.01, 0.03, 0.05} in the paper.
For more information about the architecture and its components, see Ballé et al. (2016) and Ballé et al. (2018).
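As a rough sketch of the role `beta` plays (an assumption following the rate-distortion convention of Ballé et al., not a transcription of `train_lossy_model.py`):

```python
# Hedged sketch of the trade-off that beta controls, assuming the
# rate-distortion Lagrangian of Ballé et al.: loss = rate + lambda * distortion,
# with beta playing the role of lambda. The exact loss in train_lossy_model.py
# may be arranged differently; this only illustrates the trade-off.
def rate_distortion_loss(rate, distortion, beta):
    # Larger beta weights reconstruction quality more heavily, giving
    # higher-rate, lower-distortion models; smaller beta favors lower rates.
    return rate + beta * distortion
```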
In both cases, training can be monitored using TensorBoard by running
tensorboard --logdir <path/to/model/save/dir>
To losslessly compress images using the settings from the paper, run
python examples/lossless/compression_performance.py with model=resnet_vae model_save_dir=<path/to/model> \
dataset_info.dataset_name=<dataset_name> dataset_info.dataset_base_path=<datasets/base/path> \
dataset_info.split=test mode=compress num_test_images=1 sampler=beam_search n_beams=20 \
extra_samples=1.2 kl_per_partition=3.
Note: `n_beams`, `extra_samples` and `kl_per_partition` correspond to B, (1 + epsilon) and Omega in the paper, respectively.
To compress images with the lossy model using the settings from the paper, run
python examples/lossy/compress_with_lossy_model.py with model=large_level_2_vae model_save_dir=<path/to/model> \
dataset.dataset_name=<dataset_name> dataset.dataset_base_path=<datasets/base/path> \
dataset.split=test num_test_images=1 sampler=beam_search n_beams=10 \
extra_samples=1. kl_per_partition=3.
Note: as above, `n_beams`, `extra_samples` and `kl_per_partition` correspond to B, (1 + epsilon) and Omega in the paper, respectively.
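For intuition about how these settings interact, here is a hedged illustration based on the paper's description, not the exact implementation in `rec/coding`: the total KL divergence between posterior and prior is split into blocks of at most Omega nats, roughly exp((1 + epsilon) * Omega) candidate samples are considered per block, and beam search keeps the best B partial solutions.

```python
import math

# Hypothetical illustration of how the per-block sample budget scales,
# assuming roughly exp((1 + epsilon) * Omega) candidates per block, where
# extra_samples plays the role of (1 + epsilon) and kl_per_partition of Omega.
def samples_per_block(kl_per_partition, extra_samples):
    return math.ceil(math.exp(extra_samples * kl_per_partition))

print(samples_per_block(3.0, 1.2))  # ~37 candidates (the lossless setting above)
print(samples_per_block(3.0, 1.0))  # ~21 candidates (the lossy setting above)
```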
In the table below, we present single-image, lossless compression performance in bits per dimension (lower is better). The best-performing bits-back method is highlighted for each dataset. The asymptotic rates are included in parentheses where they differ from the single-image case. To calculate the number of bits needed for single images, we added the number of required auxiliary bits to the asymptotic compression rate as reported in the respective papers.
| | | Cifar10 (32x32) | ImageNet32 (32x32) | Kodak (768x512) |
|---|---|---|---|---|
| Non bits-back | PNG | 5.87 | 6.39 | 4.35 |
| | WebP | 4.61 | 5.29 | 3.20 |
| | FLIF | 4.19 | 4.52 | 2.90 |
| | IDF | 3.34 | 4.18 | - |
| Bits-back | LBB | 54.96 (3.12) | 55.72 (3.88) | - |
| | BitSwap | 6.53 (3.82) | 6.97 (4.50) | - |
| | HiLLoC | 24.51 (3.56) | 26.80 (4.20) | 17.5 (3.00) |
| | iREC (Ours) | **4.18** | **4.91** | **3.67** |
| | ELBO (RVAE) | [3.55] | [4.18] | [3.00] |
Note: the MS-SSIM comparisons are in decibels, calculated using the formula `-10 * log10(1 - MS-SSIM)`.
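For reference, that conversion is straightforward to apply:

```python
import math

# Convert an MS-SSIM score to decibels using the formula from the note above:
# dB = -10 * log10(1 - MS-SSIM). The log scale spreads out scores that all
# sit close to 1; higher is better.
def ms_ssim_to_db(ms_ssim):
    return -10.0 * math.log10(1.0 - ms_ssim)

print(ms_ssim_to_db(0.99))  # 20.0 dB
```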