First commit
rewonc committed Nov 19, 2020
0 parents commit 43312d8
Showing 15 changed files with 1,384 additions and 0 deletions.
7 changes: 7 additions & 0 deletions LICENSE.md
@@ -0,0 +1,7 @@
Copyright © 2020 OpenAI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
103 changes: 103 additions & 0 deletions README.md
@@ -0,0 +1,103 @@
# Very Deep VAEs

Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images," submitted to ICLR 2021 (https://openreview.net/forum?id=RLRXCV6DbEJ)

Some model samples and a visualization of how it generates them:
![image](header-image.png)

This repository is tested with PyTorch 1.6, CUDA 10.1, Numpy 1.16, Ubuntu 18.04, and V100 GPUs.

# Setup
Several additional packages are required, including NVIDIA Apex:
```
pip install imageio
pip install mpi4py
pip install scikit-learn
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
cd ..
```

You will also need to download the data for whichever dataset you want to train on:
```
./setup_cifar10.sh
./setup_imagenet.sh imagenet32
./setup_imagenet.sh imagenet64
./setup_ffhq256.sh
./setup_ffhq1024.sh /path/to/images1024x1024 # first download the `images1024x1024` subfolder from https://github.com/NVlabs/ffhq-dataset yourself
```

# Training models
Hyperparameters all reside in `hps.py`. We use 2 GPUs for our CIFAR-10 runs and 32 for the rest of the models. A lower batch size also works, but it results in slower learning and may require a lower learning rate.

The `mpiexec` arguments you use for runs with more than 1 node depend on the configuration of your system, so please adapt accordingly.

```bash
mpiexec -n 2 python train.py --hps cifar10
mpiexec -n 32 python train.py --hps imagenet32
mpiexec -n 32 python train.py --hps imagenet64
mpiexec -n 32 python train.py --hps ffhq256
mpiexec -n 32 python train.py --hps ffhq1024
```

# Restoring saved models
For convenience, we have included training checkpoints which can be restored in order to confirm performance, continue training, or generate samples.
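
Each checkpoint listed below ships with a `*-log.jsonl` training log. As a minimal sketch, assuming these files follow the usual JSON Lines convention (one JSON object per line), they can be read with the standard library alone. The helper name `load_jsonl` is hypothetical and not part of this repository, and no particular field names are assumed:

```python
import json


def load_jsonl(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Calling `load_jsonl("imagenet32-iter-1700000-log.jsonl")` after the download would return a list of dictionaries; inspect the keys of the first record to see which quantities were logged.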

### ImageNet 32
```bash
# 119M parameter model, trained for 1.7M iters (about 2.5 weeks on 32 V100)
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/imagenet32-iter-1700000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/imagenet32-iter-1700000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/imagenet32-iter-1700000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/imagenet32-iter-1700000-opt.th
python train.py --hps imagenet32 --restore_path imagenet32-iter-1700000-model.th --restore_ema_path imagenet32-iter-1700000-model-ema.th --restore_log_path imagenet32-iter-1700000-log.jsonl --restore_optimizer_path imagenet32-iter-1700000-opt.th --test_eval
# should give 2.6364 nats per dim, which is 3.80 bpd
```
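
The checkpoint comments in this README quote losses both in nats per dimension and in bits per dimension (bpd). The two differ only by a factor of ln 2. The following sketch is not part of the repository and the function name `nats_to_bits` is an illustrative choice; it checks the quoted pairs:

```python
import math


def nats_to_bits(nats_per_dim):
    """Convert a per-dimension loss from nats to bits by dividing by ln(2)."""
    return nats_per_dim / math.log(2)


# (nats per dim, bits per dim) pairs as quoted in this README.
quoted = {
    "imagenet32": (2.6364, 3.80),
    "imagenet64": (2.44, 3.52),
    "ffhq256": (0.4232, 0.61),
    "ffhq1024": (1.678, 2.42),
}

for name, (nats, bpd) in quoted.items():
    print(f"{name}: {nats} nats/dim = {nats_to_bits(nats):.2f} bpd (quoted {bpd:.2f})")
```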

### ImageNet 64
```bash
# 125M parameter model, trained for 1.6M iters (about 2.5 weeks on 32 V100)
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/imagenet64-iter-1600000-opt.th
python train.py --hps imagenet64 --restore_path imagenet64-iter-1600000-model.th --restore_ema_path imagenet64-iter-1600000-model-ema.th --restore_log_path imagenet64-iter-1600000-log.jsonl --restore_optimizer_path imagenet64-iter-1600000-opt.th --test_eval
# should be 2.44 nats, or 3.52 bits per dim
```

### FFHQ-256
```bash
# 115M parameters, trained for 1.7M iterations (or about 2.5 weeks) on 32 V100
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq256-iter-1700000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq256-iter-1700000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq256-iter-1700000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq256-iter-1700000-opt.th
python train.py --hps ffhq256 --restore_path ffhq256-iter-1700000-model.th --restore_ema_path ffhq256-iter-1700000-model-ema.th --restore_log_path ffhq256-iter-1700000-log.jsonl --restore_optimizer_path ffhq256-iter-1700000-opt.th --test_eval
# should be 0.4232 nats, or 0.61 bits per dim
```

### FFHQ-1024
```bash
# 115M parameters, trained for 1.7M iterations (or about 2.5 weeks) on 32 V100
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq1024-iter-1700000-log.jsonl
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq1024-iter-1700000-model.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq1024-iter-1700000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets/ffhq1024-iter-1700000-opt.th
python train.py --hps ffhq1024 --restore_path ffhq1024-iter-1700000-model.th --restore_ema_path ffhq1024-iter-1700000-model-ema.th --restore_log_path ffhq1024-iter-1700000-log.jsonl --restore_optimizer_path ffhq1024-iter-1700000-opt.th --test_eval
# should be 1.678 nats, or 2.42 bits per dim
```

### CIFAR-10
```bash
# 39M parameters, trained for ~1M iterations with early stopping (a little less than a week on 2 GPUs)
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/cifar10-seed0-iter-900000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/cifar10-seed1-iter-1050000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/cifar10-seed2-iter-650000-model-ema.th
wget https://openaipublic.blob.core.windows.net/very-deep-vaes-assets/vdvae-assets-2/cifar10-seed3-iter-1050000-model-ema.th
python train.py --hps cifar10 --restore_ema_path cifar10-seed0-iter-900000-model-ema.th --test_eval
python train.py --hps cifar10 --restore_ema_path cifar10-seed1-iter-1050000-model-ema.th --test_eval
python train.py --hps cifar10 --restore_ema_path cifar10-seed2-iter-650000-model-ema.th --test_eval
python train.py --hps cifar10 --restore_ema_path cifar10-seed3-iter-1050000-model-ema.th --test_eval
# seeds 0, 1, 2, 3 should give 2.879, 2.842, 2.898, 2.864 bits per dim, for an average of 2.87 bits per dim.
```
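
The 2.87 bpd average quoted for CIFAR-10 is the plain mean of the four per-seed results. A quick check, using only the numbers given above:

```python
from statistics import mean

# Test-set bits per dim for seeds 0, 1, 2, 3, as quoted above.
seed_bpd = [2.879, 2.842, 2.898, 2.864]

average = mean(seed_bpd)
spread = max(seed_bpd) - min(seed_bpd)
print(f"mean = {average:.2f} bpd, max - min = {spread:.3f} bpd")
```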
163 changes: 163 additions & 0 deletions data.py
@@ -0,0 +1,163 @@
import numpy as np
import pickle
import os
import torch
from torch.utils.data import TensorDataset
from torchvision.datasets import ImageFolder
import torchvision.transforms as transforms
from sklearn.model_selection import train_test_split


def set_up_data(H):
    # Default loss-side normalization: maps pixel values in [0, 255] to [-1, 1].
    shift_loss = -127.5
    scale_loss = 1. / 127.5
    if H.dataset == 'imagenet32':
        trX, vaX, teX = imagenet32(H.data_root)
        H.image_size = 32
        H.image_channels = 3
        shift = -116.2373
        scale = 1. / 69.37404
    elif H.dataset == 'imagenet64':
        trX, vaX, teX = imagenet64(H.data_root)
        H.image_size = 64
        H.image_channels = 3
        shift = -115.92961967
        scale = 1. / 69.37404
    elif H.dataset == 'ffhq_256':
        trX, vaX, teX = ffhq256(H.data_root)
        H.image_size = 256
        H.image_channels = 3
        shift = -112.8666757481
        scale = 1. / 69.84780273
    elif H.dataset == 'ffhq_1024':
        trX, vaX, teX = ffhq1024(H.data_root)
        H.image_size = 1024
        H.image_channels = 3
        # ToTensor() already yields values in [0, 1], so the constants differ here.
        shift = -0.4387
        scale = 1.0 / 0.2743
        shift_loss = -0.5
        scale_loss = 2.0
    elif H.dataset == 'cifar10':
        (trX, _), (vaX, _), (teX, _) = cifar10(H.data_root, one_hot=False)
        H.image_size = 32
        H.image_channels = 3
        shift = -120.63838
        scale = 1. / 64.16736
    else:
        raise ValueError(f'unknown dataset: {H.dataset}')

    do_low_bit = H.dataset in ['ffhq_256']

    if H.test_eval:
        print('DOING TEST')
        eval_dataset = teX
    else:
        eval_dataset = vaX

    shift = torch.tensor([shift]).cuda().view(1, 1, 1, 1)
    scale = torch.tensor([scale]).cuda().view(1, 1, 1, 1)
    shift_loss = torch.tensor([shift_loss]).cuda().view(1, 1, 1, 1)
    scale_loss = torch.tensor([scale_loss]).cuda().view(1, 1, 1, 1)

    if H.dataset == 'ffhq_1024':
        train_data = ImageFolder(trX, transforms.ToTensor())
        valid_data = ImageFolder(eval_dataset, transforms.ToTensor())
        untranspose = True
    else:
        train_data = TensorDataset(torch.as_tensor(trX))
        valid_data = TensorDataset(torch.as_tensor(eval_dataset))
        untranspose = False

    def preprocess_func(x):
        """Take a data example and return the preprocessed input,
        as well as the input processed for the loss."""
        nonlocal shift
        nonlocal scale
        nonlocal shift_loss
        nonlocal scale_loss
        nonlocal do_low_bit
        nonlocal untranspose
        if untranspose:
            # ImageFolder yields NCHW batches; move channels last (NHWC).
            x[0] = x[0].permute(0, 2, 3, 1)
        inp = x[0].cuda(non_blocking=True).float()
        out = inp.clone()
        inp.add_(shift).mul_(scale)
        if do_low_bit:
            # 5 bits of precision
            out.mul_(1. / 8.).floor_().mul_(8.)
        out.add_(shift_loss).mul_(scale_loss)
        return inp, out

    return H, train_data, valid_data, preprocess_func


def mkdir_p(path):
    os.makedirs(path, exist_ok=True)


def flatten(outer):
    return [el for inner in outer for el in inner]


def unpickle_cifar10(file):
    with open(file, 'rb') as fo:
        data = pickle.load(fo, encoding='bytes')
    # Keys are stored as bytes; decode them to str.
    data = dict(zip([k.decode() for k in data.keys()], data.values()))
    return data


def imagenet32(data_root):
    trX = np.load(os.path.join(data_root, 'imagenet32-train.npy'), mmap_mode='r')
    np.random.seed(42)
    tr_va_split_indices = np.random.permutation(trX.shape[0])
    train = trX[tr_va_split_indices[:-5000]]
    valid = trX[tr_va_split_indices[-5000:]]
    test = np.load(os.path.join(data_root, 'imagenet32-valid.npy'), mmap_mode='r')
    return train, valid, test


def imagenet64(data_root):
    trX = np.load(os.path.join(data_root, 'imagenet64-train.npy'), mmap_mode='r')
    np.random.seed(42)
    tr_va_split_indices = np.random.permutation(trX.shape[0])
    train = trX[tr_va_split_indices[:-5000]]
    valid = trX[tr_va_split_indices[-5000:]]
    test = np.load(os.path.join(data_root, 'imagenet64-valid.npy'), mmap_mode='r')  # this is test.
    return train, valid, test


def ffhq1024(data_root):
    # we did not significantly tune hyperparameters on ffhq-1024, so the 'valid' split doubles as the test set
    return os.path.join(data_root, 'ffhq1024/train'), os.path.join(data_root, 'ffhq1024/valid'), os.path.join(data_root, 'ffhq1024/valid')


def ffhq256(data_root):
    trX = np.load(os.path.join(data_root, 'ffhq-256.npy'), mmap_mode='r')
    np.random.seed(5)
    tr_va_split_indices = np.random.permutation(trX.shape[0])
    train = trX[tr_va_split_indices[:-7000]]
    valid = trX[tr_va_split_indices[-7000:]]
    # we did not significantly tune hyperparameters on ffhq-256, so the held-out split doubles as the test set
    return train, valid, valid


def cifar10(data_root, one_hot=True):
    tr_data = [unpickle_cifar10(os.path.join(data_root, 'cifar-10-batches-py/', 'data_batch_%d' % i)) for i in range(1, 6)]
    # np.vstack needs a sequence, not a generator
    trX = np.vstack([data['data'] for data in tr_data])
    trY = np.asarray(flatten([data['labels'] for data in tr_data]))
    te_data = unpickle_cifar10(os.path.join(data_root, 'cifar-10-batches-py/', 'test_batch'))
    teX = np.asarray(te_data['data'])
    teY = np.asarray(te_data['labels'])
    # stored as flat NCHW rows; reshape and move channels last (NHWC)
    trX = trX.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    teX = teX.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    trX, vaX, trY, vaY = train_test_split(trX, trY, test_size=5000, random_state=11172018)
    if one_hot:
        trY = np.eye(10, dtype=np.float32)[trY]
        vaY = np.eye(10, dtype=np.float32)[vaY]
        teY = np.eye(10, dtype=np.float32)[teY]
    else:
        trY = np.reshape(trY, [-1, 1])
        vaY = np.reshape(vaY, [-1, 1])
        teY = np.reshape(teY, [-1, 1])
    return (trX, trY), (vaX, vaY), (teX, teY)
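
The `do_low_bit` branch of `preprocess_func` in `data.py` reduces the loss target to 5 bits of precision by flooring each 8-bit pixel value to a multiple of 8, then applies the default `shift_loss` and `scale_loss`. The following is a scalar, pure-Python sketch of that arithmetic for illustration only. The names `quantize_5bit` and `loss_target` are hypothetical, and the real code operates in place on CUDA tensors:

```python
def quantize_5bit(pixel):
    """Keep the top 5 bits of an 8-bit value: floor(pixel / 8) * 8."""
    return (pixel // 8) * 8


def loss_target(pixel, low_bit=False):
    """Map a 0..255 pixel to the loss target using the default constants."""
    shift_loss, scale_loss = -127.5, 1.0 / 127.5
    value = quantize_5bit(pixel) if low_bit else pixel
    return (value + shift_loss) * scale_loss


levels = sorted({quantize_5bit(p) for p in range(256)})
print(len(levels), levels[:4], levels[-1])  # 32 [0, 8, 16, 24] 248
print(loss_target(255), loss_target(255, low_bit=True))
```

With quantization enabled, only 32 distinct target values remain, and the largest one maps to roughly 0.945 rather than 1.0.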
14 changes: 14 additions & 0 deletions files_to_npy.py
@@ -0,0 +1,14 @@
import sys
import numpy as np
import imageio
import glob
import os

if __name__ == "__main__":
    print("moving images in", sys.argv[1], "to", sys.argv[2])
    files = glob.glob(os.path.join(sys.argv[1], "*.png"))
    shape = imageio.imread(files[0]).shape
    data = np.zeros(shape=(len(files), *shape), dtype=np.uint8)
    for idx, f in enumerate(files):
        data[idx] = imageio.imread(f)
    np.save(sys.argv[2], data)
Binary file added header-image.png