Home
Alexandru Dinu edited this page Dec 17, 2018
Welcome to the cae wiki!
These models are inspired by [1].
As input, we have raw 720p images from the YouTube-8M dataset (credit goes to gsssrao for the downloader and frames generator scripts). The dataset consists of 121,827 frames.
The images are padded to 1280x768 (i.e. the height is padded by 24 pixels on each side), so that they can be split into 60 patches of 128x128 pixels.
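The padding and patch arithmetic can be checked with a minimal sketch in plain Python. The helper names `pad_amounts` and `patch_boxes` are hypothetical and are not taken from the repository; the sketch only assumes symmetric padding and a row-by-row patch order.

```python
PATCH = 128  # patch side length in pixels, as stated above


def pad_amounts(size, multiple=PATCH):
    """Return (before, after) padding that rounds `size` up to a multiple."""
    total = (-size) % multiple
    before = total // 2
    return before, total - before


def patch_boxes(width, height, patch=PATCH):
    """List (left, top, right, bottom) boxes tiling a padded image, row by row."""
    return [
        (x, y, x + patch, y + patch)
        for y in range(0, height, patch)
        for x in range(0, width, patch)
    ]


top, bottom = pad_amounts(720)    # height of a 720p frame
left, right = pad_amounts(1280)   # width of a 720p frame
padded_w = 1280 + left + right
padded_h = 720 + top + bottom
boxes = patch_boxes(padded_w, padded_h)

print(top, bottom, left, right)        # 24 24 0 0
print(padded_w, padded_h, len(boxes))  # 1280 768 60
```

The width is already a multiple of 128, so only the height needs padding, which gives a grid of 6 rows by 10 columns.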
The model only sees a single patch at a time; the loss is computed as MSELoss(orig_ij, out_ij), where orig_ij is the patch at grid position (i, j) and out_ij is its reconstruction. Thus, there are 60 optimization steps per image.
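The per-patch training loop can be illustrated with a toy sketch. It uses a plain-Python `mse` function as a stand-in for MSELoss, and `train_on_image`, `model`, and `step` are hypothetical names introduced here for illustration; the real training code presumably operates on image tensors and uses an optimizer instead.

```python
def mse(orig, out):
    """Mean squared error between two equally sized flat sequences."""
    assert len(orig) == len(out)
    return sum((a - b) ** 2 for a, b in zip(orig, out)) / len(orig)


def train_on_image(patches, model, step):
    """Run one optimization step per patch and return the per-patch losses.

    `patches` maps (i, j) grid indices to flattened pixel lists,
    `model` maps a patch to its reconstruction, and `step` is a
    hypothetical callback standing in for the optimizer update.
    """
    losses = []
    for (i, j), orig_ij in patches.items():
        out_ij = model(orig_ij)      # the model sees only this one patch
        loss = mse(orig_ij, out_ij)  # loss for patch (i, j) only
        step(loss)
        losses.append(loss)
    return losses


# Toy data: a 6x10 grid of tiny 4-value "patches" and a model that halves each value.
patches = {(i, j): [0.0, 0.5, 1.0, 0.5] for i in range(6) for j in range(10)}
steps = []
losses = train_on_image(patches, lambda p: [v / 2 for v in p], steps.append)

print(len(steps))  # 60
```

Each of the 60 patches contributes its own loss and its own update, which is where the count of 60 optimization steps per image comes from.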
[1] https://arxiv.org/abs/1703.00395
- model_ae_conv_32x32x32_zero_pad_bin - latent size is 32x32x32 bits/patch (i.e. compressed size: 240KB)
- model_ae_conv_16x8x8_zero_pad_bin - latent size is 16x8x8 bits/patch (i.e. compressed size: 7.5KB)
- model_ae_conv_16x8x8_refl_pad_bin - same as above, only that reflection pad is used (as opposed to zero pad)
- model_ae_conv_16x16x16_zero_pad_bin - latent size is 16x16x16 bits/patch (i.e. compressed size: 30KB)
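Compressed size is 60 * latent_size / 8 / 1024 kilobytes, where latent_size is the number of latent bits per patch and 60 is the number of patches per image.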
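As a quick check of the sizes listed above, the formula can be evaluated for each latent shape. This is a minimal sketch; `compressed_size_kb` is a hypothetical helper name, not a function from the repository.

```python
import math

PATCHES_PER_IMAGE = 60  # 1280x768 image split into 128x128 patches


def compressed_size_kb(latent_shape, patches=PATCHES_PER_IMAGE):
    """Compressed image size in kilobytes for a binary latent of the given shape."""
    latent_bits = math.prod(latent_shape)  # one bit per latent element
    return patches * latent_bits / 8 / 1024


for shape in [(32, 32, 32), (16, 8, 8), (16, 16, 16)]:
    print(shape, compressed_size_kb(shape))
# (32, 32, 32) 240.0
# (16, 8, 8) 7.5
# (16, 16, 16) 30.0
```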