- Clone this repository:

      git clone https://github.com/fesiib/cs492i-layout-generation.git
      cd cs492i-layout-generation

- Create a new conda environment (Python 3.8):

      conda env create -f environment.yml
      conda activate layout-generation

- Change the directories appropriately in the `train` and `test` files in each `src_*` folder. We assume that all the pretrained models are in the `results` folder. Avoid the `trial_` prefix when naming anything there, since entries with that prefix might get deleted during training.
- Ubuntu 18.04, CUDA 11.3
Open one of the `src_*` folders and run `test.ipynb`.
Let `SRC` be one of `src_lstm`, `src_transformer` and `TRAIN` be one of `train_*.py`. Then run:

    python SRC/train TRAIN

Checkpoints, together with their metavariables, are saved in the `./results` folder.
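
To check which `SRC` and `TRAIN` combinations exist in a checkout, a small helper along these lines can list them. This is only a sketch and not part of the repository: `list_train_scripts` is a hypothetical name, and searching each source folder recursively is an assumption, since the exact location of the `train_*.py` files is not stated here.

```python
from pathlib import Path

# Folder names taken from the README; the helper itself is hypothetical.
SRC_DIRS = ("src_lstm", "src_transformer")


def list_train_scripts(repo_root="."):
    """Map each existing source folder to the train_*.py scripts inside it."""
    root = Path(repo_root)
    found = {}
    for src in SRC_DIRS:
        src_dir = root / src
        if not src_dir.is_dir():
            continue  # skip folders missing from this checkout
        # Assumption: scripts may sit directly in SRC or in a subfolder.
        found[src] = sorted(
            p.relative_to(src_dir).as_posix()
            for p in src_dir.rglob("train_*.py")
        )
    return found


if __name__ == "__main__":
    for src, scripts in list_train_scripts().items():
        for script in scripts:
            print(f"{src}: {script}")
```

Run it from the repository root; each printed line names one candidate script for the training command.
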
Models | Epochs | Link | Comments |
---|---|---|---|
LSTM-GAN | 329 | Drive | |
Transformer-GAN | 249 | Drive | Requires LayoutGAN++ |
Transformer-MSE | 249 | Drive | |
LayoutGAN++ | 499 | Drive | |
Transformer-GAN is adapted from LayoutGAN++ [3] and uses the pretrained, frozen LayoutGAN++ model provided above.
The dataset is located in `./data/bbs/` in `.csv` format.
It was generated from the DOC2PPT [1] dataset using the FitVid layout detection model (a fine-tuned CenterNet [2]).
The structure is as follows:

    Slide Deck Id,Slide Id,Image Height,Image Width,Type,X,Y,BB Width,BB Height

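
As an illustration of how this column layout can be consumed, the sketch below reads rows with the standard `csv` module and scales each bounding box by the image size. It makes several assumptions that the README does not confirm: that the first row of each file is the header shown above, that the coordinates are in pixels, and that scaling to `[0, 1]` is a useful preprocessing step. The helper names and the sample rows (including the `Type` values) are made up for the example.

```python
import csv
import io


def load_boxes(csv_file):
    """Read bounding boxes and scale them to [0, 1] by the image size."""
    boxes = []
    # Assumption: the first row of the file is the header line from the README.
    for row in csv.DictReader(csv_file):
        img_w = float(row["Image Width"])
        img_h = float(row["Image Height"])
        boxes.append({
            "deck": row["Slide Deck Id"],
            "slide": row["Slide Id"],
            "type": row["Type"],
            "x": float(row["X"]) / img_w,
            "y": float(row["Y"]) / img_h,
            "w": float(row["BB Width"]) / img_w,
            "h": float(row["BB Height"]) / img_h,
        })
    return boxes


def group_by_slide(boxes):
    """Collect boxes per (deck, slide) pair so each slide is handled together."""
    slides = {}
    for box in boxes:
        slides.setdefault((box["deck"], box["slide"]), []).append(box)
    return slides


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = """Slide Deck Id,Slide Id,Image Height,Image Width,Type,X,Y,BB Width,BB Height
0,0,720,1280,title,128,72,640,144
0,0,720,1280,text,128,288,1024,360
0,1,720,1280,figure,320,180,640,360
"""

if __name__ == "__main__":
    slides = group_by_slide(load_boxes(io.StringIO(SAMPLE)))
    for key, items in slides.items():
        print(key, len(items))
```

For a real file, pass a handle opened with `open(path, newline="")`, where `path` points to one of the CSV files under `./data/bbs/`.
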
Models | mIOU | Accuracy (MSE) | Overlap |
---|---|---|---|
LSTM-GAN | 0.0304 | 0.0352 | 0.3579 |
Transformer-GAN | 0.0098 | 0.2422 | 1.4003 |
Transformer-MSE | 0.0798 | 0.0151 | 1.0448 |
Overlap in the actual dataset: `0.1700`.
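
The README does not state how mIOU and Overlap were computed, so the following is only a sketch of definitions commonly used for layout generation, not the repository's evaluation code. It assumes boxes are given as `(x, y, width, height)` with `(x, y)` as the top-left corner, and that Overlap sums, over every ordered pair of distinct boxes in a layout, the intersection area divided by the area of the first box. All function names are hypothetical.

```python
def intersection_area(a, b):
    """Area shared by two boxes given as (x, y, width, height)."""
    # Assumption: (x, y) is the top-left corner of each box.
    inter_w = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    inter_h = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(inter_w, 0.0) * max(inter_h, 0.0)


def iou(a, b):
    """Intersection over union of two boxes; 0.0 if both are empty."""
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def overlap(boxes):
    """Sum of intersection(i, j) / area(i) over ordered pairs with i != j."""
    total = 0.0
    for i, box_i in enumerate(boxes):
        area_i = box_i[2] * box_i[3]
        if area_i <= 0:
            continue  # a degenerate box contributes nothing
        for j, box_j in enumerate(boxes):
            if i != j:
                total += intersection_area(box_i, box_j) / area_i
    return total


if __name__ == "__main__":
    layout = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 2.0, 2.0)]
    print(iou(layout[0], layout[1]), overlap(layout))
```

Because the overlap term is summed over pairs rather than averaged, it can exceed 1 for crowded layouts, which is consistent with the values above 1 in the table.
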
[1] DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents, Tsu-Jui Fu, William Yang Wang, Daniel McDuff, Yale Song, 2021
[2] CenterNet: Keypoint Triplets for Object Detection, Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian, 2019
[3] Constrained Graphic Layout Generation via Latent Optimization, Kotaro Kikuchi, Edgar Simo-Serra, Mayu Otani, Kota Yamaguchi, 2021