Commit
* [WIP] distributed
* [WIP] distributed training
* add script
* add smaller script
* fix device
* fix DDP model setting
* reorder model on device
* remove model.model...
* set print on process 0
* remove duplicate
* model folder set on main process
* test
* model.update fix
* exists_ok=True
* add exist_ok
* jz multinode fix
* fix batch size splitting
* isort and black
* [WIP] work on trainers
* [WIP] work on Distributed and trainers
* add test example with adversarial trainer
* fix small issue
* fix master addr environ
* fix typo
* fix update with DDP
* update callback for distributed training
* display progress per process
* enhance display
* [WIP] make CoupledAdv distributed
* Clean up trainers and add distributed training
* fix piwae tests
* fix test piwae
* fix some tests
* increase coverage
* increase coverage
* add predict on main process
* apply black and isort
* update notebooks with batch_size
* update reproducibility scripts
* clean up
* isort & black
* update README
* remove assert 0
* update distributed script
* add wandb
* update script
* log only on main process
* test batch size
* loss debugging
* test with AE
* test with adaptive batchsize
* test with larger batch size
* benchmark
* benchmark perf
* remove debug prints
* reduce learning rate
* show results
* new net
* lr
* remove sigm
* lr
* epochs
* batch_size
* new test
* with sigm
* test
* test
* retest
* retest
* with rank
* test in trainer
* retest
* test
* test
* test no embedding
* test
* test distributed
* debugging
* debug
* not learnable codebook
* fix typo
* contiguous
* fix issue
* test inplace
* no_grad(
* debug
* find unused
* debug
* test with dist_nn
* remove find_unused
* test with dist.nn
* check rank
* remove all_reduce
* test with ddp
* second all_reduce
* async
* add detach
* add detach
* test
* debug
* change
* with einsum
* contiguous
* remove parameter
* new test
* debug
* debug
* add barrier
* remove embeddings
* update code
* update
* update
* mass sanity check on all process
* revert to good VQVAE
* remove prints
* add dist backend to script
* reduce number of epochs in example
* update doc
* increase batch size in example
* add other script
* remove find_unused
* test without unused
* fix unused
* add num_workers option to Training config
* add num_workers to scripts
* test with embedding
* remove learned codebook
* grad accumulation for benchmark
* benchmark
* add grad accumulation
* remove print
* benchmark
* remove num_workers
* add FFHQ to benchmark
* fix predict
* fix predict
* reduce number of samples in predict
* add parser
* add sigmoid
* update config
* add imagenet script
* convert img to RGB
* add sigmoid to decoder
* increase batch size
* change nets
* change nets
* add new script
* add convert to RGB
* update tests
* clean up
* prepare release
* update doc
* fix input_dim
* last figures
* doc fix