v0.1.2 Released!

ver217 released this 06 Apr 05:45 (commit 03e1d35) · 3400 commits to main since this release

Overview

Here are the main improvements of this release:

  1. Enabled ZeRO training for MoE and BERT models.
  2. Provided a unified checkpoint format for all kinds of parallelism.
  3. Optimized ZeRO-offload and improved model scaling.
  4. Designed a unified model memory tracer.
  5. Implemented an efficient hybrid Adam optimizer (CPU and CUDA kernels).
  6. Improved activation offloading.
  7. Released a beta version of the profiler's TensorBoard plugin.
  8. Refactored the pipeline module for closer integration with the engine.
  9. Added Chinese tutorials and opened WeChat and Slack user groups.
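As background for item 5, the sketch below shows the standard Adam update rule that a hybrid CPU/CUDA implementation computes identically on both devices; the difference is only where the optimizer states live. This is a minimal pure-Python illustration of the math, not Colossal-AI's actual kernel or API, and the function name `adam_step` is hypothetical.

```python
# Conceptual sketch of the Adam update rule (not Colossal-AI's API):
# a hybrid Adam runs this same element-wise update in a CPU kernel for
# offloaded parameters and a CUDA kernel for resident ones.
import math

def adam_step(params, grads, m, v, step,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over plain Python lists, element by element."""
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g        # 1st-moment EMA
        v[i] = beta2 * v[i] + (1 - beta2) * g * g    # 2nd-moment EMA
        m_hat = m[i] / (1 - beta1 ** step)           # bias correction
        v_hat = v[i] / (1 - beta2 ** step)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params
```

Because the CPU and CUDA paths implement the same formula, parameters can move between host and device between steps without changing the optimization trajectory.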

What's Changed

Features

Bug Fixes

Unit Testing

Documentation

Model Zoo

  • [model zoo] add activation offload for gpt model by @Gy-Lu in #582
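The idea behind activation offloading (#582 above) can be illustrated with a toy stash: forward-pass activations are copied to host memory and replaced by a cheap handle, then fetched back when the backward pass needs them, trading PCIe transfers for GPU memory. This is a hypothetical sketch, not the Colossal-AI implementation; the class and method names are invented for illustration.

```python
# Hypothetical illustration of activation offloading (not Colossal-AI code):
# activations saved for backward are moved to a host-side store during the
# forward pass and brought back on demand during the backward pass.

class OffloadStash:
    """Toy stash that 'offloads' activations to a host-side dict."""
    def __init__(self):
        self._host = {}                      # stand-in for pinned CPU memory

    def offload(self, layer_id, activation):
        self._host[layer_id] = activation    # device -> host copy
        return layer_id                      # keep only a cheap handle

    def fetch(self, handle):
        return self._host.pop(handle)        # host -> device copy for backward
```

In a real system the copies would be asynchronous and overlapped with compute so that offloading costs little wall-clock time.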

Miscellaneous

  • [logging] polish logger format by @feifeibear in #543
  • [profiler] add MemProfiler by @raejaf in #356
  • [Bot] Synchronize Submodule References by @github-actions in #501
  • [tool] create .clang-format for pre-commit by @BoxiangW in #578
  • [GitHub] Add prefix and label in issue template by @binmakeswell in #652

Full Changelog: v0.1.1...v0.1.2