v0.1.4 Released! · hpcaitech/ColossalAI

Main Features

Here are the main improvements of this release:

ColoTensor: a data structure that unifies the tensor representation across different parallel methods.
Gemini: a more efficient Gemini implementation that reduces the overhead of collecting model data statistics.
CLI: a command-line tool that helps users launch distributed training tasks more easily.
Pipeline Parallelism (PP): a more user-friendly API for PP.

What's Changed

ColoTensor

[tensor]fix colo_tensor torch_function by @Wesley-Jzy in #825
[tensor]fix test_linear by @Wesley-Jzy in #826
[tensor] ZeRO use ColoTensor as the base class. by @feifeibear in #828
[tensor] revert zero tensors back by @feifeibear in #829
[Tensor] overriding parameters() for Module using ColoTensor by @feifeibear in #889
[tensor] refine linear and add gather for layernorm by @Wesley-Jzy in #893
[Tensor] test parameters() as member function by @feifeibear in #896
[Tensor] activation is an attr of ColoTensor by @feifeibear in #897
[Tensor] initialize the ColoOptimizer by @feifeibear in #898
[tensor] reorganize files by @feifeibear in #820
[Tensor] apply ColoTensor on Torch functions by @feifeibear in #821
[Tensor] update ColoTensor torch_function by @feifeibear in #822
[tensor] lazy init by @feifeibear in #823
[WIP] Applying ColoTensor on TP-1D-row Linear. by @feifeibear in #831
Init Context supports lazy allocate model memory by @feifeibear in #842
[Tensor] TP Linear 1D row by @Wesley-Jzy in #843
[Tensor] add assert for colo_tensor 1Drow by @Wesley-Jzy in #846
[Tensor] init a simple network training with ColoTensor by @feifeibear in #849
[Tensor] Add 1Drow weight reshard by spec by @Wesley-Jzy in #854
[Tensor] add layer norm Op by @feifeibear in #852
[tensor] an initial idea of tensor spec by @feifeibear in #865
[Tensor] colo init context add device attr. by @feifeibear in #866
[tensor] add cross_entropy_loss by @feifeibear in #868
[Tensor] Add function to spec and update linear 1Drow and unit tests by @Wesley-Jzy in #869
[tensor] customized op returns ColoTensor by @feifeibear in #875
[Tensor] get named parameters for model using ColoTensors by @feifeibear in #874
[Tensor] Add some attributes to ColoTensor by @feifeibear in #877
[Tensor] make a simple net works with 1D row TP by @feifeibear in #879
[tensor] wrap function in the torch_tensor to ColoTensor by @Wesley-Jzy in #881
[Tensor] make ColoTensor more robust for getattr by @feifeibear in #886
[Tensor] test model check results for a simple net by @feifeibear in #887
[tensor] add ColoTensor 1Dcol by @Wesley-Jzy in #888
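
Several entries above refer to 1D row ("1Drow") and 1D column ("1Dcol") tensor parallelism for Linear. As a rough illustration of the idea only, and not ColossalAI's implementation or API, the sketch below uses plain Python lists in place of tensors and local loops in place of collective communication. All function names are hypothetical, and it assumes the convention y = x W with W of shape (in_features x out_features) and dimensions divisible by the number of ranks.

```python
# Illustrative sketch only: lists stand in for tensors, and local loops
# stand in for the all-reduce / all-gather of a real distributed run.
# Assumed convention: y = x @ W, with W of shape (in_features, out_features).

def matmul(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def split_rows(m, parts):
    """Split a matrix into `parts` equal blocks of rows."""
    height = len(m) // parts
    return [m[r * height:(r + 1) * height] for r in range(parts)]

def split_cols(m, parts):
    """Split a matrix into `parts` equal blocks of columns."""
    width = len(m[0]) // parts
    return [[row[r * width:(r + 1) * width] for row in m] for r in range(parts)]

def row_parallel_linear(x, w, world_size):
    """1D row: shard W along its input dimension and x along its last dimension.
    Each simulated rank produces a full-shaped partial output; the partial
    outputs are summed, which is what an all-reduce would do."""
    x_shards = split_cols(x, world_size)
    w_shards = split_rows(w, world_size)
    partials = [matmul(xs, ws) for xs, ws in zip(x_shards, w_shards)]
    out = partials[0]
    for part in partials[1:]:
        out = [[a + b for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(out, part)]
    return out

def col_parallel_linear(x, w, world_size):
    """1D column: shard W along its output dimension. Each simulated rank
    produces a slice of the output columns; the slices are concatenated,
    which is what an all-gather would do."""
    w_shards = split_cols(w, world_size)
    partials = [matmul(x, ws) for ws in w_shards]
    return [sum((part[i] for part in partials), []) for i in range(len(x))]

x = [[1, 2, 3, 4], [5, 6, 7, 8]]          # batch of 2, in_features = 4
w = [[1, 0], [0, 1], [1, 1], [2, 0]]      # in_features = 4, out_features = 2
reference = matmul(x, w)
assert row_parallel_linear(x, w, world_size=2) == reference
assert col_parallel_linear(x, w, world_size=2) == reference
```

Both layouts reproduce the unsharded result; they differ in which dimension of the weight is split and in which collective operation combines the per-rank outputs.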

Gemini + ZeRO

[zero] add zero tensor shard strategy by @1SAA in #793
Revert "[zero] add zero tensor shard strategy" by @feifeibear in #806
[gemini] a new tensor structure by @feifeibear in #818
[gemini] APIs to set cpu memory capacity by @feifeibear in #809
[DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext by @ver217 in #808
[gemini] collect cpu-gpu moving volume in each iteration by @feifeibear in #813
[gemini] add GeminiMemoryManager by @1SAA in #832
[zero] use GeminiMemoryManager when sampling model data by @ver217 in #850
[gemini] polish code by @1SAA in #855
[gemini] add stateful tensor container by @1SAA in #867
[gemini] polish stateful_tensor_mgr by @1SAA in #876
[gemini] accelerate adjust_layout() by @ver217 in #878

CLI

[cli] added distributed launcher command by @YuliangLiu0306 in #791
[cli] added micro benchmarking for tp by @YuliangLiu0306 in #789
[cli] add missing requirement by @FrankLeeeee in #805
[cli] fixed a bug in user args and refactored the module structure by @FrankLeeeee in #807
[cli] fixed single-node process launching by @FrankLeeeee in #812
[cli] added check installation cli by @FrankLeeeee in #815
[CLI] refactored the launch CLI and fixed bugs in multi-node launching by @FrankLeeeee in #844
[cli] refactored micro-benchmarking cli and added more metrics by @FrankLeeeee in #858

Pipeline Parallelism

[pipelinable]use pipelinable context to initialize non-pipeline model by @YuliangLiu0306 in #816
[pipelinable]use ColoTensor to replace dummy tensor. by @YuliangLiu0306 in #853

Misc

[hotfix] fix auto tensor placement policy by @ver217 in #775
[hotfix] change the check assert in split batch 2d by @Wesley-Jzy in #772
[hotfix] fix bugs in zero by @1SAA in #781
[hotfix] fix grad offload when enabling reuse_fp16_shard by @ver217 in #784
[refactor] moving memtracer to gemini by @feifeibear in #801
[log] display tflops if available by @feifeibear in #802
[refactor] moving grad acc logic to engine by @feifeibear in #804
[log] local throughput metrics by @feifeibear in #811
[Bot] Synchronize Submodule References by @github-actions in #810
[Bot] Synchronize Submodule References by @github-actions in #819
[refactor] moving InsertPostInitMethodToModuleSubClasses to utils. by @feifeibear in #824
[setup] allow installation with python 3.6 by @FrankLeeeee in #834
Revert "[WIP] Applying ColoTensor on TP-1D-row Linear." by @feifeibear in #835
[dependency] removed torchvision by @FrankLeeeee in #833
[Bot] Synchronize Submodule References by @github-actions in #827
[unittest] refactored unit tests for change in dependency by @FrankLeeeee in #838
[setup] use env var instead of option for cuda ext by @FrankLeeeee in #839
[hotfix] ColoTensor pin_memory by @feifeibear in #840
modified the pp build for ckpt adaptation by @Gy-Lu in #803
[hotfix] the bug of numel() in ColoTensor by @feifeibear in #845
[hotfix] fix _post_init_method of zero init ctx by @ver217 in #847
[hotfix] add deconstructor for stateful tensor by @ver217 in #848
[utils] refactor profiler by @ver217 in #837
[ci] cache cuda extension by @FrankLeeeee in #860
hotfix tensor unittest bugs by @feifeibear in #862
[usability] added assertion message in registry by @FrankLeeeee in #864
[doc] improved docstring in the communication module by @FrankLeeeee in #863
[doc] improved docstring in the logging module by @FrankLeeeee in #861
[doc] improved docstring in the amp module by @FrankLeeeee in #857
[usability] improved error messages in the context module by @FrankLeeeee in #856
[doc] improved error messages in initialize by @FrankLeeeee in #872
[doc] improved assertion messages in trainer by @FrankLeeeee in #873
[doc] improved docstring and assertion messages for the engine module by @FrankLeeeee in #871
[hotfix] fix import error by @ver217 in #880
[setup] add local version label by @ver217 in #890
[model_zoo] change qkv processing by @Gy-Lu in #870

Full Changelog: v0.1.3...v0.1.4
