v0.1.4 Released!
Main Features
Here are the main improvements of this release:
- ColoTensor: a data structure that unifies the tensor representation used by different parallel methods.
- Gemini: a more efficient Gemini implementation that reduces the overhead of collecting model data statistics.
- CLI: a command-line tool that helps users launch distributed training tasks more easily.
- Pipeline Parallelism (PP): a more user-friendly API for PP.
What's Changed
ColoTensor
- [tensor] fix colo_tensor torch_function by @Wesley-Jzy in #825
- [tensor] fix test_linear by @Wesley-Jzy in #826
- [tensor] ZeRO use ColoTensor as the base class. by @feifeibear in #828
- [tensor] revert zero tensors back by @feifeibear in #829
- [Tensor] overriding parameters() for Module using ColoTensor by @feifeibear in #889
- [tensor] refine linear and add gather for layernorm by @Wesley-Jzy in #893
- [Tensor] test parameters() as member function by @feifeibear in #896
- [Tensor] activation is an attr of ColoTensor by @feifeibear in #897
- [Tensor] initialize the ColoOptimizer by @feifeibear in #898
- [tensor] reorganize files by @feifeibear in #820
- [Tensor] apply ColoTensor on Torch functions by @feifeibear in #821
- [Tensor] update ColoTensor torch_function by @feifeibear in #822
- [tensor] lazy init by @feifeibear in #823
- [WIP] Applying ColoTensor on TP-1D-row Linear. by @feifeibear in #831
- Init Context supports lazy allocation of model memory by @feifeibear in #842
- [Tensor] TP Linear 1D row by @Wesley-Jzy in #843
- [Tensor] add assert for colo_tensor 1Drow by @Wesley-Jzy in #846
- [Tensor] init a simple network training with ColoTensor by @feifeibear in #849
- [Tensor] Add 1Drow weight reshard by spec by @Wesley-Jzy in #854
- [Tensor] add layer norm Op by @feifeibear in #852
- [tensor] an initial idea of tensor spec by @feifeibear in #865
- [Tensor] colo init context add device attr. by @feifeibear in #866
- [tensor] add cross_entropy_loss by @feifeibear in #868
- [Tensor] Add function to spec and update linear 1Drow and unit tests by @Wesley-Jzy in #869
- [tensor] customized op returns ColoTensor by @feifeibear in #875
- [Tensor] get named parameters for model using ColoTensors by @feifeibear in #874
- [Tensor] Add some attributes to ColoTensor by @feifeibear in #877
- [Tensor] make a simple net works with 1D row TP by @feifeibear in #879
- [tensor] wrap function in the torch_tensor to ColoTensor by @Wesley-Jzy in #881
- [Tensor] make ColoTensor more robust for getattr by @feifeibear in #886
- [Tensor] test model check results for a simple net by @feifeibear in #887
- [tensor] add ColoTensor 1Dcol by @Wesley-Jzy in #888
Gemini + ZeRO
- [zero] add zero tensor shard strategy by @1SAA in #793
- Revert "[zero] add zero tensor shard strategy" by @feifeibear in #806
- [gemini] a new tensor structure by @feifeibear in #818
- [gemini] APIs to set cpu memory capacity by @feifeibear in #809
- [DO NOT MERGE] [zero] init fp16 params directly in ZeroInitContext by @ver217 in #808
- [gemini] collect cpu-gpu moving volume in each iteration by @feifeibear in #813
- [gemini] add GeminiMemoryManager by @1SAA in #832
- [zero] use GeminiMemoryManager when sampling model data by @ver217 in #850
- [gemini] polish code by @1SAA in #855
- [gemini] add stateful tensor container by @1SAA in #867
- [gemini] polish stateful_tensor_mgr by @1SAA in #876
- [gemini] accelerate adjust_layout() by @ver217 in #878
CLI
- [cli] added distributed launcher command by @YuliangLiu0306 in #791
- [cli] added micro benchmarking for tp by @YuliangLiu0306 in #789
- [cli] add missing requirement by @FrankLeeeee in #805
- [cli] fixed a bug in user args and refactored the module structure by @FrankLeeeee in #807
- [cli] fixed single-node process launching by @FrankLeeeee in #812
- [cli] added check installation cli by @FrankLeeeee in #815
- [CLI] refactored the launch CLI and fixed bugs in multi-node launching by @FrankLeeeee in #844
- [cli] refactored micro-benchmarking cli and added more metrics by @FrankLeeeee in #858
Pipeline Parallelism
- [pipelinable] use pipelinable context to initialize non-pipeline model by @YuliangLiu0306 in #816
- [pipelinable] use ColoTensor to replace dummy tensor. by @YuliangLiu0306 in #853
Misc
- [hotfix] fix auto tensor placement policy by @ver217 in #775
- [hotfix] change the check assert in split batch 2d by @Wesley-Jzy in #772
- [hotfix] fix bugs in zero by @1SAA in #781
- [hotfix] fix grad offload when enabling reuse_fp16_shard by @ver217 in #784
- [refactor] moving memtracer to gemini by @feifeibear in #801
- [log] display tflops if available by @feifeibear in #802
- [refactor] moving grad acc logic to engine by @feifeibear in #804
- [log] local throughput metrics by @feifeibear in #811
- [Bot] Synchronize Submodule References by @github-actions in #810
- [Bot] Synchronize Submodule References by @github-actions in #819
- [refactor] moving InsertPostInitMethodToModuleSubClasses to utils. by @feifeibear in #824
- [setup] allow installation with python 3.6 by @FrankLeeeee in #834
- Revert "[WIP] Applying ColoTensor on TP-1D-row Linear." by @feifeibear in #835
- [dependency] removed torchvision by @FrankLeeeee in #833
- [Bot] Synchronize Submodule References by @github-actions in #827
- [unittest] refactored unit tests for change in dependency by @FrankLeeeee in #838
- [setup] use env var instead of option for cuda ext by @FrankLeeeee in #839
- [hotfix] ColoTensor pin_memory by @feifeibear in #840
- modified the pp build for ckpt adaptation by @Gy-Lu in #803
- [hotfix] the bug of numel() in ColoTensor by @feifeibear in #845
- [hotfix] fix _post_init_method of zero init ctx by @ver217 in #847
- [hotfix] add destructor for stateful tensor by @ver217 in #848
- [utils] refactor profiler by @ver217 in #837
- [ci] cache cuda extension by @FrankLeeeee in #860
- hotfix tensor unittest bugs by @feifeibear in #862
- [usability] added assertion message in registry by @FrankLeeeee in #864
- [doc] improved docstring in the communication module by @FrankLeeeee in #863
- [doc] improved docstring in the logging module by @FrankLeeeee in #861
- [doc] improved docstring in the amp module by @FrankLeeeee in #857
- [usability] improved error messages in the context module by @FrankLeeeee in #856
- [doc] improved error messages in initialize by @FrankLeeeee in #872
- [doc] improved assertion messages in trainer by @FrankLeeeee in #873
- [doc] improved docstring and assertion messages for the engine module by @FrankLeeeee in #871
- [hotfix] fix import error by @ver217 in #880
- [setup] add local version label by @ver217 in #890
- [model_zoo] change qkv processing by @Gy-Lu in #870
Full Changelog: v0.1.3...v0.1.4