CHANGELOG

[0.9.0] - 2022-01-13

Bug Fixes

Other

  • Reuse fused parameter tensors in fuse_step (#410)
  • Call step closure in qadam optimizer step (#432)
  • Fix need_reset condition (#454)
  • Do negotiation in async native op (#447)
  • Fix find_unused_parameters (#452)
  • Fix qadam non-deterministic (#459)
  • Add LIBRARY_PATH env in install_master.sh (#465)
  • Fix typo in install_master.sh (#471)
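Several of the entries above ("Reuse fused parameter tensors in fuse_step (#410)", "Support qadam in fused optimizer (#477)") concern Bagua's fused optimizer, which flattens many small parameter tensors into one contiguous buffer so a single update pass covers all of them. A minimal pure-Python sketch of the flatten/unflatten idea — hypothetical helper names, not Bagua's actual API:

```python
# Conceptual sketch of tensor fusion: flatten several small parameter
# lists into one contiguous buffer, update the buffer in one pass, then
# scatter the results back. Hypothetical helpers, not Bagua's API.

def fuse(params):
    """Concatenate parameter lists into one flat buffer, remembering sizes."""
    flat = [v for p in params for v in p]
    sizes = [len(p) for p in params]
    return flat, sizes

def unfuse(flat, sizes):
    """Split the flat buffer back into per-parameter lists."""
    out, i = [], 0
    for n in sizes:
        out.append(flat[i:i + n])
        i += n
    return out

params = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
flat, sizes = fuse(params)
flat = [x * 0.5 for x in flat]      # one fused update instead of three
print(unfuse(flat, sizes))          # [[0.5, 1.0], [1.5], [2.0, 2.5, 3.0]]
```

Reusing the fused buffer across steps (the #410 fix) avoids re-allocating `flat` on every `fuse_step` call.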

Python

  • CUDA 11.5 can't get nccl package (#415)
  • Fix process group compatibility with torch 1.6.0 (#413)
  • Fix ci random fail (#445)
  • Fix async algorithm (#479)

Features

Core

  • Initial support for C interface (#325)

Other

  • Support NODE_RANK environment variable (#426)
  • Choose bagua service port dynamically (#431)
  • Use bagua_module_name to identify different modules (#438)
  • Add algorithm registry (#433)
  • Add compatibility for NCCL version under 2.10 (#449)
  • Add broadcast object api (#437)
  • Support qadam in fused optimizer (#477)

Python

  • Support PyTorch DDP compatible distributed training API (#312)
  • Support torch-api-compatible all_reduce (#377)
  • Associate PyTorch Process Group with Bagua Process Group using cache (#402)
  • Support find_unused_parameters on BaguaDDP (#409)
  • Add BAGUA_AUTOTUNE_SERVER_WAIT_TIME env (#474)
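The torch-api-compatible all_reduce (#377) follows the usual collective semantics: every rank contributes a tensor, and after the call every rank holds the same elementwise reduction. A single-process sketch of those semantics — illustrative only, the real collective runs across processes over NCCL:

```python
# Single-process sketch of all_reduce semantics: each rank contributes a
# vector; afterwards every rank holds the elementwise sum of all of them.

def all_reduce(per_rank_tensors):
    summed = [sum(col) for col in zip(*per_rank_tensors)]
    return [list(summed) for _ in per_rank_tensors]  # every rank gets a copy

ranks = [[1, 2], [10, 20], [100, 200]]  # 3 ranks, 2-element tensors
print(all_reduce(ranks))                # [[111, 222], [111, 222], [111, 222]]
```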

[0.8.2] - 2021-11-09

Bug Fixes

Other

  • Fuse optimizer oom and make it stateless (#207)
  • Fix to_bagua_tensor compatibility with torch 1.6.0 (#355)

Python

  • Use separate process group for async communication thread to avoid potential hangs (#298)
  • Do not fail if checkpoints path exist (#305)
  • Fix is_moe_param (#306)
  • Change to_bagua_tensor API to support PyTorch 1.10 (#338)
  • Fix fused optimizer with multiple param groups (#356)

Features

Python

  • Support switching between different algorithms (#299)
  • Separate algorithm declaration and implementation (#246)

Python, core

  • Support process group in with_bagua, support hierarchical communication in bytegrad algorithm (#300)
  • Support mutable bucket tensors (#271)
  • Support all_to_all_single (#361)
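In the all_to_all_single collective added by #361, each rank splits its input into world_size chunks, chunk j goes to rank j, and rank i's output is the concatenation of chunk i from every rank. A toy single-process sketch of that data movement (not Bagua's implementation):

```python
# Sketch of all_to_all_single semantics with pre-split equal-size chunks.

def all_to_all_single(inputs):
    """inputs[i] is rank i's buffer, split into world_size chunks.
    Returns each rank's output: chunk i gathered from every source rank."""
    world = len(inputs)
    return [[inputs[src][dst] for src in range(world)]
            for dst in range(world)]

# 2 ranks, each holding 2 chunks
inputs = [["a0", "a1"], ["b0", "b1"]]
print(all_to_all_single(inputs))   # [['a0', 'b0'], ['a1', 'b1']]
```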

[0.8.1] - 2021-10-16

Features

Other

  • Use single bucket for decentralized algorithm to improve performance (#275)
  • Support process group (#228)
  • Add barrier api (#290)

Python

  • Support moe (#208)
  • Support checkpointing for moe (#242)

[0.8.0] - 2021-09-26

Bug Fixes

Ci

  • Only run publish once on git tag

Core

  • Fix compressed buffer cannot be scattered to an odd number of ranks

Other

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Fix benchmark script requirements
  • Fix alltoall_v parameter types (#27)
  • Always mark bagua padding tensor as ready
  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Replace mutex with atomic bool for async op and add Aluminum submodule update (#67)
  • Fix duplicated dependency downloading during installation (#77)
  • Fix async algorithm aborting and hanging (#78, #81)
  • Fix qadam algorithm call (#20)
  • Fix missing symbols in the zip library (#24)
  • Fix random autotune server hang (#206)
  • Fix Bagua-Net library path mismatch; make --enable_bagua_net argument style consistent with other args (#218)

Python

  • Fix random autotune-service hang
  • Handle conflicts caused by sklearn upgrade (#225)

Features

Ci

  • Only publish pypi for master commits

Other

  • Add async model average algorithm (#110)
  • Add cached dataset wrapper (#148)
  • Support sync batchnorm (#151)
  • Add --enable-bagua-net option in launcher (#183)
  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Add python packaging related files
  • Add __version__ variable
  • Install nccl deps in bagua core and add generated __version__ variable
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add synthetic benchmark script (#5)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Compatibility mode for getting pytorch tensor info with the Python interpreter
  • Better debug log including tensor info when executing ops
  • Add native low precision decentralized operator (#26)
  • Add (scatter, gather, scatter_reduce) and all inplace version communication primitives (#37)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Use nccl 2.10 avg op for all algorithms using averaging (#46, #45)
  • Add opentelemetry to report tensor ready order (#42)
  • Add deterministic flag (#15)
  • Add native async model average algorithm (#41)
  • Add examples for async model average algorithm (#14)
  • Support packet splitting and multi-stream parallel transmission (#5)
  • Support ncclnet v3 and remove the dependency on nccl in the installation environment (#17)
  • Add sync interval param to async examples (#19)
  • Support tokio backend (#21)
  • Support bagua-net (#89)
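Among the primitives above, alltoall_v ("vector alltoall", #14) generalizes all-to-all by letting each rank send a different number of elements to each peer, driven by a table of send counts. A single-process sketch of the data movement — illustrative semantics only, not Bagua's implementation (which runs across processes):

```python
# Sketch of alltoall_v semantics with per-destination send counts.

def alltoall_v(send_buffers, send_counts):
    """send_buffers[i]: rank i's flat send buffer.
    send_counts[i][j]: how many elements rank i sends to rank j.
    Returns each rank's receive buffer, ordered by source rank."""
    world = len(send_buffers)
    # Split each rank's buffer into per-destination chunks.
    chunks = []
    for i in range(world):
        offs, parts = 0, []
        for n in send_counts[i]:
            parts.append(send_buffers[i][offs:offs + n])
            offs += n
        chunks.append(parts)
    # Rank j receives chunk j from every rank, in source order.
    return [[x for i in range(world) for x in chunks[i][j]]
            for j in range(world)]

send = [[1, 2, 3], [4, 5]]
counts = [[1, 2], [2, 0]]        # rank 0 sends 1 elem to rank 0, 2 to rank 1
print(alltoall_v(send, counts))  # [[1, 4, 5], [2, 3]]
```

The alltoall_v parameter fixes (#17, #27) in the bug-fix lists concern exactly these count/offset arguments, whose types and layout are easy to get wrong.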

Python

  • Broadcast scalars for optimizers (#202)

[0.7.0] - 2021-08-16

Bug Fixes

  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Autotune api conflict (#131)
  • Autotune pytest run forever (#132)
  • Fix bagua.distributed.run --is_output_autotune_log parsing (#145)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Fix torch version check (#150)

Features

  • Add native low precision decentralized operator (#26)
  • Add low precision decentralized algorithm (#103)
  • Add (scatter, gather, scatter_reduce) and all inplace version communication primitives (#37)
  • Add all communication primitives such as send recv to communication module (#128)
  • Make full precision decentralized op stateless (#126)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Support duplicated parameters across different modules (#147)
  • Support nccl 2.10 ReduceOp.AVG (#149)
  • Support nccl 2.10 ncclAvg (#45)
  • Use nccl 2.10 avg op for all algorithms using averaging (#46)
  • Add opentelemetry to report tensor ready order (#42)
  • Add support for reporting tensor completion order (#146)
  • Add deterministic flag (#15)
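Several of the entries above adopt NCCL 2.10's averaging reduction (ReduceOp.AVG / ncclAvg, #45, #46, #149), which performs the division inside the reduction instead of summing and then scaling on each rank. A toy sketch of the semantic difference between the two ops (not NCCL code):

```python
# Sketch of sum vs. avg reduction semantics across ranks.

def all_reduce(per_rank, op="sum"):
    n = len(per_rank)
    summed = [sum(col) for col in zip(*per_rank)]
    if op == "avg":                     # NCCL >= 2.10 fuses the division
        summed = [s / n for s in summed]
    return summed

ranks = [[2.0, 4.0], [4.0, 8.0]]
print(all_reduce(ranks, op="sum"))   # [6.0, 12.0]
print(all_reduce(ranks, op="avg"))   # [3.0, 6.0]
```

Fusing the division saves one elementwise pass per bucket, which is why the averaging-based algorithms switched to it.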

[0.6.3] - 2021-07-08

Bug Fixes

  • Autotune service defaults with a fixed random seed (#117)

Features

  • Improve autotune speed metrics measurement for better accuracy (#86)
  • Install.sh will not install rust if it already exists on the system
  • Install.sh upgrades existing bagua
  • Sort q_adam variables for better performance (#102)
  • Better debug log including tensor info when executing ops
  • Support multiple models on autotune service (#107)
  • Support multiple models in buckets registration (#113)
  • Support different ssh port on different nodes (#93)

[0.6.2] - 2021-07-02

Bug Fixes

  • Fix QAdam gradient not being a BaguaTensor during the first stage

[0.6.1] - 2021-07-02

Bug Fixes

  • Fix alltoall_v parameter types (#27)
  • Fix BaguaBucket.clear_ops() return value
  • Always mark bagua padding tensor as ready
  • Fix append python op callable reference
  • BaguaBucket.tensors should only contain original passed in tensors

Features

  • Add append op methods to python BaguaBucket class (#87)
  • Wrap python op in communication stream context by default
  • Broadcast model parameters on every algorithm reset
  • Add QAdam algorithm (#92)

[0.6.0] - 2021-07-01

Bug Fixes

  • The environment variable LOCAL_SIZE has been renamed to LOCAL_WORLD_SIZE (#51)
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Auto installation for centos (#66)
  • Fix algorithm pre-forward hook not being returned
  • Fix benchmark script requirements

Features

  • Add synthetic benchmark script (#5)
  • Auto installation support centos (#50)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support reduction op and reduce
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Compatibility mode for getting pytorch tensor info with the Python interpreter
  • Add algorithm import in bagua.torch_api
  • Add all algorithms import in bagua.torch_api.algorithms

[0.5.0] - 2021-06-25

Bug Fixes

  • Do not setup python dependencies when performing codeql check
  • Remove logging in load balancing dataloader to avoid deadlock (#35)
  • Add back user interfacing imports in init.py (#38)
  • Fix bucket size switch not effective (#48)

Features

  • Add broadcast_buffer in bagua_init (#29)
  • Elastic training (#31)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add dependency installation script for ubuntu (#41)

[0.4.0] - 2021-06-17

Bug Fixes

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Only run publish once on git tag
  • Fix baguaelastic launcher
  • Fix baguaelastic launch script
  • Fix setup.py for low version setuptools (#14)
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support

Features

  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Initial commit of bagua core impl
  • Add python packaging related files
  • Only publish pypi for master commits
  • Add version variable
  • Install nccl deps in bagua core and add generated version variable
  • Initial public release of bagua python code
  • Update interface and doc for loadbalance dataloader and add doc for fused optimizer (#17)
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Support new python op supported backend (#26)