CHANGELOG

[0.9.0] - 2022-01-13

Bug Fixes

Other

  • Reuse fused parameter tensors in fuse_step (#410)
  • Call step closure in qadam optimizer step (#432)
  • Fix need_reset condition (#454)
  • Do negotiation in async native op (#447)
  • Fix find_unused_parameters (#452)
  • Fix qadam non-deterministic (#459)
  • Add LIBRARY_PATH env in install_master.sh (#465)
  • Fix typo in install_master.sh (#471)
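Several of the entries above ("Reuse fused parameter tensors in fuse_step (#410)", "Support qadam in fused optimizer (#477)") concern Bagua's fused optimizer, which flattens many small parameter tensors into one contiguous buffer so a single update pass covers all of them. A minimal pure-Python sketch of the flatten/unflatten idea — hypothetical helper names, not Bagua's actual API:

```python
# Conceptual sketch of tensor fusion: flatten several small parameter
# lists into one contiguous buffer, update the buffer in one pass, then
# scatter the results back. Hypothetical helpers, not Bagua's API.

def fuse(params):
    """Concatenate parameter lists into one flat buffer, remembering sizes."""
    flat = [v for p in params for v in p]
    sizes = [len(p) for p in params]
    return flat, sizes

def unfuse(flat, sizes):
    """Split the flat buffer back into per-parameter lists."""
    out, i = [], 0
    for n in sizes:
        out.append(flat[i:i + n])
        i += n
    return out

params = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
flat, sizes = fuse(params)
flat = [x * 0.5 for x in flat]      # one fused update instead of three
print(unfuse(flat, sizes))          # [[0.5, 1.0], [1.5], [2.0, 2.5, 3.0]]
```

Reusing the fused buffer across steps (the #410 fix) avoids re-allocating `flat` on every `fuse_step` call.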

Python

  • CUDA 11.5 can't get nccl package (#415)
  • Fix process group compatibility with torch 1.6.0 (#413)
  • Fix ci random fail (#445)
  • Fix async algorithm (#479)

Features

Core

  • Initial support for C interface (#325)

Other

  • Support NODE_RANK environment variable (#426)
  • Choose bagua service port dynamically (#431)
  • Use bagua_module_name to identify different modules (#438)
  • Add algorithm registry (#433)
  • Add compatibility for NCCL version under 2.10 (#449)
  • Add broadcast object api (#437)
  • Support qadam in fused optimizer (#477)

Python

  • Support PyTorch DDP compatible distributed training API (#312)
  • Support torch-api-compatible all_reduce (#377)
  • Associate PyTorch Process Group with Bagua Process Group using cache (#402)
  • Support find_unused_parameters on BaguaDDP (#409)
  • Add BAGUA_AUTOTUNE_SERVER_WAIT_TIME env (#474)
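The torch-api-compatible all_reduce (#377) follows the usual collective semantics: every rank contributes a tensor, and after the call every rank holds the same elementwise reduction. A single-process sketch of those semantics — illustrative only, the real collective runs across processes over NCCL:

```python
# Single-process sketch of all_reduce semantics: each rank contributes a
# vector; afterwards every rank holds the elementwise sum of all of them.

def all_reduce(per_rank_tensors):
    summed = [sum(col) for col in zip(*per_rank_tensors)]
    return [list(summed) for _ in per_rank_tensors]  # every rank gets a copy

ranks = [[1, 2], [10, 20], [100, 200]]  # 3 ranks, 2-element tensors
print(all_reduce(ranks))                # [[111, 222], [111, 222], [111, 222]]
```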

[0.8.2] - 2021-11-09

Bug Fixes

Other

  • Fuse optimizer oom and make it stateless (#207)
  • Fix to_bagua_tensor compatibility with torch 1.6.0 (#355)

Python

  • Use separate process group for async communication thread to avoid potential hangs (#298)
  • Do not fail if checkpoints path exist (#305)
  • Fix is_moe_param (#306)
  • Change to_bagua_tensor API to support PyTorch 1.10 (#338)
  • Fix fused optimizer with multiple param groups (#356)

Features

Python

  • Support switching between different algorithms (#299)
  • Separate algorithm declaration and implementation (#246)

Python, core

  • Support process group in with_bagua, support hierarchical communication in bytegrad algorithm (#300)
  • Support mutable bucket tensors (#271)
  • Support all_to_all_single (#361)
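In the all_to_all_single collective added by #361, each rank splits its input into world_size chunks, chunk j goes to rank j, and rank i's output is the concatenation of chunk i from every rank. A toy single-process sketch of that data movement (not Bagua's implementation):

```python
# Sketch of all_to_all_single semantics with pre-split equal-size chunks.

def all_to_all_single(inputs):
    """inputs[i] is rank i's buffer, split into world_size chunks.
    Returns each rank's output: chunk i gathered from every source rank."""
    world = len(inputs)
    return [[inputs[src][dst] for src in range(world)]
            for dst in range(world)]

# 2 ranks, each holding 2 chunks
inputs = [["a0", "a1"], ["b0", "b1"]]
print(all_to_all_single(inputs))   # [['a0', 'b0'], ['a1', 'b1']]
```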

[0.8.1] - 2021-10-16

Features

Other

  • Use single bucket for decentralized algorithm to improve performance (#275)
  • Support process group (#228)
  • Add barrier api (#290)

Python

  • Support moe (#208)
  • Support checkpointing for moe (#242)

[0.8.0] - 2021-09-26

Bug Fixes

Ci

  • Only run publish once on git tag

Core

  • Fix compressed buffer cannot be scattered to an odd number of ranks

Other

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Fix benchmark script requirements
  • Fix alltoall_v parameter types (#27)
  • Always mark bagua padding tensor as ready
  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Replace mutex with atomic bool for async op and add Aluminum submodule update (#67)
  • Fix duplicated dependency downloading during installation (#77)
  • Fix async algorithm aborting and hanging (#78, #81)
  • Fix qadam algorithm call (#20)
  • Fix missing symbols in the zip library (#24)
  • Fix random autotune server hang (#206)
  • Fix Bagua-Net library path mismatch; make --enable_bagua_net argument style consistent with other args (#218)

Python

  • Fix random autotune-service hang
  • Handle conflicts caused by sklearn upgrade (#225)

Features

Ci

  • Only publish pypi for master commits

Other

  • Add async model average algorithm (#110)
  • Add cached dataset wrapper (#148)
  • Support sync batchnorm (#151)
  • Add --enable-bagua-net option in launcher (#183)
  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Add python packaging related files
  • Add __version__ variable
  • Install nccl deps in bagua core and add generated __version__ variable
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add synthetic benchmark script (#5)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Compatibility mode for getting pytorch tensor info with the Python interpreter
  • Better debug log including tensor info when executing ops
  • Add native low precision decentralized operator (#26)
  • Add (scatter, gather, scatter_reduce) and all inplace version communication primitives (#37)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Use nccl 2.10 avg op for all algorithms using averaging (#46, #45)
  • Add opentelemetry to report tensor ready order (#42)
  • Add deterministic flag (#15)
  • Add native async model average algorithm (#41)
  • Add examples for async model average algorithm (#14)
  • Support packet splitting and multi-stream parallel transmission (#5)
  • Support ncclnet v3 and remove the dependency on nccl in the installation environment (#17)
  • Add sync interval param to async examples (#19)
  • Support tokio backend (#21)
  • Support bagua-net (#89)
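Among the primitives above, alltoall_v ("vector alltoall", #14) generalizes all-to-all by letting each rank send a different number of elements to each peer, driven by a table of send counts. A single-process sketch of the data movement — illustrative semantics only, not Bagua's implementation (which runs across processes):

```python
# Sketch of alltoall_v semantics with per-destination send counts.

def alltoall_v(send_buffers, send_counts):
    """send_buffers[i]: rank i's flat send buffer.
    send_counts[i][j]: how many elements rank i sends to rank j.
    Returns each rank's receive buffer, ordered by source rank."""
    world = len(send_buffers)
    # Split each rank's buffer into per-destination chunks.
    chunks = []
    for i in range(world):
        offs, parts = 0, []
        for n in send_counts[i]:
            parts.append(send_buffers[i][offs:offs + n])
            offs += n
        chunks.append(parts)
    # Rank j receives chunk j from every rank, in source order.
    return [[x for i in range(world) for x in chunks[i][j]]
            for j in range(world)]

send = [[1, 2, 3], [4, 5]]
counts = [[1, 2], [2, 0]]        # rank 0 sends 1 elem to rank 0, 2 to rank 1
print(alltoall_v(send, counts))  # [[1, 4, 5], [2, 3]]
```

The alltoall_v parameter fixes (#17, #27) in the bug-fix lists concern exactly these count/offset arguments, whose types and layout are easy to get wrong.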

Python

  • Broadcast scalars for optimizers (#202)

[0.7.0] - 2021-08-16

Bug Fixes

  • Make compress/decompress of BaguaTensor method string consistent (#33)
  • Fix scatter and reduce_scatter implementation (#40)
  • Fix subtraction overflow error for decentralized op (#39)
  • Autotune api conflict (#131)
  • Autotune pytest run forever (#132)
  • Fix bagua.distributed.run --is_output_autotune_log parsing (#145)
  • Fix QADAM params (#17)
  • Fix assert precision (#18)
  • Fix torch version check (#150)

Features

  • Add native low precision decentralized operator (#26)
  • Add low precision decentralized algorithm (#103)
  • Add (scatter, gather, scatter_reduce) and all inplace version communication primitives (#37)
  • Add all communication primitives such as send recv to communication module (#128)
  • Make full precision decentralized op stateless (#126)
  • Make full precision decentralized op stateless (#36)
  • Add communication_primitives example (#12)
  • Support duplicated parameters across different modules (#147)
  • Support nccl 2.10 ReduceOp.AVG (#149)
  • Support nccl 2.10 ncclAvg (#45)
  • Use nccl 2.10 avg op for all algorithms using averaging (#46)
  • Add opentelemetry to report tensor ready order (#42)
  • Add support for reporting tensor completion order (#146)
  • Add deterministic flag (#15)
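Several of the entries above adopt NCCL 2.10's averaging reduction (ReduceOp.AVG / ncclAvg, #45, #46, #149), which performs the division inside the reduction instead of summing and then scaling on each rank. A toy sketch of the semantic difference between the two ops (not NCCL code):

```python
# Sketch of sum vs. avg reduction semantics across ranks.

def all_reduce(per_rank, op="sum"):
    n = len(per_rank)
    summed = [sum(col) for col in zip(*per_rank)]
    if op == "avg":                     # NCCL >= 2.10 fuses the division
        summed = [s / n for s in summed]
    return summed

ranks = [[2.0, 4.0], [4.0, 8.0]]
print(all_reduce(ranks, op="sum"))   # [6.0, 12.0]
print(all_reduce(ranks, op="avg"))   # [3.0, 6.0]
```

Fusing the division saves one elementwise pass per bucket, which is why the averaging-based algorithms switched to it.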

[0.6.3] - 2021-07-08

Bug Fixes

  • Autotune service defaults with a fixed random seed (#117)

Features

  • Improve autotune speed metrics measurement for better accuracy (#86)
  • Install.sh will not install rust if it already exists on the system
  • Install.sh upgrades existing bagua
  • Sort q_adam variables for better performance (#102)
  • Better debug log including tensor info when executing ops
  • Support multiple models on autotune service (#107)
  • Support multiple models in buckets registration (#113)
  • Support different ssh port on different nodes (#93)

[0.6.2] - 2021-07-02

Bug Fixes

  • Fix QAdam gradient not being a BaguaTensor during the first stage

[0.6.1] - 2021-07-02

Bug Fixes

  • Fix alltoall_v parameter types (#27)
  • Fix BaguaBucket.clear_ops() return value
  • Always mark bagua padding tensor as ready
  • Fix append python op callable reference
  • BaguaBucket.tensors should only contain original passed in tensors

Features

  • Add append op methods to python BaguaBucket class (#87)
  • Wrap python op in communication stream context by default
  • Broadcast model parameters on every algorithm reset
  • Add QAdam algorithm (#92)

[0.6.0] - 2021-07-01

Bug Fixes

  • The environment variable LOCAL_SIZE has been renamed to LOCAL_WORLD_SIZE (#51)
  • Fix alltoall_v parameter (#17)
  • Reduce and allgather python interface
  • Fix decompress incorrect pointer and typo in error msg
  • Fix python gil deadlock during getting data ptr
  • Auto installation for centos (#66)
  • Fix algorithm pre-forward hook not being returned
  • Fix benchmark script requirements

Features

  • Add synthetic benchmark script (#5)
  • Auto installation support centos (#50)
  • Add elastic training example (#7)
  • Support alltoall_v (vector alltoall) (#14)
  • Add reduce and allgather python interface
  • Support reduce and allgather op with Reduction op enum
  • Support reduction op and reduce
  • Support creating BaguaTensor by passing torch tensor directly (#19)
  • Compatibility mode for getting pytorch tensor info with the Python interpreter
  • Add algorithm import in bagua.torch_api
  • Add all algorithms import in bagua.torch_api.algorithms

[0.5.0] - 2021-06-25

Bug Fixes

  • Do not setup python dependencies when performing codeql check
  • Remove logging in load balancing dataloader to avoid deadlock (#35)
  • Add back user interfacing imports in init.py (#38)
  • Fix bucket size switch not effective (#48)

Features

  • Add broadcast_buffer in bagua_init (#29)
  • Elastic training (#31)
  • Add 5 min timeout for buckets' comm op (#5)
  • Replace NCCL with Aluminum (#7)
  • Add dependency installation script for ubuntu (#41)

[0.4.0] - 2021-06-17

Bug Fixes

  • Fix ci pypi versioning
  • Remove init.py and python version, use cargo version
  • Only run publish once on git tag
  • Fix baguaelastic launcher
  • Fix baguaelastic launch script
  • Fix setup.py for low version setuptools (#14)
  • Move import bagua_install_library to install library function
  • Merge bagua_install_library and setup.py, remove nccl<=2.6 support

Features

  • Add pytorch examples for MNIST, ImageNet, SQuAD training (#1)
  • Add requirements.txt, only download dataset on local rank 0 (#2)
  • Initial commit of bagua core impl
  • Add python packaging related files
  • Only publish pypi for master commits
  • Add version variable
  • Install nccl deps in bagua core and add generated version variable
  • Initial public release of bagua python code
  • Update interface and doc for loadbalance dataloader and add doc for fused optimizer (#17)
  • Add version.py placeholder to prevent file not found error
  • Initial support for python op (#2)
  • Support new python op supported backend (#26)