Releases: aws/sagemaker-training-toolkit
v4.8.1
v4.8.0
Features
- Add support for py39 and py310
Bug Fixes and Other Changes
- fix typo in the run unit tests command
- run unit tests sequentially in the release process as well, to prevent coverage conflicts
- chore: remove unnecessary logging information
v4.7.4
Bug Fixes and Other Changes
- update the boto dependencies to use the latest boto
v4.7.3
Bug Fixes and Other Changes
- bypass DNS check for SageMaker Studio local execution
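A minimal sketch of what bypassing a DNS check can look like. It assumes the check waits until each cluster host name resolves, and it uses a hypothetical `skip_dns_check` flag for local runs. The function name and flag are illustrative and are not the toolkit's API.

```python
import socket
import time


def wait_for_hosts(hosts, skip_dns_check=False, retries=3, delay=1.0):
    """Hypothetical helper: block until every host name resolves.

    When skip_dns_check is True (e.g. a local run where cluster host
    names are not registered in DNS), return immediately without lookups.
    """
    if skip_dns_check:
        return True
    for host in hosts:
        for attempt in range(retries):
            try:
                socket.gethostbyname(host)
                break  # resolved; move on to the next host
            except socket.gaierror:
                if attempt == retries - 1:
                    return False  # give up after the last retry
                time.sleep(delay)
    return True
```

With the bypass enabled, a host name such as `algo-1` that only exists inside a training cluster no longer blocks a local run.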
v4.7.2
Bug Fixes and Other Changes
- use smddprun only if it is installed
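A minimal sketch of "use a launcher only if it is installed", assuming installation is detected by looking the executable up on `PATH` with `shutil.which`. The helper name is hypothetical, and the toolkit's own detection logic may differ.

```python
import shutil


def build_launch_command(script_args, launcher="smddprun"):
    """Hypothetical helper: prepend the launcher only when it is on PATH."""
    if shutil.which(launcher) is not None:
        return [launcher] + list(script_args)
    # Launcher not installed: run the training command unchanged.
    return list(script_args)
```

Falling back to the unwrapped command keeps images without SMDDP working instead of failing with "command not found".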
v4.7.1
Bug Fixes and Other Changes
- Add NCCL_PROTO=simple environment variable to handle the out-of-order data delivery from EFA
- fix toolkit build failure
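A minimal sketch of passing `NCCL_PROTO=simple` to a training subprocess. It assumes a user-supplied value should take precedence, which is why it uses `setdefault`. The helper name is hypothetical, and whether the toolkit overrides or respects an existing value is not stated in this note.

```python
import os


def training_env(base_env=None):
    """Hypothetical helper: environment for the training subprocess.

    Sets NCCL_PROTO=simple (the protocol this release uses to handle
    out-of-order data delivery from EFA) unless a value is already set.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.setdefault("NCCL_PROTO", "simple")
    return env
```

The result can be handed to `subprocess.run(cmd, env=training_env())` so that every process launched for the job inherits the setting.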
v4.7.0
Features
- support codeartifact for installing requirements.txt packages
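A minimal sketch of installing `requirements.txt` from AWS CodeArtifact. It assumes the repository is identified by its ARN (`arn:aws:codeartifact:<region>:<account>:repository/<domain>/<repo>`, standard `aws` partition) and that an authorization token has already been obtained. The pip index URL follows the format in the CodeArtifact documentation. The helper names are hypothetical, and how the toolkit itself receives the ARN and token is not shown here.

```python
import sys


def codeartifact_index_url(repository_arn, auth_token):
    """Hypothetical helper: pip index URL for a CodeArtifact PyPI repository."""
    parts = repository_arn.split(":")
    if len(parts) != 6 or parts[2] != "codeartifact":
        raise ValueError(f"not a CodeArtifact ARN: {repository_arn}")
    region, account = parts[3], parts[4]
    resource_type, domain, repo = parts[5].split("/")
    if resource_type != "repository":
        raise ValueError(f"not a repository ARN: {repository_arn}")
    return (
        f"https://aws:{auth_token}@{domain}-{account}.d.codeartifact."
        f"{region}.amazonaws.com/pypi/{repo}/simple/"
    )


def pip_install_command(requirements_path, index_url):
    """Hypothetical helper: pip command that installs from the given index."""
    return [sys.executable, "-m", "pip", "install",
            "-r", requirements_path, "--index-url", index_url]
```

Because the token is embedded in the URL, the full command should not be written to logs.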
v4.6.1
Bug Fixes and Other Changes
- removed unused import statement
- ran black on torch_distributed.py, which was missed after updating comments in the previous commit
- Modified the comment on lines 98-103 in torch_distributed.py to comply with the formatting standard.
- Revert "Ran black on the entire sagemaker-training-toolkit directory"
- Ran black on the entire sagemaker-training-toolkit directory
- Ran Black (Python formatter) on the updated files (torch_distributed.py and test_torch_distributed.py)
- Added test for neuron_parallel_compile in test_torch_distributed.py
- Updated comment syntax based on pull request feedback and added a full example of the neuron_parallel_compile command as it would appear on the command line
- added unit test for neuron_parallel_compile code change
- Updated torch_distributed.py
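A minimal sketch of wrapping a `torchrun` launch with `neuron_parallel_compile`, the AWS Neuron tool that is placed in front of the training command to extract and pre-compile graphs in parallel. The helper name and the `neuron_parallel_compile` boolean parameter are hypothetical. The release note does not say how the toolkit decides to enable it.

```python
def build_torchrun_command(entry_point, nproc_per_node, nnodes=1, node_rank=0,
                           master_addr="localhost", master_port=29500,
                           neuron_parallel_compile=False):
    """Hypothetical helper: torchrun command, optionally wrapped."""
    cmd = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        entry_point,
    ]
    if neuron_parallel_compile:
        # Prefix the whole launch so the Neuron tool runs torchrun itself.
        cmd = ["neuron_parallel_compile"] + cmd
    return cmd
```

For example, `build_torchrun_command("train.py", 32, neuron_parallel_compile=True)` yields a command line of the form `neuron_parallel_compile torchrun --nnodes=1 --nproc_per_node=32 ... train.py`.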
v4.6.0
Features
- add SMDDP exception classes to the MPI distribution
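A minimal sketch of why dedicated exception classes help: a failed MPI process's stderr can be mapped to a more specific error type that callers can catch separately. Every class name and marker string below is hypothetical and is not taken from the toolkit.

```python
class TrainingScriptError(Exception):
    """Hypothetical base error for a failed training process."""


class SMDDPError(TrainingScriptError):
    """Hypothetical: failure attributed to the SMDDP library."""


class SMDDPNCCLError(SMDDPError):
    """Hypothetical: SMDDP failure involving NCCL."""


# Hypothetical marker strings, checked in order with the most specific first.
_RULES = [
    (("smddp", "nccl"), SMDDPNCCLError),
    (("smddp",), SMDDPError),
]


def classify_failure(stderr):
    """Return the most specific exception class whose markers all appear."""
    text = stderr.lower()
    for markers, exc_class in _RULES:
        if all(marker in text for marker in markers):
            return exc_class
    return TrainingScriptError
```

Because the specific classes subclass a common base, existing `except TrainingScriptError` handlers keep working.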
v4.5.0
Features
- add NCCL_PROTO and NCCL_ALGO environment variables for modelparallel jobs
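A minimal sketch of exporting NCCL settings to every rank of an MPI job with Open MPI's `mpirun -x NAME=value` option, applied only when model parallelism is enabled. The helper name is hypothetical. The values in the example (`simple`, `Ring`) are valid NCCL settings chosen for illustration, because this note does not state which values the toolkit uses.

```python
def mpi_env_flags(env_vars, modelparallel_enabled):
    """Hypothetical helper: mpirun -x flags that export env vars to all ranks."""
    if not modelparallel_enabled:
        return []
    flags = []
    for name, value in env_vars.items():
        flags += ["-x", f"{name}={value}"]
    return flags
```

For example, `mpi_env_flags({"NCCL_PROTO": "simple", "NCCL_ALGO": "Ring"}, True)` returns `["-x", "NCCL_PROTO=simple", "-x", "NCCL_ALGO=Ring"]`, ready to splice into the `mpirun` argument list.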