Enable msccl capability (microsoft#3) (microsoft#4)

* initial checkin
* add submodule nccl-tests and update the readme
* Update README with MSCCL scheduler
* update submodule to latest

---------

Co-authored-by: Ziyue Yang <[email protected]>
Co-authored-by: root <root@liand-h100-validation-vmss000002.wxea2wklo2jenp1trbnjn0dkpb.jx.internal.cloudapp.net>
yzygitzh · Jul 10, 2023 · 2c98bd4
1 parent 56b1667
commit 2c98bd4

Showing 5 changed files with 110 additions and 29 deletions.
diff --git a/.gitmodules b/.gitmodules
@@ -0,0 +1,8 @@
+[submodule "executor/msccl-executor-nccl"]
+	path = executor/msccl-executor-nccl
+	url = https://github.com/Azure/msccl-executor-nccl.git
+	branch = main
+[submodule "tests/msccl-tests-nccl"]
+	path = tests/msccl-tests-nccl
+	url = https://github.com/Azure/msccl-tests-nccl.git
+	branch = main
diff --git a/README.md b/README.md
@@ -1,20 +1,100 @@
-# Project
+# MSCCL
 
-> This repo has been populated by an initial template to help get you started. Please
-> make sure to update the content to build a great experience for community-building.
+Microsoft Collective Communication Library (MSCCL) is a platform to execute custom collective communication algorithms for multiple accelerators supported by Microsoft Azure. The research prototype of this project is [microsoft/msccl](https://github.com/microsoft/msccl).
 
-As the maintainer of this project, please make a few updates:
+## Introduction
 
-- Improving this README.MD file to provide a great experience
-- Updating SUPPORT.MD with content about this project's support experience
-- Understanding the security reporting process in SECURITY.MD
-- Remove this section from the README
+MSCCL's vision is to provide a unified, efficient, and scalable framework for executing collective communication algorithms across multiple accelerators. To achieve this, MSCCL has multiple components:
+
+- [MSCCL toolkit](https://github.com/microsoft/msccl-tools): Interconnections among accelerators have different latencies and bandwidths, so a generic collective communication algorithm does not necessarily perform well for all topologies and buffer sizes. To address this, the MSCCL toolkit allows a user to write a hyper-optimized collective communication algorithm for a given topology and buffer size. It contains a high-level DSL (MSCCLang) and a compiler that generates an IR for the MSCCL executor ([msccl-executor-nccl](https://github.com/Azure/msccl-executor-nccl)) to run on the backend. The [Example](#example) section shows how the MSCCL toolkit works with the runtime. Please refer to [MSCCL toolkit](https://github.com/microsoft/msccl-tools) for more information.
+
+- [MSCCL scheduler](https://github.com/microsoft/msccl-scheduler): MSCCL scheduler provides an example design and implementation of how to select optimal MSCCL algorithms for MSCCL executors.
+
+- MSCCL executor ([msccl-executor-nccl](https://github.com/Azure/msccl-executor-nccl)): msccl-executor-nccl is an inter-accelerator communication framework built on top of [NCCL](https://github.com/nvidia/nccl) that uses its building blocks to execute custom-written collective communication algorithms.
+
+- MSCCL test toolkit ([msccl-tests-nccl](https://github.com/Azure/msccl-tests-nccl)): These tests check both the performance and the correctness of MSCCL operations.
+
+## Example
+
+To use MSCCL, you may follow these steps to compare two AllReduce algorithms (a custom MSCCL algorithm and the default ring algorithm) on Azure NDv4, which has 8xA100 GPUs:
+
+Follow the steps below to download the source code of MSCCL and its submodules:
+
+```sh
+$ git clone https://github.com/Azure/msccl.git --recurse-submodules
+```
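If the repository is cloned without `--recurse-submodules`, the submodule directories listed in `.gitmodules` stay empty and the build steps below fail. The following is a minimal sketch of a check for that case. `check_msccl_submodules` is a hypothetical helper, not part of MSCCL, and it assumes it is run from the root of the `msccl` clone. Git's standard `git submodule update --init --recursive` fetches any missing submodules.

```shell
# Hypothetical helper (not part of MSCCL): verify that the submodules listed
# in .gitmodules were checked out. Assumes it runs from the msccl clone root.
check_msccl_submodules() {
  missing=0
  for sub in executor/msccl-executor-nccl tests/msccl-tests-nccl; do
    # A missing or empty directory means the submodule was not initialized.
    if [ ! -d "$sub" ] || [ -z "$(ls -A "$sub")" ]; then
      echo "submodule not checked out: $sub" >&2
      missing=1
    fi
  done
  if [ "$missing" -ne 0 ]; then
    echo "run: git submodule update --init --recursive" >&2
  fi
  return "$missing"
}
```

For example, `cd msccl && check_msccl_submodules` returns a non-zero status and names each missing directory when a submodule is absent.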
+
+Steps to build the MSCCL executor:
+
+```sh
+$ cd msccl/executor/msccl-executor-nccl
+$ make -j src.build
+$ cd ../..
+```
+
+Then, follow these steps to build msccl-tests-nccl for performance evaluation (the `NCCL_HOME` path below assumes the repository was cloned into `$HOME`):
+
+```sh
+$ cd tests/msccl-tests-nccl/
+$ make MPI=1 NCCL_HOME=$HOME/msccl/executor/msccl-executor-nccl/build/ -j
+$ cd ../
+$ cd ../
+```
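Before moving on, you may want to confirm that both builds produced the files the later commands use. This is a minimal sketch, assuming it runs from the root of the `msccl` clone. Because the executor is built on NCCL, it also assumes the build emits `build/lib/libnccl.so`. The helper name `check_msccl_build` is hypothetical.

```shell
# Hypothetical helper (not part of MSCCL): confirm that both builds produced
# their outputs. Assumes it runs from the msccl clone root; the library name
# libnccl.so is an assumption based on the executor being built on NCCL.
check_msccl_build() {
  lib=executor/msccl-executor-nccl/build/lib/libnccl.so
  perf=tests/msccl-tests-nccl/build/all_reduce_perf
  status=0
  if [ ! -e "$lib" ]; then
    echo "executor library not found: $lib" >&2
    status=1
  fi
  if [ ! -x "$perf" ]; then
    echo "test binary not found or not executable: $perf" >&2
    status=1
  fi
  return "$status"
}
```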
+
+Next, install the [MSCCL toolkit](https://github.com/microsoft/msccl-tools) and use it to compile a custom algorithm:
+
+```sh
+$ git clone https://github.com/microsoft/msccl-tools.git
+$ cd msccl-tools/
+$ pip install .
+$ cd ../
+$ python msccl-tools/examples/mscclang/allreduce_a100_allpairs.py --protocol=LL 8 2 > test.xml
+$ cd ../
+```
+
+The compiler's generated code is an XML file (`test.xml`) that is fed to the MSCCL runtime. To evaluate its performance, copy `test.xml` to `msccl/executor/msccl-executor-nccl/build/lib/msccl-algorithms/` and run the following command on an Azure NDv4 node or any 8xA100 system:
+
+```sh
+$ mpirun -np 8 -x LD_LIBRARY_PATH=msccl/executor/msccl-executor-nccl/build/lib/:$LD_LIBRARY_PATH -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,ENV msccl/tests/msccl-tests-nccl/build/all_reduce_perf -b 128 -e 32MB -f 2 -g 1 -c 1 -n 100 -w 100 -G 100 -z 0
+```
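The copy step described above must happen before `mpirun` is launched, and a small helper can script it. `install_msccl_algo` is hypothetical. Its `grep` for an `<algo` element is only a rough sanity check, based on the assumption that MSCCLang emits an XML document with an `<algo>` root element.

```shell
# Hypothetical helper (not part of MSCCL): copy a compiled algorithm into the
# msccl-algorithms directory next to the executor library.
# Usage: install_msccl_algo <algorithm.xml> <executor build/lib directory>
install_msccl_algo() {
  xml=$1
  libdir=$2
  # Rough sanity check; assumes MSCCLang output contains an <algo> element.
  if ! grep -q '<algo' "$xml"; then
    echo "not an MSCCL algorithm file: $xml" >&2
    return 1
  fi
  mkdir -p "$libdir/msccl-algorithms" &&
    cp "$xml" "$libdir/msccl-algorithms/"
}
```

The final `cd ../` of the toolkit step leaves you one level above `msccl`. With that layout, the call would be `install_msccl_algo msccl/test.xml msccl/executor/msccl-executor-nccl/build/lib`.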
+
+If everything is installed correctly, you should see the following output in the log:
+
+```sh
+[0] NCCL INFO Connected 1 MSCCL algorithms
+```
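You can save the run's output to a file, for example by appending `2>&1 | tee run.log` to the `mpirun` command. The check for the line above can then be automated. This is a minimal sketch. `msccl_algos_loaded` is a hypothetical helper that only pattern-matches the log line shown above; it does not query MSCCL itself.

```shell
# Hypothetical helper (not part of MSCCL): scan a saved run log for the
# "Connected N MSCCL algorithms" line and print N (0 if the line is absent).
msccl_algos_loaded() {
  count=$(sed -n 's/.*NCCL INFO Connected \([0-9][0-9]*\) MSCCL algorithms.*/\1/p' "$1" | head -n 1)
  echo "${count:-0}"
}
```

A result of `0` suggests the XML file was not picked up, so re-check where `test.xml` was copied.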
+
+You may evaluate the performance of `test.xml` by comparing the in-place results (the new algorithm) with the out-of-place results (the default ring algorithm). The new algorithm should be up to 2-3x faster on 8xA100 NVLink-interconnected GPUs. The [MSCCL toolkit](https://github.com/microsoft/msccl-tools) has a rich set of algorithms for different Azure SKUs and collective operations, with significant speedups over vanilla NCCL.
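The comparison described above can be sketched as a short `awk` filter over the `all_reduce_perf` output saved to a log file. This is a hypothetical helper, not part of msccl-tests-nccl. It assumes the result-row layout of recent nccl-tests releases:

- five leading columns (size, count, type, redop, root);
- then time, algbw, busbw and #wrong for the out-of-place run;
- then the same four columns for the in-place run.

It prints the out-of-place time divided by the in-place time, so values above 1 mean the in-place (MSCCL) run was faster.

```shell
# Hypothetical helper (not part of msccl-tests-nccl): per message size, print
# out-of-place time / in-place time from an all_reduce_perf log.
# Assumed result-row layout (13 fields):
#   size count type redop root | time algbw busbw #wrong | time algbw busbw #wrong
msccl_speedup() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 13 && $10 > 0 {
    printf "%s %.2f\n", $1, $6 / $10
  }' "$1"
}
```

For a fixed message size, algbw and busbw are both inversely proportional to time. The time ratio therefore gives the same speedup as a bandwidth ratio.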
+
+## Build
+
+To build the library:
+
+```sh
+$ cd msccl/executor/msccl-executor-nccl
+$ make -j src.build
+```
+
+If CUDA is not installed in the default `/usr/local/cuda` path, you can define the CUDA path with:
+
+```sh
+$ make src.build CUDA_HOME=<path to cuda install>
+```
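If `nvcc` is on your `PATH`, you can often derive a `CUDA_HOME` value from its location. This is a minimal sketch, assuming the standard `<CUDA_HOME>/bin/nvcc` layout. Verify the result before using it, because it may not be a usable CUDA root when `nvcc` lives elsewhere (for example, a distribution-packaged `/usr/bin/nvcc`). `cuda_home_from_nvcc` is a hypothetical helper.

```shell
# Hypothetical helper (not part of MSCCL): derive a CUDA_HOME candidate from
# the nvcc location, assuming the standard layout <CUDA_HOME>/bin/nvcc.
# Takes an explicit nvcc path, or falls back to the nvcc found on PATH.
cuda_home_from_nvcc() {
  nvcc_path=${1:-$(command -v nvcc)}
  if [ -z "$nvcc_path" ]; then
    echo "nvcc not found; pass its path explicitly" >&2
    return 1
  fi
  # Strip the trailing /bin/nvcc to get the installation root.
  dirname "$(dirname "$nvcc_path")"
}
```

A possible invocation is `make src.build CUDA_HOME="$(cuda_home_from_nvcc)"`.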
+
+MSCCL will be compiled and installed in `build/` unless `BUILDDIR` is set.
+
+By default, MSCCL is compiled for all supported architectures. To accelerate the compilation and reduce the binary size, consider redefining `NVCC_GENCODE` (defined in `makefiles/common.mk`) to only include the architecture of the target platform:
+
+```sh
+$ make -j src.build NVCC_GENCODE="-gencode=arch=compute_80,code=sm_80"
+```
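The single-architecture `NVCC_GENCODE` value follows a fixed pattern derived from the GPU's compute capability. For example, compute capability 8.0 on the A100 gives `sm_80`, as above. This is a minimal sketch of that mapping. `gencode_for_cc` is a hypothetical helper.

```shell
# Hypothetical helper (not part of MSCCL): turn a compute capability such as
# "8.0" into a single-architecture NVCC_GENCODE value.
gencode_for_cc() {
  cc=$(printf '%s' "$1" | tr -d '.')
  printf '%s\n' "-gencode=arch=compute_${cc},code=sm_${cc}"
}
```

For an 8xA100 system, `make -j src.build NVCC_GENCODE="$(gencode_for_cc 8.0)"` reproduces the command above. On recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` may report the compute capability directly; check that your driver version supports this query.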
 
 ## Contributing
 
 This project welcomes contributions and suggestions.  Most contributions require you to agree to a
 Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
-the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
+the rights to use your contribution. For details, visit [CLA](https://cla.opensource.microsoft.com).
 
 When you submit a pull request, a CLA bot will automatically determine whether you need to provide
 a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
@@ -26,8 +106,8 @@ contact [[email protected]](mailto:[email protected]) with any additio
 
 ## Trademarks
 
-This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft 
-trademarks or logos is subject to and must follow 
+This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
+trademarks or logos is subject to and must follow
 [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
 Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
 Any use of third-party trademarks or logos are subject to those third-party's policies.
diff --git a/SUPPORT.md b/SUPPORT.md
@@ -1,25 +1,16 @@
-# TODO: The maintainer of this repo has not yet edited this file
-
-**REPO OWNER**: Do you want Customer Service & Support (CSS) support for this product/project?
-
-- **No CSS support:** Fill out this template with information about how to file issues and get help.
-- **Yes CSS support:** Fill out an intake form at [aka.ms/onboardsupport](https://aka.ms/onboardsupport). CSS will work with/help you to determine next steps.
-- **Not sure?** Fill out an intake as though the answer were "Yes". CSS will help you decide.
-
-*Then remove this first heading from this SUPPORT.MD file before publishing your repo.*
-
 # Support
 
-## How to file issues and get help  
+## How to file issues and get help
 
-This project uses GitHub Issues to track bugs and feature requests. Please search the existing 
-issues before filing new issues to avoid duplicates.  For new issues, file your bug or 
-feature request as a new Issue.
+This project uses [GitHub Issues] to track bugs and feature requests. Please search the existing
+issues before filing new issues to avoid duplicates. For new issues, file your bug or
+feature request as a new issue.
 
-For help and questions about using this project, please **REPO MAINTAINER: INSERT INSTRUCTIONS HERE 
-FOR HOW TO ENGAGE REPO OWNERS OR COMMUNITY FOR HELP. COULD BE A STACK OVERFLOW TAG OR OTHER
-CHANNEL. WHERE WILL YOU HELP PEOPLE?**.
+For help and questions about using this project, please create a new post in [GitHub Discussions].
 
-## Microsoft Support Policy  
+## Microsoft Support Policy
 
 Support for this **PROJECT or PRODUCT** is limited to the resources listed above.
+
+[GitHub Issues]: https://github.com/Azure/msccl/issues
+[GitHub Discussions]: https://github.com/Azure/msccl/discussions
diff --git a/executor/msccl-executor-nccl b/executor/msccl-executor-nccl
diff --git a/tests/msccl-tests-nccl b/tests/msccl-tests-nccl