Commit
Atomic gemm and FP8 Reduce Scatter (#449)
* Initial commit
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Repro for RS output mismatch with Single GEMM + Split pipelined RS
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* minor changes for AG->GEMM pipelined overlap
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Add Atomic Gemm cublasApi attributes and initial implementation of AG->Atomic GEMM
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* AtomicGemm+RS functional with workaround
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* add amax update to layernorm_linear for FP8 unit test accuracy
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Enable reducescatter2_userbuff_strided variants
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* AG+AtomicGemm overlap functional but gemm doesnt overlap with comm
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Add userbuffers_sendrecv kernel variants
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* TransformerLayer API changes to enable AtomicGemm+RS overlap
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Code cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Code cleanup2
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [UB] AllGather Atomic GEMM overlap using userbuffer_sendrecv kernels
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Code cleanup + bug fix for multiatomic sendrecv kernel
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fixes
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* [UB] Add shuffling for better AG AtomicGEMM overlap
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix for AG AtomicGemm overlap
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix for multiAtomicAG and singleAtomicAG
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Use chunk_i+1 as recv_chunk for multiatomic_AG with shuffling
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Launch AtomicGEMM after first-chunk AG
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Rebase to main
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert "Add FP8 ReduceScatter kernels, AtomicGEMM+FP8 RS not functional"
  This reverts commit 80a47a7.
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Add support for NVLS-MC and FP8 Reduce Scatter
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Atomic and Multiatomic FP8 RS functional
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Remove debug print
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* UB comm initialization hang fix
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Code cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Create new GEMM API for Atomic GEMM
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* CI ready
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* more fixes
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* license
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Bug fix
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Revert NVLS-MC
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Check cu* versions for running atomic gemms
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* lint
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* fixes
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Cleanup
  Signed-off-by: Vasudevan Rengasamy <[email protected]>
* Add experimental warning
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Better wording
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Add warning to c api
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
* Fix wording
  Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Vasudevan Rengasamy <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
1 parent be67f21, commit 958e188
17 changed files with 3,619 additions and 702 deletions.