xrennvidia/collective_matmul

collective_matmul

This unit test composes two back-to-back GEMM layers (FC1 and FC2 of an LLM MLP block). FC1 performs an all-gather followed by a GEMM (AG+GEMM), and FC2 performs a GEMM followed by a reduce-scatter (GEMM+RS).
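The data flow of the two layers can be sketched in a single process. The snippet below is a minimal illustration, not the code in collective_matmul.py. It assumes that the activation is split by rows across the tp ranks, that the FC1 weight is split by columns, and that the FC2 weight is split by rows. The helper names (matmul, split_rows, split_cols, simulate_mlp) are hypothetical, and plain Python lists stand in for device tensors.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def split_rows(m, parts):
    """Split a matrix into `parts` equal blocks of rows."""
    step = len(m) // parts
    return [m[i * step:(i + 1) * step] for i in range(parts)]

def split_cols(m, parts):
    """Split a matrix into `parts` equal blocks of columns."""
    step = len(m[0]) // parts
    return [[row[i * step:(i + 1) * step] for row in m] for i in range(parts)]

def simulate_mlp(x, w1, w2, tp):
    """Simulate FC1 (AG+GEMM) and FC2 (GEMM+RS) over `tp` ranks in one process."""
    x_shards = split_rows(x, tp)    # assumed: each rank starts with a slice of the rows
    w1_shards = split_cols(w1, tp)  # assumed: FC1 weight split by columns
    w2_shards = split_rows(w2, tp)  # assumed: FC2 weight split by rows

    # FC1: all-gather the row slices, then one local GEMM per rank.
    x_full = [row for shard in x_shards for row in shard]
    hidden = [matmul(x_full, w1_shards[r]) for r in range(tp)]

    # FC2: one local GEMM per rank yields a partial sum of the full output.
    partial = [matmul(hidden[r], w2_shards[r]) for r in range(tp)]

    # Reduce-scatter: sum the partials elementwise, each rank keeps one row slice.
    total = [[sum(vals) for vals in zip(*rows)] for rows in zip(*partial)]
    return split_rows(total, tp)

# Small integer example so the comparison with the unsharded product is exact.
x = [[1, 2], [3, 4], [5, 6], [7, 8]]
w1 = [[1, 0, 2, 1], [0, 1, 1, 2]]
w2 = [[1, 0], [0, 1], [1, 1], [2, 0]]
out_shards = simulate_mlp(x, w1, w2, tp=2)
reference = matmul(matmul(x, w1), w2)
gathered = [row for shard in out_shards for row in shard]
```

Concatenating the per-rank output slices reproduces the unsharded product x · w1 · w2, which is the property a test of this kind would check.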

Running examples

175B config

python collective_matmul.py --dp 2 --tp 4

You can change dp (data parallel) and tp (tensor model parallel) by passing different numbers in the command line above.

To run the baseline (i.e., no overlapping), add --no_tp_overlap to the command line.
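This README does not say how the overlap is implemented. One common approach, assumed here only for illustration, replaces the single all-gather with a ring of chunk transfers, so that each rank multiplies the chunk it currently holds while the next chunk is in flight. The single-process sketch below shows that this chunked schedule gives the same per-rank result as gathering first and multiplying once. The names ring_ag_gemm and baseline_ag_gemm are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]

def baseline_ag_gemm(x_shards, w_shards):
    """No overlap: gather every row chunk first, then one GEMM per rank."""
    x_full = [row for shard in x_shards for row in shard]
    return [matmul(x_full, w) for w in w_shards]

def ring_ag_gemm(x_shards, w_shards):
    """Chunked schedule: one small GEMM per ring step instead of one large GEMM."""
    tp = len(x_shards)
    held = list(x_shards)                   # chunk currently resident on each rank
    out = [[None] * tp for _ in range(tp)]  # out[rank][chunk index]
    for step in range(tp):
        for rank in range(tp):
            src = (rank + step) % tp        # original owner of the resident chunk
            out[rank][src] = matmul(held[rank], w_shards[rank])
        # Ring shift: every rank receives its right neighbour's chunk. On real
        # hardware this transfer could run concurrently with the GEMM above.
        held = [held[(rank + 1) % tp] for rank in range(tp)]
    # Stitch the row blocks back together in their original order.
    return [[row for block in out[rank] for row in block] for rank in range(tp)]

# Two ranks, each holding one row of the activation and one weight column.
x_shards = [[[1, 2]], [[3, 4]]]
w_shards = [[[1], [0]], [[2], [1]]]
overlapped = ring_ag_gemm(x_shards, w_shards)
baseline = baseline_ag_gemm(x_shards, w_shards)
```

Because both schedules compute the same products, the baseline run serves as a reference for timing comparisons rather than for any difference in numerical results.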

5B config

python collective_matmul.py --batch_size 4 --hidden_size 4096

The DP, TP, and overlapping arguments are configured in the same way as for the 175B config.
