[metax]update main page #448

Merged on Feb 7, 2024 (41 commits)

Commits
fa1ecec
update readme
Jan 21, 2024
319e835
add company info
Jan 21, 2024
ac71827
Merge branch 'FlagOpen:main' into main
fred1912 Jan 22, 2024
3a0a166
faster_rcnn update & first PR
Jan 22, 2024
3ce374f
fix readme
Jan 22, 2024
51aabdf
add config 1x8 bs=16
Jan 22, 2024
8fc458b
fix typo A100->C500
Jan 22, 2024
1dd7474
remove torchvision in requirements.txt
Jan 22, 2024
0f91316
update readme
Jan 22, 2024
ae2292f
update fasterrcnn readme
Jan 22, 2024
02cbdf8
update
Jan 22, 2024
167335a
add 2x8 info & add bandwidth
Jan 24, 2024
f8169c9
fix typo
Jan 24, 2024
b134c33
delete history
Jan 24, 2024
dcd97f6
update info
Jan 25, 2024
a1e2df9
update info
Jan 25, 2024
feceef0
update table
Jan 25, 2024
1540bc6
delete history
Jan 25, 2024
a726224
add info in test-conf
Jan 25, 2024
28d40f5
fix typo
Jan 25, 2024
bf4641c
delete history
Jan 25, 2024
2070c1e
fix env bug & add mx tf32 env
Jan 25, 2024
66887d5
update requirements
Jan 25, 2024
687edc7
fix bug
Jan 25, 2024
c90ba5b
fix bug
Jan 25, 2024
d3e73f6
fix bug
Jan 25, 2024
ebd3a53
Merge branch 'FlagOpen:main' into main
fred1912 Jan 25, 2024
bbd79e4
fix bug
Jan 25, 2024
ef15aee
update file path
Jan 25, 2024
20b609e
Merge branch 'FlagOpen:main' into main
fred1912 Jan 27, 2024
885321c
Merge branch 'FlagOpen:main' into main
fred1912 Jan 30, 2024
9bbaa12
support torch2.0 and torchvision>0.12
Jan 30, 2024
914b9db
delete history
Jan 30, 2024
d1c763d
Merge branch 'FlagOpen:main' into main
fred1912 Jan 31, 2024
ef03941
add metax seed related
Jan 31, 2024
f339d5d
Merge branch 'FlagOpen:main' into main
fred1912 Feb 7, 2024
8e56fd2
update main page
Feb 7, 2024
ab94d42
replace 5 files
Feb 7, 2024
d27cf9f
ur
shh2000 Feb 7, 2024
f8891e1
ur
shh2000 Feb 7, 2024
099d9a6
Merge pull request #1 from shh2000/ur0207
fred1912 Feb 7, 2024
73 changes: 43 additions & 30 deletions README.md
@@ -119,6 +119,16 @@
</tr>
<tr>
<td>6</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/chatglm3_6b">chatglm3_6b</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/chatglm3_6b-deepspeed">deepspeed</a></td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>7</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/cpm">cpm</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/cpm-pytorch">pytorch</a></td>
@@ -128,7 +138,7 @@
<td>N/A</td>
</tr>
<tr>
<td>7</td>
<td>8</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/detr">detr</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/detr-pytorch">pytorch</a></td>
@@ -138,7 +148,7 @@
<td>N/A</td>
</tr>
<tr>
<td>8</td>
<td>9</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/distilbert">distilbert</a></td>
<td>NLP</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/distilbert-pytorch">pytorch</a></td>
@@ -148,7 +158,7 @@
<td>N/A</td>
</tr>
<tr>
<td>9</td>
<td>10</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/dlrmt">DLRM</a></td>
<td>RS</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/dlrm-pytorch">pytorch</a></td>
@@ -158,7 +168,7 @@
<td>N/A</td>
</tr>
<tr>
<td>10</td>
<td>11</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/efficientnet">efficientnet</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/efficientnet-pytorch">pytorch</a></td>
@@ -168,7 +178,7 @@
<td>N/A</td>
</tr>
<tr>
<td>11</td>
<td>12</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/faster_rcnn">faster_rcnn</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/faster_rcnn-pytorch">pytorch</a></td>
@@ -178,7 +188,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/faster_rcnn-pytorch">pytorch</a></td>
</tr>
<tr>
<td>12</td>
<td>13</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/glm">glm</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/glm-pytorch">pytorch</a></td>
@@ -188,7 +198,7 @@
<td>N/A</td>
</tr>
<tr>
<td>13</td>
<td>14</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/gpt2">gpt2</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/gpt2-pytorch">pytorch</a></td>
@@ -198,7 +208,7 @@
<td>N/A</td>
</tr>
<tr>
<td>14</td>
<td>15</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/gpt3_13B">gpt3_13B</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/gpt3_13B-paddle">paddle</a></td>
@@ -208,7 +218,7 @@
<td>N/A</td>
</tr>
<tr>
<td>15</td>
<td>16</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/gpt3_6.7B">gpt3_6.7B</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/gpt3_6.7B-paddle">paddle</a></td>
@@ -218,7 +228,7 @@
<td>N/A</td>
</tr>
<tr>
<td>16</td>
<td>17</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/llama1_13B">llama1_13B</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama1_13B-paddle">paddle</a></td>
@@ -228,7 +238,7 @@
<td>N/A</td>
</tr>
<tr>
<td>17</td>
<td>18</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/llama1_7B">llama1_7B</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama1_7B-paddle">paddle</a></td>
@@ -238,17 +248,17 @@
<td>N/A</td>
</tr>
<tr>
<td>18</td>
<td>19</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/llama2_7b">llama2_7b</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama2_7b-deepspeed">deepspeed</a></td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama2_7b-deepspeed">deepspeed</a>,<a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama2_7b-megatron-deepspeed">megatron-deepspeed</a></td>
<td><a href="https://github.com/FlagOpen/FlagPerf/pull/348">deepspeed</a></td>
<td><a href="https://github.com/FlagOpen/FlagPerf/pull/343">deepspeed</a></td>
<td><a href="https://github.com/FlagOpen/FlagPerf/pull/354">deepspeed</a></td>
<td>N/A</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/llama2_7b-megatron-deepspeed">megatron-deepspeed</a></td>
</tr>
<tr>
<td>19</td>
<td>20</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/llama2_7b_finetune">llama2_7b_finetune</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama2_7b_finetune-pytorch">pytorch</a></td>
@@ -258,17 +268,17 @@
<td>N/A</td>
</tr>
<tr>
<td>20</td>
<td>21</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/llama2_70B">llama2_70b</a></td>
<td>LLM</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/llama2_70B-megatron">megatron</a></td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/llama2_70B-megatron">megatron</a></td>
</tr>
<tr>
<td>21</td>
<td>22</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/longformer">longformer</a></td>
<td>NLP</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/longformer-pytorch">pytorch</a></td>
@@ -278,7 +288,7 @@
<td>N/A</td>
</tr>
<tr>
<td>22</td>
<td>23</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/mask_rcnn">mask_rcnn</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/mask_rcnn-pytorch">pytorch</a></td>
@@ -288,7 +298,7 @@
<td>N/A</td>
</tr>
<tr>
<td>23</td>
<td>24</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/mobilenetv2">mobilenetv2</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/mobilenetv2-pytorch">pytorch</a></td>
@@ -298,7 +308,7 @@
<td>N/A</td>
</tr>
<tr>
<td>24</td>
<td>25</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/resnet50">resnet50</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/resnet50-pytorch">pytorch</a>, <a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/resnet50-tensorflow2">tensorflow2</a></td>
@@ -308,7 +318,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/resnet50-pytorch">pytorch</a></td>
</tr>
<tr>
<td>25</td>
<td>26</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/retinanet">retinanet</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/retinanet-pytorch">pytorch</a></td>
@@ -318,7 +328,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/retinanet-pytorch">pytorch</a></td>
</tr>
<tr>
<td>26</td>
<td>27</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/swin_transformer">swin_transformer</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/swin_transformer-pytorch">pytorch</a></td>
@@ -328,7 +338,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/swin_transformer-pytorch">pytorch</a></td>
</tr>
<tr>
<td>27</td>
<td>28</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/t5_small">t5_small</a></td>
<td>NLP</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/t5_small-pytorch">pytorch</a></td>
@@ -338,7 +348,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/t5_small-pytorch">pytorch</a></td>
</tr>
<tr>
<td>28</td>
<td>29</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/tacotron2">tacotron2</a></td>
<td>Audio</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/tacotron2-pytorch">pytorch</a></td>
@@ -348,7 +358,7 @@
<td>N/A</td>
</tr>
<tr>
<td>29</td>
<td>30</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/transformer">transformer</a></td>
<td>NLP</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/transformer-pytorch">pytorch</a></td>
@@ -358,7 +368,7 @@
<td>N/A</td>
</tr>
<tr>
<td>30</td>
<td>31</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/transformer_xl">transformer_xl</a></td>
<td>NLP</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/transformer_xl-pytorch">pytorch</a></td>
@@ -368,7 +378,7 @@
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/metax/transformer_xl-pytorch">pytorch</a></td>
</tr>
<tr>
<td>31</td>
<td>32</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/vit">vit</a></td>
<td>CV</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/vit-pytorch">pytorch</a></td>
@@ -378,7 +388,7 @@
<td>N/A</td>
</tr>
<tr>
<td>32</td>
<td>33</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/wav2vec2">wav2vec2</a></td>
<td>Audio</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/wav2vec2-pytorch">pytorch</a></td>
@@ -388,7 +398,7 @@
<td>N/A</td>
</tr>
<tr>
<td>33</td>
<td>34</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/benchmarks/WaveGlow">WaveGlow</a></td>
<td>Audio</td>
<td><a href="https://github.com/FlagOpen/FlagPerf/tree/main/training/nvidia/WaveGlow-pytorch">pytorch</a></td>
@@ -688,3 +698,6 @@ sudo python inference/run.py

If you have any questions, you can send an email to [email protected], or describe the situation in an [issue](https://github.com/FlagOpen/FlagPerf/issues).




Binary file modified assets/imgs/coop.png
2 changes: 1 addition & 1 deletion training/benchmarks/llama2_70B/megatron/megatron_main.sh
@@ -33,7 +33,7 @@ echo $FLASH_ATTN
echo $RECOMPUTE
echo $VENDOR_SHELL

DATA_PATH=$DATA_DIR/pile_wikipedia_demo
DATA_PATH=$DATA_DIR/llama_00_text_document
TOKENIZER_PATH=$DATA_DIR/tokenizer/tokenizer.model

DISTRIBUTED_ARGS="
32 changes: 16 additions & 16 deletions training/metax/llama2_7b-megatron-deepspeed/README.md
@@ -31,24 +31,24 @@

* General metrics

| Metric | Value | Notes |
| --------------------- | ---------------------------------------- | ----------------------------------------------------------------------------- |
| Task category | Natural language understanding | |
| Model | llama2_7b | |
| Dataset | RedPajama-Data-1T-Sample | |
| Data precision | amp | |
| Hyperparameter change | parallel, see "Performance metrics" | format is TPxPPyDPz, e.g. TP2PP1DP4 |
| Hyperparameter change | fix_hp, see "Performance metrics" | special hyperparameters required for the run, e.g. reducing seqlength to avoid OOM |
| Hardware device | MXC500 | |
| Hardware memory usage | mem, see "Performance metrics" | commonly called "device memory", in GiB |
| Compute utilization | MFU, see "Performance metrics" | as defined in the PaLM paper |
| **Throughput** | **token/p/s, see "Performance metrics"** | average number of tokens processed per GPU per second |
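The MFU entry follows the PaLM-paper definition: achieved model FLOPs per second divided by the accelerator's peak FLOPs per second, with model FLOPs commonly approximated as 6 × parameter count per token. Below is a minimal Python sketch under that 6N approximation; the peak-FLOPs argument is left to the reader, since this PR does not state a peak figure for the MXC500.

```python
# Sketch of the PaLM-style MFU computation, assuming the 6N FLOPs-per-token
# approximation. peak_flops_per_gpu must be supplied; the value in the usage
# example is a placeholder, not an MXC500 specification.

def mfu(tokens_per_gpu_per_sec: float,
        n_params: float,
        peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization: achieved FLOPs/s divided by peak FLOPs/s."""
    achieved = 6.0 * n_params * tokens_per_gpu_per_sec  # ~6N FLOPs per token
    return achieved / peak_flops_per_gpu

# Hypothetical usage for llama2_7b (~7e9 parameters):
print(f"MFU = {mfu(3500, 7e9, 2.8e14):.1%}")
```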

* Performance metrics

Note that the second experiment below uses the same global_batchsize as the original llama2 paper and trains for 100 steps; it also serves as the accuracy-alignment experiment. Accuracy alignment requires that, from step 21 onward, the mean relative error between each step's loss and the loss of the corresponding NVIDIA step is below 2%.
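A minimal sketch of how that alignment criterion could be checked, assuming both runs log one loss value per step starting from step 1 (the function and parameter names are illustrative, not part of FlagPerf):

```python
# Sketch of the accuracy-alignment check described above. Assumes
# vendor_losses[i] and nvidia_losses[i] are the losses at step i + 1.

def is_aligned(vendor_losses: list[float],
               nvidia_losses: list[float],
               start_step: int = 21,
               threshold: float = 0.02) -> bool:
    """True if mean relative loss error from start_step onward is below threshold."""
    pairs = list(zip(vendor_losses, nvidia_losses))[start_step - 1:]
    errors = [abs(v - n) / n for v, n in pairs]
    return sum(errors) / len(errors) < threshold
```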

| Config | parallel | fix_hp | token/p/s | loss | Accuracy aligned | mem | MFU |
| ------------------------------ | --------- | ------------------------------ | --------- | ---- | ---------------- | ----- | ----- |
| C500 single node, 8 GPUs (1x8) | TP1PP1DP8 | / | / | 3.83 | / | 62/64 | 51.8% |
| C500 single node, 8 GPUs (1x8) | TP4PP1DP2 | GAS=128 (GBS=1024 = 4M tokens) | / | 7.77 | True | 62/64 | 53.0% |
| Config | parallel | fix_hp | token/p/s | Accuracy aligned | mem | MFU |
| ------------------------------ | --------- | ------------------------------ | --------- | ---------------- | ----- | ----- |
| C500 single node, 8 GPUs (1x8) | TP1PP1DP8 | / | / | / | 62/64 | 51.8% |
| C500 single node, 8 GPUs (1x8) | TP4PP1DP2 | GAS=128 (GBS=1024 = 4M tokens) | / | True | 62/64 | 53.0% |
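The fix_hp entry in the second row is simple batch arithmetic: with DP=2 (from TP4PP1DP2) and GAS=128, a global batch of 1024 sequences implies a micro-batch of 4 per GPU, and at llama2's 4096-token sequence length one step consumes roughly 4M tokens. A sketch of that arithmetic — the micro-batch size and sequence length are inferred, not stated in this PR:

```python
# Batch-size arithmetic behind "GAS=128 (GBS=1024 = 4M tokens)".

dp = 2            # data-parallel degree, from TP4PP1DP2
gas = 128         # gradient accumulation steps, from fix_hp
gbs = 1024        # global batch size in sequences, from fix_hp
micro_batch = gbs // (dp * gas)   # -> 4 sequences per GPU per micro-step (inferred)
seq_len = 4096                    # llama2 sequence length (assumed)
tokens_per_step = gbs * seq_len   # -> 4,194,304, i.e. the "4M tokens" in the table
print(micro_batch, tokens_per_step)
```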
3 changes: 2 additions & 1 deletion training/nvidia/docker_image/megatron/megatron_install.sh
@@ -1,6 +1,7 @@
#!/bin/bash
# using github mirrors to avoid github TTL
git clone https://githubfast.com/FlagOpen/FlagScale
cd FlagScale
git checkout 26cd6643c472f853e077779abaa51bb6a1c140bf
echo 'export PYTHONPATH=$PYTHONPATH:/workspace/FlagScale' >> /root/.bashrc
source /root/.bashrc
source /root/.bashrc