[kunlunxin] fix Vit and add configs #362

Merged
merged 52 commits into from
Dec 18, 2023
Changes from 49 commits
Commits
f738bac
init
May 19, 2023
b494ba5
add efficientnet
May 24, 2023
acfde41
modify config
May 24, 2023
b4e9627
modify config
May 24, 2023
c6fbea3
modify config
May 24, 2023
fce71f2
add efficientnet
May 24, 2023
ef390bc
modify config
May 24, 2023
51847e1
add efficientnet
May 25, 2023
3f904db
bug fix
May 25, 2023
48e835d
add efficientnet
May 25, 2023
8eaa8a5
Merge branch 'FlagOpen:main' into efficientnet
ScoThunder May 25, 2023
37d78be
add efficientnet
May 25, 2023
98361a5
fix code style
May 26, 2023
e6005bf
fix code style
May 26, 2023
ae86109
fix code style
May 29, 2023
fe6a418
Revert "fix code style"
May 29, 2023
6684a5d
fix code style
May 29, 2023
746377a
fix code style
May 29, 2023
b3d9786
fix code style
May 29, 2023
a70db8d
fix code style
May 29, 2023
b672228
fix code style
May 29, 2023
df3a2b2
Merge branch 'FlagOpen:main' into efficientnet
ScoThunder May 30, 2023
565a35d
Merge branch 'FlagOpen:main' into main
ScoThunder May 31, 2023
b21dde9
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 2, 2023
d9c089b
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 5, 2023
d3b3e57
bug fix
Jun 6, 2023
ffda7ad
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 6, 2023
cb97a53
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 7, 2023
10faf97
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 8, 2023
d679833
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 20, 2023
2fad460
add kunlunxin readme
Jun 21, 2023
9db4cab
Merge branch 'FlagOpen:main' into main
ScoThunder Jun 28, 2023
ba5aa39
Merge branch 'FlagOpen:main' into main
ScoThunder Jul 4, 2023
13a9504
Merge branch 'FlagOpen:main' into main
ScoThunder Jul 19, 2023
a39e1de
Merge branch 'FlagOpen:main' into main
ScoThunder Jul 21, 2023
33c1423
Merge branch 'FlagOpen:main' into main
ScoThunder Sep 26, 2023
10a5003
Merge branch 'FlagOpen:main' into main
ScoThunder Nov 1, 2023
228bfa4
Merge branch 'FlagOpen:main' into main
ScoThunder Nov 6, 2023
dd9e91f
fix mobilenetv2 on kunlunxin
Nov 6, 2023
fce96cb
add mobilenet config_R300x2x8.py
Nov 7, 2023
78721af
Merge branch 'FlagOpen:main' into main
ScoThunder Nov 15, 2023
8ccc252
fix mobilenetv2 on kunlunxin
Nov 15, 2023
ca0cc5c
fix vit on kunlunxin
Nov 15, 2023
275dfd9
fix vit on kunlunxin
Nov 29, 2023
88c7b22
Merge branch 'FlagOpen:main' into vit
ScoThunder Dec 11, 2023
2f6d67a
add vit 1x8 on kunlunxin
Dec 12, 2023
1bc972d
Merge branch 'FlagOpen:main' into vit
ScoThunder Dec 12, 2023
245139b
add vit 2x8 1x1 on kunlunxin
Dec 12, 2023
fd9bf6d
add vit 2x8 1x1 on kunlunxin
Dec 12, 2023
3748f38
add vit 2x8 1x1 on kunlunxin
Dec 13, 2023
04f43ee
fix code style
Dec 18, 2023
602e6b2
fix code style
Dec 18, 2023
32 changes: 25 additions & 7 deletions training/kunlunxin/vit-pytorch/README.md
@@ -18,15 +18,33 @@


### Run results
| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Performance (samples/s) |
| ------------------ | --------------- | ------------ | --------------- | ------------------ | ------ | ----------------------- |
| 1 node x 1 card | config_R300x1x1 | / | | / | | |
| 1 node x 8 cards | config_R300x1x8 | | 79.982 | 66.166 | 181380 | |
| 2 nodes x 8 cards | config_R300x2x8 | / | | / | | |
* General metrics

### Convergence curve
![acc](acc.png)
| Metric | Value | Notes |
| ---------------------- | --------------------------------------------- | ------------------------------------------- |
| Task type | Image Classification && Semantic Segmentation | |
| Model | Vision Transformer | |
| Dataset | ImageNet2012 1K | |
| Data precision | precision, see "Performance metrics" | One of fp32/amp/fp16 |
| Hyperparameter changes | fix_hp, see "Performance metrics" | Special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware short name | R300 | |
| Device memory usage | mem, see "Performance metrics" | Commonly called "video memory"; unit is GiB |
| End-to-end time | e2e_time, see "Performance metrics" | Total time plus Perf initialization and similar overhead |
| Overall throughput | p_whole, see "Performance metrics" | Number of images actually trained divided by total time (performance_whole) |
| Training throughput | p_train, see "Performance metrics" | Excludes the evaluation time at the end of each epoch |
| **Compute throughput** | **p_core, see "Performance metrics"** | Excludes data I/O time (p3>p2>p1) |
| Training result | acc, see "Performance metrics" | Top-1 classification accuracy (acc1) |
| Additional changes | None | |



* Performance metrics

| Configuration | precision | fix_hp | e2e_time | p_whole | p_train | p_core | acc | mem |
| ------------------------------ | --------- | ---------------- | -------- | ------- | ------- | ------ | ------ | --------- |
| R300 1 node x 1 card (1x1) | fp32 | / | / | | | | / | /32.0 |
| R300 1 node x 8 cards (1x8) | fp32 | bs=128,lr=0.0015 | | | | | 79.30% | 24.2/32.0 |
| R300 2 nodes x 8 cards (2x8) | fp32 | bs=128,lr=0.003 | / | | | | / | /32.0 |
### License

Apache 2.0 license.
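
Reviewer note: the README defines three throughput figures (p_whole, p_train, p_core) that differ only in which time is excluded from the denominator. The sketch below shows one way to read those definitions. The `RunTimings` record, its field names, and the numbers are hypothetical and chosen only for illustration; they are not FlagPerf's API and not results from this PR.

```python
from dataclasses import dataclass


@dataclass
class RunTimings:
    """Hypothetical timing record; field names are illustrative only."""
    samples_trained: int    # images actually consumed by training
    total_seconds: float    # whole run, including end-of-epoch evaluation
    eval_seconds: float     # time spent in end-of-epoch evaluation
    data_io_seconds: float  # time spent on data I/O during training


def throughputs(t: RunTimings) -> dict:
    """Compute the three throughput figures as the README describes them."""
    train_seconds = t.total_seconds - t.eval_seconds   # drop evaluation time
    core_seconds = train_seconds - t.data_io_seconds   # also drop data I/O time
    return {
        "p_whole": t.samples_trained / t.total_seconds,
        "p_train": t.samples_trained / train_seconds,
        "p_core": t.samples_trained / core_seconds,
    }


# Made-up numbers, not measurements from this PR.
example = RunTimings(samples_trained=1_000_000, total_seconds=1000.0,
                     eval_seconds=200.0, data_io_seconds=300.0)
metrics = throughputs(example)
print(metrics)  # p_whole=1000.0, p_train=1250.0, p_core=2000.0
```

Because each figure removes more time from the denominator than the previous one, p_core >= p_train >= p_whole whenever the excluded times are non-negative, which appears to be what the "(p3>p2>p1)" remark in the table refers to.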
5 changes: 5 additions & 0 deletions training/kunlunxin/vit-pytorch/config/config_R300x1x1.py
@@ -0,0 +1,5 @@
from config_common import *

train_batch_size = 128
eval_batch_size = 512
gradient_accumulation_steps = 4
1 change: 1 addition & 0 deletions training/kunlunxin/vit-pytorch/config/config_R300x1x8.py
@@ -2,4 +2,5 @@

train_batch_size = 128
eval_batch_size = 512
lr = 0.003 * 0.5
gradient_accumulation_steps = 4
6 changes: 6 additions & 0 deletions training/kunlunxin/vit-pytorch/config/config_R300x2x8.py
@@ -0,0 +1,6 @@
from config_common import *

train_batch_size = 128
eval_batch_size = 512
gradient_accumulation_steps = 2
epochs = 8
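
Reviewer note: the three configs keep `train_batch_size = 128` and vary only `gradient_accumulation_steps`. A rough check of the resulting global batch size is sketched below. It assumes that `train_batch_size` is per device and that the `NxM` suffix means N nodes with M devices each; neither is stated in this diff. The `CONFIGS` dict and the helper name are hypothetical.

```python
# Values copied from the config files in this PR. The node and device counts
# are inferred from the file-name suffix (an assumption, not stated in the diff).
CONFIGS = {
    "config_R300x1x1": {"nodes": 1, "devices_per_node": 1,
                        "train_batch_size": 128, "gradient_accumulation_steps": 4},
    "config_R300x1x8": {"nodes": 1, "devices_per_node": 8,
                        "train_batch_size": 128, "gradient_accumulation_steps": 4},
    "config_R300x2x8": {"nodes": 2, "devices_per_node": 8,
                        "train_batch_size": 128, "gradient_accumulation_steps": 2},
}


def global_batch_size(cfg: dict) -> int:
    """Samples per optimizer step, assuming train_batch_size is per device."""
    world_size = cfg["nodes"] * cfg["devices_per_node"]
    return cfg["train_batch_size"] * cfg["gradient_accumulation_steps"] * world_size


sizes = {name: global_batch_size(cfg) for name, cfg in CONFIGS.items()}
for name, size in sizes.items():
    print(f"{name}: {size}")
```

Under these assumptions the 1x8 and 2x8 configs both reach 4096 samples per optimizer step, while 1x1 reaches 512. That would explain halving the accumulation steps when the device count doubles.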
@@ -0,0 +1,4 @@
export XACC_ENABLE=1
export BKCL_PCIE_RING=1
export BKCL_TIMEOUT=1800
export XMLIR_D_XPU_L3_SIZE=66060288
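
Reviewer note: the value of `XMLIR_D_XPU_L3_SIZE` is easier to sanity-check when converted to a readable unit. The sketch below parses the four `export` lines and converts that value, assuming it is a size in bytes; the unit is not stated in this diff. The `parse_exports` helper is hypothetical.

```python
# The four lines added by this PR, copied verbatim.
ENV_LINES = """\
export XACC_ENABLE=1
export BKCL_PCIE_RING=1
export BKCL_TIMEOUT=1800
export XMLIR_D_XPU_L3_SIZE=66060288
"""


def parse_exports(text: str) -> dict:
    """Collect KEY=VALUE pairs from simple `export KEY=VALUE` lines."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("export "):
            continue
        key, _, value = line[len("export "):].partition("=")
        env[key] = value
    return env


env = parse_exports(ENV_LINES)
# Assumption: the value is a byte count.
l3_mib = int(env["XMLIR_D_XPU_L3_SIZE"]) / 2**20
print(l3_mib)  # 63.0
```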
2 changes: 2 additions & 0 deletions training/kunlunxin/vit-pytorch/config/requirements.txt
@@ -0,0 +1,2 @@
https://download.pytorch.org/whl/cpu/torchvision-0.13.1%2Bcpu-cp38-cp38-linux_x86_64.whl
tabulate
1 change: 1 addition & 0 deletions training/run_benchmarks/config/test_conf.py
@@ -115,6 +115,7 @@
# "transformer_xl:pytorch:R300:1:8:1": "/raid/dataset/transformer_xl/",
# "glm:pytorch:R300:1:8:1": "/raid/home_datasets_ckpt/glm/train/",
# "mobilenetv2:pytorch:R300:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "vit:pytorch:R300:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "bert:pytorch:R300:1:8:1": "/raid/dataset/bert_large/train",
# "longformer:pytorch:R300:1:8:1": "/raid/dataset/longformer_train",
# "distilbert:pytorch:R300:1:8:1": "/raid/dataset/distilbert/",
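
Reviewer note: the case keys in `test_conf.py` are colon-separated strings such as `"vit:pytorch:R300:1:8:1"`. The sketch below splits one into named fields. The field meanings (model, framework, hardware, number of nodes, processes per node, repeat count) are an assumption based on how the keys look, and the `Case` type and `parse_case` helper are hypothetical.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """Hypothetical view of one test case key; field meanings are assumed."""
    model: str
    framework: str
    hardware: str
    nnodes: int
    nproc: int
    repeat: int


def parse_case(key: str) -> Case:
    """Split a key like 'vit:pytorch:R300:1:8:1' into typed fields."""
    parts = key.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated fields, got {len(parts)}: {key!r}")
    model, framework, hardware, nnodes, nproc, repeat = parts
    return Case(model, framework, hardware, int(nnodes), int(nproc), int(repeat))


case = parse_case("vit:pytorch:R300:1:8:1")
print(case)
```

Read this way, the new ViT entry describes a single-node, 8-process run on R300, which matches the `config_R300x1x8` file touched by this PR.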