Merge branch 'main' into moflow
shh2000 authored Mar 6, 2024
2 parents e019f7a + a67831d commit fc84f0d
Showing 20 changed files with 1,024 additions and 4 deletions.
15 changes: 15 additions & 0 deletions inference/benchmarks/bertLarge/README.md
@@ -77,6 +77,19 @@ bert_reference_results_text_md5.txt
- IXRT: ixrt-0.8.0+corex.3.2.1


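#### 2.4 MetaX (沐曦集成电路) C500

- ##### Hardware environment
- Machine and accelerator model: 曦云®C500 64G
- ##### Software environment
- OS version: Ubuntu 20.04.6
- OS kernel version: 5.4.0-26-generic
- Accelerator driver version: 2.2.0
- Docker version: 24.0.7
- Inference framework version: pytorch-2.0.0+mc2.18.0.8-cp38-cp38-linux_x86_64.whl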



### 4. Results (BERT-Large)

* Metrics list
@@ -103,4 +116,6 @@ bert_reference_results_text_md5.txt
| tensorrt | fp32 | 32 | 1868.8 | 150.4 | 152.2 | 190.4 | 194.1 | 42.0% | 0.638/0.638 | 16.9/40.0 |
| kunlunxin_xtcl| W32A16 | 32 |/ | / | / | / | / | / | 0.638/0.638| /|
| iluvatar_ixrt| fp16 | 32 |/ | / | / | / | / | / | 0.599/0.638| /|
| metax-nocompiler| fp16 | 32 |/ | / | / | / | / | 27.6% | 0.638/0.638| 4.3/64.0|
| metax-nocompiler| fp32 | 32 |/ | / | / | / | / | 28.1% | 0.639/0.638| 6.1/64.0|

3 changes: 2 additions & 1 deletion inference/benchmarks/resnet50/README.md
@@ -142,4 +142,5 @@ find ./val -name "*JPEG" | wc -l
| kunlunxin_xtcl | fp32 | 128 | / | / | / | / | / | / | 76.2/76.2 | 4.52/32.0 |
| kunlunxin_xtcl | fp16 | 256 | / | / | / | / | / | / | 76.2/76.2 | 4.52/32.0 |
| zixiao | fp16 | 32*6 | 261.103 | / | / | 193.151 | 6342.191 | / | 76.2/76.2 | / |

| metax-nocompiler | fp16 | 256 |/ | / | / | / | / | 7.8% | 76.2/76.2 | 3.83/64.0 |
| metax-nocompiler | fp32 | 256 | / | / | / | / | / | 7.7% | 76.2/76.2 | 5.46/64.0 |
3 changes: 2 additions & 1 deletion inference/benchmarks/swinTransformer/README.md
@@ -84,4 +84,5 @@ find ./val -name "*JPEG" | wc -l
| ----------- | --------- | ---- | ---- | -------- | ----------- | ---------- | ------------- | ------------ | ----------- | ----------- |
| tensorrt | fp16 | 512 |1011.7 | 1347.5 | 1511.3 | 1231.7 | 1359.1 | 6.8% | 81.7/83.2 | 19.9/40.0 |
| tensorrt | fp32 | 256 | 856.9 | 761.5 | 794.3 | 789.2 | 826.4 | 8.2% | 83.2/83.2 | 20.0/40.0 |
| kunlunxin_xtcl| W32A16 | 256 | 543.745 | / | / | / | / | / | 0.832 | / |
| kunlunxin_xtcl| W32A16 | 256 | / | / | / | / | / | / | 0.832 | / |
| metax-nocompiler| fp16 | 512 | / | / | / | / | / | 6.5% | 0.832 |10.6/64.0 |
6 changes: 6 additions & 0 deletions training/benchmarks/driver/helper.py
@@ -88,6 +88,12 @@ def set_seed(self, seed: int, vendor: str = None):
            torch.backends.cudnn.benchmark = getattr(config, "cudnn_benchmark")
            torch.backends.cudnn.deterministic = getattr(
                config, "cudnn_deterministic")
        elif lower_vendor == "dcu":
            import torch
            torch.manual_seed(seed)
            torch.cuda.manual_seed(seed)
            torch.cuda.manual_seed_all(seed)
            torch.backends.cudnn.benchmark = True
        else:
            # TODO: set the seed for other vendors; extend here
            pass
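
The new `dcu` branch can be read in isolation as the following minimal sketch. The function name `seed_for_vendor` and the `cudnn_benchmark` parameter are hypothetical and not part of the commit; the diff itself hard-codes `torch.backends.cudnn.benchmark = True`. The `ImportError` fallback is also an assumption, added only so the sketch runs on a machine without PyTorch.

```python
import random


def seed_for_vendor(seed: int, vendor: str, cudnn_benchmark: bool = True) -> list:
    """Seed the Python RNG and, when PyTorch is importable, the torch RNGs.

    Returns the names of the RNG sources that were seeded.
    """
    seeded = ["python"]
    random.seed(seed)

    if vendor.lower() == "dcu":
        try:
            import torch  # lazy import, mirroring the branch in the diff
        except ImportError:
            return seeded  # torch not installed: only the Python RNG is seeded
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)  # ignored when no CUDA-like device is present
        torch.cuda.manual_seed_all(seed)
        # Hypothetical parameter: the diff hard-codes True here.
        torch.backends.cudnn.benchmark = cudnn_benchmark
        seeded.append("torch")
    return seeded
```

Unlike the preceding branch, which reads `cudnn_benchmark` and `cudnn_deterministic` from `config`, the `dcu` branch does not set `torch.backends.cudnn.deterministic`, so seeded runs may still not be bit-for-bit reproducible.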
66 changes: 66 additions & 0 deletions training/dcu/README.md
@@ -0,0 +1,66 @@
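# Vendor information

Hygon (海光) DCU products are built on a GPGPU architecture and are compatible with general-purpose "CUDA-like" environments as well as mainstream international commercial computing and AI software. They have a rich software and hardware ecosystem and can be widely used in big data processing, artificial intelligence, commercial computing, and similar fields.

Hygon DCU is compatible with "CUDA-like" environments and has a rich software and hardware ecosystem. In typical application scenarios, its performance metrics reach the level of comparable high-end international products.

Hygon DCU mainly targets compute-intensive fields such as big data processing and commercial computing, as well as compute acceleration for AI and AI-adjacent workloads.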

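# FlagPerf adaptation and verification environment
## Reference environment configuration
- Hardware
  - Machine model: K100 standard machine
  - Accelerator model: K100 64G
- Software
  - OS version: CentOS 7.6
  - OS kernel version: 4.18.0-348.el8.0.2.x86_64
  - Docker version: 24.0.7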

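## Container image information
- Container build information
  - Dockerfile path: training/dcu/docker_image/\<framework\>/Dockerfile
  - Post-build software installation script: training/dcu/docker_image/\<framework\>/\<framework\>_install.sh

- Core software information

  - AI framework and version
    - torch: 1.13.1

  - Other software versions
    - dtk: 23.10.1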


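## Accelerator monitoring and collection
- Command for collecting accelerator usage information

Line 79 of dcu_monitor.py must be changed to the actual path of the env.sh script to source: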

```
source /path/of/dtk/env.sh
rocm-smi
```

- Example monitoring output:

```
============================ System Management Interface =============================
======================================================================================
DCU Temp AvgPwr Perf PwrCap VRAM% DCU% Mode
0 53.0C 96.0W auto 300.0W 0% 0% Normal
1 53.0C 96.0W auto 300.0W 0% 0% Normal
2 54.0C 95.0W auto 300.0W 0% 0% Normal
3 55.0C 96.0W auto 300.0W 0% 0% Normal
4 54.0C 97.0W auto 300.0W 0% 0% Normal
5 54.0C 95.0W auto 300.0W 0% 0% Normal
6 55.0C 93.0W auto 300.0W 0% 0% Normal
7 54.0C 96.0W auto 300.0W 0% 0% Normal
======================================================================================
=================================== End of SMI Log ===================================
```
- Description of the collected accelerator usage metrics

|Monitored item| Log file |
|---|---|
|VRAM(%) | dcu_monitor.log |
|DCU(%) | dcu_monitor.log |
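
As an illustration of how the two monitored items could be extracted, here is a minimal sketch that builds the collection command and parses output shaped like the sample above. The helpers `build_collect_command` and `parse_smi_table` are hypothetical and are not taken from dcu_monitor.py. The parser assumes the column layout of the sample, which real `rocm-smi` output may not match.

```python
import shlex

# Abbreviated copy of the sample output shown in this README.
SAMPLE = """\
============================ System Management Interface =============================
DCU     Temp     AvgPwr     Perf     PwrCap     VRAM%      DCU%      Mode
0       53.0C    96.0W      auto     300.0W     0%         0%        Normal
1       53.0C    96.0W      auto     300.0W     0%         0%        Normal
=================================== End of SMI Log ===================================
"""


def build_collect_command(env_script: str) -> list:
    """Return an argv list that sources the DTK env script, then runs rocm-smi."""
    return ["bash", "-c", f"source {shlex.quote(env_script)} && rocm-smi"]


def parse_smi_table(text: str) -> list:
    """Extract per-device VRAM% and DCU% from sample-shaped rocm-smi output."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("="):
            continue  # skip blank lines and banner rules
        if fields[0] == "DCU":
            header = fields  # column names, e.g. "VRAM%" and "DCU%"
            continue
        if header is None or len(fields) != len(header) or not fields[0].isdigit():
            continue  # not a device row in the expected layout
        record = dict(zip(header, fields))
        rows.append({
            "dcu": int(record["DCU"]),
            "vram_pct": float(record["VRAM%"].rstrip("%")),
            "dcu_pct": float(record["DCU%"].rstrip("%")),
        })
    return rows
```

The argv list from `build_collect_command("/path/of/dtk/env.sh")` can be passed to `subprocess.run(..., capture_output=True, text=True)` on a machine where DTK is installed, and the captured stdout fed to `parse_smi_table`.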