【mthreads】【block】resnet50 training #246

Merged: 9 commits merged on Dec 12, 2023
19 changes: 19 additions & 0 deletions training/benchmarks/driver/dist_pytorch.py
@@ -149,6 +149,8 @@ def barrier(vendor="nvidia"):
     if torch.distributed.is_available() and torch.distributed.is_initialized():
         if vendor == "kunlunxin":
             torch.distributed.barrier()
+        elif vendor == "mthreads":
+            torch.distributed.barrier()
         else:
             torch.distributed.all_reduce(torch.cuda.FloatTensor(1))
             torch.cuda.synchronize()
@@ -172,6 +174,23 @@ def init_dist_training_env(config):
                                                  rank=rank,
                                                  world_size=world_size)
             config.n_device = torch.distributed.get_world_size()
+    elif config.vendor == "mthreads":
+        import torch_musa
+        if int(os.environ.get("WORLD_SIZE", 1)) <= 1:
+            config.device = torch.device("musa")
+            config.n_device = 1
+        else:
+            torch.musa.set_device(config.local_rank)
+            host_addr_full = 'tcp://' + os.environ[
+                "MASTER_ADDR"] + ':' + os.environ["MASTER_PORT"]
+            rank = int(os.environ["RANK"])
+            world_size = int(os.environ["WORLD_SIZE"])
+            torch.distributed.init_process_group(backend=config.dist_backend,
+                                                 init_method=host_addr_full,
+                                                 rank=rank,
+                                                 world_size=world_size)
+            config.device = torch.device("musa", config.local_rank)
+            config.n_device = torch.distributed.get_world_size()
     else:  # nvidia
         if int(os.environ.get("WORLD_SIZE", 1)) <= 1:
             config.device = torch.device("cuda")
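For review purposes, the environment handling in the new `mthreads` branch of `init_dist_training_env` can be restated as a pure function with no `torch` dependency. This is a sketch, not FlagPerf code: `DistParams` and `resolve_dist_params` are hypothetical names, and reporting rank 0 for the single-device case is an assumption (the driver does not set a rank there). It reads the same variables as the hunk above (`WORLD_SIZE`, `RANK`, `MASTER_ADDR`, `MASTER_PORT`) and builds the same `tcp://` init method.

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistParams(NamedTuple):
    """Values the mthreads branch derives before touching the device (hypothetical type)."""
    device_type: str             # first argument the driver passes to torch.device(...)
    device_index: Optional[int]  # None mirrors torch.device("musa") in the single-device case
    init_method: Optional[str]   # TCP rendezvous URL; None means no process group is created
    rank: int
    world_size: int


def resolve_dist_params(env: Mapping[str, str], local_rank: int = 0) -> DistParams:
    """Apply the same WORLD_SIZE / MASTER_ADDR / MASTER_PORT / RANK rules as the driver."""
    world_size = int(env.get("WORLD_SIZE", 1))
    if world_size <= 1:
        # Single process: one device, no torch.distributed initialisation.
        # Rank 0 is an assumption made for this sketch only.
        return DistParams("musa", None, None, 0, 1)
    init_method = "tcp://" + env["MASTER_ADDR"] + ":" + env["MASTER_PORT"]
    return DistParams("musa", local_rank, init_method, int(env["RANK"]), world_size)


if __name__ == "__main__":
    # Reading LOCAL_RANK here is an assumption; FlagPerf passes config.local_rank instead.
    print(resolve_dist_params(os.environ, int(os.environ.get("LOCAL_RANK", 0))))
```

Keeping this decision free of device calls means the rendezvous URL and rank bookkeeping can be checked on a machine without MUSA hardware.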
6 changes: 6 additions & 0 deletions training/benchmarks/driver/helper.py
@@ -74,6 +74,12 @@ def set_seed(self, seed: int, vendor: str = None):
         elif lower_vendor == "ascend":
             import mindspore
             mindspore.set_seed(seed)
+        elif lower_vendor == "mthreads":
+            import torch
+            import torch_musa
+            torch.manual_seed(seed)
+            torch.musa.manual_seed(seed)
+            torch.musa.manual_seed_all(seed)
         else:
             # TODO: extend seed setting for other vendors here
             pass
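The `mthreads` branch imports `torch_musa` unconditionally, which is fine when it only runs on Moore Threads hosts. If the same seeding code had to run on hosts with and without the plugin (an assumption about reuse, not something this PR needs), a defensive variant could seed whatever is installed and report what it touched. `seed_all` is a hypothetical helper, not part of FlagPerf's `helper.py`; beyond Python's `random`, it uses only the calls the hunk above already uses.

```python
import random
from typing import List


def seed_all(seed: int) -> List[str]:
    """Seed Python's RNG and, when installed, torch and torch_musa (hypothetical helper)."""
    seeded = ["random"]
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return seeded  # host without PyTorch: nothing else to seed
    torch.manual_seed(seed)
    seeded.append("torch")
    try:
        import torch_musa  # noqa: F401  importing the plugin is what exposes torch.musa
    except ImportError:
        return seeded  # host without the MUSA plugin: skip the device RNGs
    # Same pair of calls as the mthreads branch in set_seed.
    torch.musa.manual_seed(seed)
    torch.musa.manual_seed_all(seed)
    seeded.append("torch.musa")
    return seeded
```

The returned list makes it easy to log which RNGs were actually seeded on a given machine.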
17 changes: 1 addition & 16 deletions training/benchmarks/resnet50/pytorch/train/trainer.py
@@ -82,22 +82,7 @@ def train_one_epoch(self, train_dataloader, eval_dataloader):
             pure_start_time = time.time()
             optimizer.zero_grad()
 
-            images, target = batch
-            if scaler is not None:
-                with torch.cuda.amp.autocast(enabled=True):
-                    output = model(images)
-                    loss = criterion(output, target)
-
-                scaler.scale(loss).backward()
-                scaler.step(optimizer)
-                scaler.update()
-            else:
-                output = model(images)
-
-                criterion = torch.nn.CrossEntropyLoss()
-                loss = criterion(output, target)
-                loss.backward()
-                optimizer.step()
+            loss = self.adapter.train_step(model, batch, optimizer, scaler)
 
             if step % self.config.log_freq == 0:
                 print("Train Step " + str(step) + "/" + str(len(data_loader)) +
20 changes: 20 additions & 0 deletions training/benchmarks/resnet50/pytorch/train/trainer_adapter.py
@@ -41,3 +41,23 @@ def create_grad_scaler():
     """create_grad_scaler for mixed precision training"""
     scaler = torch.cuda.amp.GradScaler() if config.amp else None
     return scaler
+
+
+def train_step(model, batch, optimizer, scaler=None):
+    """train one step"""
+    images, target = batch
+    criterion = torch.nn.CrossEntropyLoss()
+    if scaler:
+        with torch.cuda.amp.autocast(enabled=True):
+            output = model(images)
+            loss = criterion(output, target)
+        scaler.scale(loss).backward()
+        scaler.step(optimizer)
+        scaler.update()
+    else:
+        output = model(images)
+        loss = criterion(output, target)
+        loss.backward()
+        optimizer.step()
+
+    return loss
70 changes: 70 additions & 0 deletions training/mthreads/README.md
@@ -0,0 +1,70 @@

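# Vendor Information

Official website: https://www.mthreads.com/

Moore Threads Intelligent Technology (Beijing) Co., Ltd. ("Moore Threads") is an integrated-circuit design company centered on GPU chip design. It develops full-featured GPU chips and related products and supplies computing acceleration to its technology-ecosystem partners. The company is developing a new generation of GPUs for "metacomputing" applications, building a unified computing platform that combines visual computing, 3D graphics, scientific computing, and AI computing, and establishing an ecosystem based on cloud-native GPU computing to help drive the digital economy.

Moore Threads' MTT S-series full-featured GPUs support a wide range of workloads. Backed by the complete MUSA software stack, which covers deep learning, graphics rendering, video processing, and scientific computing, they provide general-purpose compute for AI training, AI inference, large models, AIGC, cloud gaming, cloud rendering, video cloud, digital twins, and similar scenarios. They are intended as a compute foundation for data centers, intelligent computing centers, and metacomputing centers, and for bringing diverse metaverse applications into production.

The MUSA software stack provides compatibility with the CUDA ecosystem through the musify CUDA code-migration tool, compute and communication acceleration libraries, the mcc compiler, the MUSA runtime, and the driver, helping users migrate code and applications quickly. The torch_musa plugin connects MTT S-series GPUs to native PyTorch, so users can run AI models on Moore Threads full-featured GPUs seamlessly.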

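# FlagPerf Adaptation and Validation Environment
## Reference Environment Configuration
- Hardware
  - Machine model: MCCX D800
  - Accelerator model: MTT S4000 48GB
  - CPU model: Intel(R) Xeon(R) Gold 6430 CPU @ 2.00GHz
  - Multi-node network type and bandwidth: InfiniBand, 2 × 200 Gbps
- Software
  - OS version: Ubuntu 20.04 LTS
  - OS kernel version: 5.4.0-42-generic
  - Accelerator driver version: 2.2.0
  - Docker version: 20.10.24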

## Container Image Information
- Container build information
  - Dockerfile path: training/mthreads/docker_image/pytorch_2.0/Dockerfile
  - Post-build software installation script: training/mthreads/docker_image/pytorch_2.0/pytorch_2.0_install.sh

- Core software information
  - AI framework & version
    - PyTorch: v2.0.0
  - Other software versions
    - torch_musa: 2.0.0+git8614ba1
    - musa toolkits: 1.5.0+git3d8791d
    - mcc: 1.5.2+git3730bdd
    - mublas: 1.2.0+gitd9867b5

## Accelerator Monitoring
- Command for collecting accelerator usage information

```bash
mthreads-gmi -q | grep -E 'GPU Current Temp|Power Draw|Used|Total|Gpu' | \
awk -F ': *' '/GPU Current Temp|Power Draw|Used|Total|Gpu/ \
{ values[(NR-1)%5+1] = $2; } NR % 5 == 0 { print values[4], values[5], values[2], values[1], values[3]; }'
```
- Sample monitoring output:
```text
45C 109.51W 1MiB 32768MiB 0%
44C 108.95W 1MiB 32768MiB 0%
46C 110.87W 1MiB 32768MiB 0%
43C 104.33W 1MiB 32768MiB 0%
44C 107.55W 8MiB 32768MiB 0%
46C 110.51W 8MiB 32768MiB 0%
44C 106.59W 8MiB 32768MiB 0%
44C 104.58W 8MiB 32768MiB 0%
```
- Description of the collected monitoring fields

| Metric | Log file | Format |
|---|---|---|
| Temperature | mthreads_monitor.log | xxx C |
| Power draw | mthreads_monitor.log | xxx W |
| Memory used | mthreads_monitor.log | xxx MiB |
| Total memory | mthreads_monitor.log | xxx MiB |
| Memory utilization | mthreads_monitor.log | xxx % |
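To check or post-process `mthreads_monitor.log` outside the shell, the `grep | awk` pipeline and the field table above can be mirrored in Python. This sketch rests on one loud assumption: that `mthreads-gmi -q` emits the five matching lines for each GPU in the order Total, Used, Gpu, GPU Current Temp, Power Draw. That order is inferred from how the awk `print` reorders `values[1..5]` to produce the sample output, not from `mthreads-gmi` documentation. `summarize_gmi` and `parse_monitor_line` are hypothetical helper names.

```python
import re
from typing import Dict, List

# Same alternation as the grep/awk filter in the README command.
FIELDS = re.compile(r"GPU Current Temp|Power Draw|Used|Total|Gpu")
# awk prints values[4], values[5], values[2], values[1], values[3] (1-based).
PRINT_ORDER = (3, 4, 1, 0, 2)


def summarize_gmi(text: str) -> List[str]:
    """Mimic the grep | awk pipeline: group matching lines in fives and reorder them."""
    matched = [line for line in text.splitlines() if FIELDS.search(line)]
    rows = []
    # awk only prints when NR % 5 == 0, so an incomplete trailing group is dropped.
    for start in range(0, len(matched) // 5 * 5, 5):
        values = []
        for line in matched[start:start + 5]:
            # awk -F ': *' makes $2 the text after the first colon (and its spaces).
            parts = re.split(r": *", line)
            values.append(parts[1] if len(parts) > 1 else "")
        rows.append(" ".join(values[i] for i in PRINT_ORDER))
    return rows


def _strip_unit(token: str, unit: str) -> float:
    if not token.endswith(unit):
        raise ValueError(f"expected unit {unit!r} in {token!r}")
    return float(token[: -len(unit)])


def parse_monitor_line(line: str) -> Dict[str, float]:
    """Split one log line (e.g. '45C 109.51W 1MiB 32768MiB 0%') into numeric fields."""
    temp, power, used, total, util = line.split()
    return {
        "temp_c": _strip_unit(temp, "C"),
        "power_w": _strip_unit(power, "W"),
        "mem_used_mib": _strip_unit(used, "MiB"),
        "mem_total_mib": _strip_unit(total, "MiB"),
        "utilization_pct": _strip_unit(util, "%"),  # fifth column of the table above
    }
```

If the real `mthreads-gmi -q` output contains additional `Used` or `Total` lines (for example from other memory sections), both the awk command and this sketch would mis-group fields, so the assumed order is worth confirming on the target machine.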

3 changes: 3 additions & 0 deletions training/mthreads/docker_image/pytorch_2.0/Dockerfile
@@ -0,0 +1,3 @@
FROM moore-threads/pytorch:flagperf-py38
ENV PATH=/opt/conda/envs/py38/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/musa/lib/:$LD_LIBRARY_PATH
@@ -0,0 +1 @@
#!/bin/bash