【mthreads】【block】resnet50 training #246
Conversation
training/mthreads/README.md
Outdated
Moore Threads MTT S-series full-function GPUs support diverse workloads. Backed by the complete MUSA software stack, which covers deep learning, graphics rendering, video processing, and scientific computing, they provide general-purpose intelligent compute for AI training, AI inference, large models, AIGC, cloud gaming, cloud rendering, video cloud, and digital-twin scenarios, building a solid compute foundation for data centers, intelligent computing centers, and meta-computing centers, and helping diverse metaverse applications innovate and land.

The MUSA software stack achieves compatibility with the CUDA ecosystem through the musify CUDA code migration tool, compute/communication acceleration libraries, the mcc compiler, and the musa runtime and driver, helping users migrate code and applications quickly. Through the torch_musa plugin, MTT S-series GPUs can be hooked up to native PyTroch, so users can transparently run AI models on Moore Threads full-function GPUs.
typo Pytroch
fixed
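To illustrate the torch_musa integration the quoted README describes, here is a minimal sketch, assuming torch_musa is installed and registers a "musa" device with PyTorch; the model and tensor shapes are arbitrary, not from this PR:

```python
import torch
import torch_musa  # importing the plugin registers the "musa" backend with PyTorch

# Existing PyTorch code only needs the device string changed from "cuda"
# to "musa" to run on an MTT S-series GPU.
device = torch.device("musa")

model = torch.nn.Linear(1024, 1000).to(device)
x = torch.randn(32, 1024, device=device)

out = model(x)    # forward pass executes on the Moore Threads GPU
print(out.shape)  # torch.Size([32, 1000])
```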
@@ -0,0 +1,5 @@
lr = 0.1
train_batch_size = 32
eval_batch_size = 32
Question: why is the batch size reduced by a factor of 8 here? Does a local_batchsize of 32 give better performance on 8 cards?
This was copied from the nvidia config file; we have not run a full training yet.
Please prioritize supporting 2x8 where possible, i.e. 1x1 / 1x8 / 2x8.
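For reference on the scaling question above, the effective global batch size under data parallelism is the local batch size times GPUs per node times nodes; a quick sketch of the arithmetic for the three requested configs (variable names are illustrative, not from the repo):

```python
# Global batch size under data parallelism:
#   global_batch = local_batch * gpus_per_node * num_nodes
local_batch = 32

for num_nodes, gpus_per_node in [(1, 1), (1, 8), (2, 8)]:  # 1x1 / 1x8 / 2x8
    global_batch = local_batch * gpus_per_node * num_nodes
    print(f"{num_nodes}x{gpus_per_node}: global batch = {global_batch}")

# Prints 32, 256, and 512 respectively.
```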
Force-pushed from b29f5a1 to 61a64b2
Force-pushed from 4f23f28 to 9b4f983
Force-pushed from 9b4f983 to 952b243
training/mthreads/README.md
Outdated

## Environment configuration reference
- Hardware
  - Machine model: MCCX D800
  - Accelerator model: MTT S3000 32GB
In the submitted config, the accelerator is an S4000.
# We will run benchmarks in training/<vendor>
-VENDOR = "nvidia"
+VENDOR = "mthreads"
No change needed here.
This is just for testing convenience; it will be reverted later.
Please add a get_sys_info method in this file to collect basic machine information and record it to sys_info.log.
For reference, see: https://github.com/FlagOpen/FlagPerf/blob/main/training/iluvatar/iluvatar_monitor.py
Added.
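For illustration, a minimal sketch of what such a monitor helper might look like; the command list and log path are assumptions modeled on the linked iluvatar_monitor.py pattern, not the actual mthreads implementation (the commit log names the function get_system_info):

```python
import subprocess


def get_system_info(log_path="sys_info.log"):
    """Collect basic machine information and append it to sys_info.log.

    The commands below are illustrative; a vendor monitor would typically
    also query its own SMI-style tool for accelerator details.
    """
    cmds = [
        "hostname",
        "uname -a",   # kernel version and architecture
        "lscpu",      # CPU model and core counts
        "free -h",    # memory
        "df -h",      # disk usage
    ]
    with open(log_path, "a") as f:
        for cmd in cmds:
            f.write(f"\n$ {cmd}\n")
            try:
                out = subprocess.check_output(
                    cmd, shell=True, text=True, stderr=subprocess.STDOUT
                )
                f.write(out)
            except subprocess.CalledProcessError as e:
                f.write(f"command failed with code {e.returncode}\n")
```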
* [kunlunxin] fix tacotron2 running error and add 1x1 & 2x8 config (#346)
  * [kunlunxin] fix tacotron2 running error and add 1x1 & 2x8 config
  * [kunlunxin] modify tacotron2 test_config
  * [kunlunxin] update tacotron2 readme
  * [kunlunxin] modify tacotron2 torch.load()
* [iluvatar] swin_transformer-pytorch 1x1 2x8 (#340)
  * update iluvatar/swin_transformer-pytorch
  * update
  * update
  * update
  * fix batch size mistake in readme
  * correct val_loss to final acc1
  * add finnal_acc1 and mem in readme
  * correct readme mem
  Co-authored-by: 魏杰 <[email protected]>
  Co-authored-by: 杨智超 <[email protected]>
  Co-authored-by: clveryang <[email protected]>
* fix get_system_info for iluvatar_monitor (#351)
  Co-authored-by: zhouyu <[email protected]>
* update iluvatar mobilenetv2 config (#356)
  Co-authored-by: sen.li <[email protected]>
* Update README.md (#357)
  * Update README.md
  * Update README.md
* [iluvatar] bertlarge inference case (#353)
  * iluvatar bertlarge MLM inference case
  * update ixrt readme
  Co-authored-by: 杨智超 <[email protected]>
* [mthreads] bert_hf 1x8 (#350)
  * support bert_hf fp32/amp/bf16 training for mthreads
  * update readme
  * prevent overrun
  * 1x1/2x8 not support
* 【mthreads】【block】resnet50 training (#246)
  * support resnet50 training on mthreads
  * fix typo
  * support rn50 amp training on mthreads
  * add test config (should revert this commit)
  * update config & readme
  * add get_system_info fn
  * update
  * 1x1/2x8 not support
  Co-authored-by: Zhou Yu <[email protected]>
* fix llama, add TFLOPS log (#358)
  * fixllama
  * add t/tflops
* [mthreads] deepspeed llama2
* update readme for sdpa

Co-authored-by: jamesruio <[email protected]>
Co-authored-by: swish swish <[email protected]>
Co-authored-by: 魏杰 <[email protected]>
Co-authored-by: 杨智超 <[email protected]>
Co-authored-by: clveryang <[email protected]>
Co-authored-by: Zhou Yu <[email protected]>
Co-authored-by: zhouyu <[email protected]>
Co-authored-by: forestlee95 <[email protected]>
Co-authored-by: sen.li <[email protected]>
Co-authored-by: uuup <[email protected]>
Co-authored-by: clveryang <[email protected]>
Co-authored-by: mingyuanw-mt <[email protected]>
Co-authored-by: shh2000 <[email protected]>
No description provided.