add kunlun llama70b case #462
Conversation
@@ -132,7 +132,9 @@ LOGGING_ARGS="
"

source $VENDOR_SHELL
-cmd="torchrun $DISTRIBUTED_ARGS /workspace/FlagScale/pretrain_llama.py \
+CODE_PATH="/workspace/FlagScale/pretrain_llama.py"
Please move `source $VENDOR_SHELL` to after `CODE_PATH` here.
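For illustration, a minimal sketch of the suggested ordering, assuming `DISTRIBUTED_ARGS` and `VENDOR_SHELL` are defined earlier in the script and that the vendor shell may want to override `CODE_PATH`:

```bash
# Define the default training entry point first.
CODE_PATH="/workspace/FlagScale/pretrain_llama.py"

# Source the vendor shell afterwards, so a vendor-specific script can
# redefine CODE_PATH (e.g. point it at a patched entry point).
source $VENDOR_SHELL

# Build the launch command from the (possibly overridden) CODE_PATH;
# the remaining training arguments from the original script are omitted here.
cmd="torchrun $DISTRIBUTED_ARGS $CODE_PATH"
```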
| R300 10-node 32-card (10x8) | amp | TP8PP4DP1 | / | / | / | / | / |
| R300 10-node 32-card (10x8) | amp | TP4PP8DP1 | / | / | / | 21/32 | / |
| R300 10-node 32-card (10x8) | amp | TP4PP8DP1 | GAS=1024 (GBS=1024 = 4M tokens) | / | doing | 21/32 | / |
Due to the lack of R300 machines, accuracy was preliminarily verified on a single R300 card against a single GPU.
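If the "4M tokens" in the table is global batch size multiplied by sequence length, the numbers are consistent with the Llama2 default sequence length of 4096 (an assumption, since the sequence length is not shown here):

1024 samples × 4096 tokens/sample = 4,194,304 ≈ 4M tokens per step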
*doing: Due to the lack of blabla, accuracy has so far been verified on a single R300 card against a single GPU by reducing the number of model layers; accuracy verification of the full 70B model is in progress. Once it is written up like this, change the "doing" entries in the table above to "doing*".
batchsize = 1
accumulate_steps = 44
train_tokens = 100000000
theoryflops = 495000000000000.0
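As a rough sanity check on `theoryflops` (495 TFLOPS per card, i.e. 4.95e14 FLOPs/s), a hypothetical sketch of how a measured per-card throughput would translate into MFU under the common 6·N FLOPs-per-token approximation for decoder-only models; the throughput value below is made up for illustration only:

```bash
PARAMS=70000000000            # 70B parameters (Llama2-70B)
THEORYFLOPS=495000000000000   # theoryflops from the config: 495 TFLOPS/card
TOKENS_PER_SEC_PER_CARD=100   # hypothetical measured per-card throughput

# MFU ~= achieved FLOPs/s per card / theoretical peak FLOPs/s per card,
# with achieved FLOPs/s ~= 6 * params * tokens/s for a decoder-only model.
awk -v p="$PARAMS" -v f="$THEORYFLOPS" -v t="$TOKENS_PER_SEC_PER_CARD" \
    'BEGIN { printf "MFU ~ %.1f%%\n", 100 * 6 * p * t / f }'
```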
Is the theoretical peak compute of the Kunlunxin chip really 495 TFLOPS?