add kunlun llama70b case #462

Closed · ZLkanyo009 wants to merge 2 commits

Conversation

ZLkanyo009 (Contributor)

No description provided.

@@ -132,7 +132,9 @@ LOGGING_ARGS="
 "
 
 source $VENDOR_SHELL
-cmd="torchrun $DISTRIBUTED_ARGS /workspace/FlagScale/pretrain_llama.py \
+CODE_PATH="/workspace/FlagScale/pretrain_llama.py"
Collaborator:

Please move `source $VENDOR_SHELL` after the `CODE_PATH` assignment here.
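A minimal sketch of the requested ordering, with the surrounding script lines elided (the trailing torchrun arguments are unchanged from the original script and not reproduced here):

```bash
# Define CODE_PATH first, then source the vendor hook, presumably so the
# vendor shell runs with CODE_PATH already set.
CODE_PATH="/workspace/FlagScale/pretrain_llama.py"
source $VENDOR_SHELL
cmd="torchrun $DISTRIBUTED_ARGS $CODE_PATH \
    ..."   # remaining torchrun arguments as in the original script (elided)
```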

| R300 ten-node 32-card (10x8) | amp | TP8PP4DP1 | / | / | / | / | / |
| R300 ten-node 32-card (10x8) | amp | TP4PP8DP1 | / | / | / | 21/32 | / |
| R300 ten-node 32-card (10x8) | amp | TP4PP8DP1 | GAS=1024 (GBS=1024 = 4M tokens) | / | doing | 21/32 | / |
Due to the lack of R300 machines, accuracy was preliminarily verified on a single R300 card and a single GPU.
@shh2000 (Collaborator) commented on Feb 28, 2024:


*doing: due to the lack of blabla, accuracy has for now been verified on a single R300 card and a single GPU by reducing the number of model layers; accuracy verification of the full 70B model is in progress.* Write it up along these lines, then change the "doing" entries in the table above to "doing*".
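For reference, the reduced-depth single-card check described above might look like the sketch below; the flags are Megatron-style assumptions (FlagScale's pretrain_llama.py is Megatron-based), the values are made up, and the required data/tokenizer arguments are omitted, so this is illustrative rather than the PR's actual command:

```bash
# Hypothetical single-card smoke run with the layer count cut down so the
# 70B configuration fits on one R300 / one GPU for an accuracy comparison.
torchrun --nproc_per_node=1 /workspace/FlagScale/pretrain_llama.py \
    --num-layers 4 \
    --hidden-size 8192 \
    --num-attention-heads 64 \
    --micro-batch-size 1
```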

batchsize = 1
accumulate_steps = 44
train_tokens = 100000000          # 1e8 tokens
theoryflops = 495000000000000.0   # 4.95e14 FLOPS = 495 TFLOPS
Collaborator:

Is the Kunlunxin theoretical peak compute really 495 TFLOPS?
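Since `theoryflops` presumably feeds the benchmark's efficiency figure, the value is worth pinning down. A back-of-envelope check under the common ~6·N FLOPs-per-token estimate for dense models (the throughput below is a made-up placeholder, not a measured number):

```bash
# Rough MFU sanity check; every number except the 70B parameter count is an
# assumption for illustration.
awk 'BEGIN {
    tokens_per_sec  = 100         # hypothetical per-chip training throughput
    flops_per_token = 6 * 70e9    # ~6 * n_params FLOPs/token (fwd + bwd), dense 70B
    theoryflops     = 495e12      # the 495 TFLOPS value questioned above
    printf "MFU ~ %.1f%%\n", 100 * tokens_per_sec * flops_per_token / theoryflops
}'
```

With these placeholder numbers the check prints `MFU ~ 8.5%`.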

ZLkanyo009 closed this by deleting the head repository on Feb 29, 2024.