Is there a more efficient way to train large models? #2335
-
I trained a 70-layer GPT model with ColossalAI, using tp=4, pp=6, fp16, and activation checkpointing on every layer, on 24 A100s. Measured by the efficiency calculation in Section 4.1 of PaLM (https://arxiv.org/pdf/2204.02311.pdf), ColossalAI only reaches about 26% hardware utilization, whereas several past large models have reported noticeably higher efficiency.
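For reference, PaLM §4.1 defines "model FLOPs utilization" (MFU) as observed tokens/s divided by the system's theoretical peak throughput. Below is a minimal sketch of that calculation; every concrete number (model shape, throughput) is an assumption for illustration, not a measurement from this thread.

```python
# A minimal sketch of PaLM §4.1's "model FLOPs utilization" (MFU);
# all concrete numbers below are illustrative assumptions,
# not measurements from this thread.

def model_flops_per_token(n_params, n_layers, n_heads, head_dim, seq_len):
    # 6N for forward+backward matmuls over the parameters, plus
    # 12*L*H*Q*T for the attention score/value matmuls (PaLM §4.1 / App. B).
    return 6 * n_params + 12 * n_layers * n_heads * head_dim * seq_len

# Hypothetical shape for a 70-layer GPT-like model.
n_layers, n_heads, head_dim, seq_len = 70, 64, 128, 2048
d_model = n_heads * head_dim
n_params = 12 * n_layers * d_model**2      # rough non-embedding parameter count

num_gpus = 24
peak_flops = 312e12                        # A100 FP16/BF16 tensor-core peak, FLOP/s
observed_tokens_per_sec = 5_500            # assumed measured throughput

mfu = (observed_tokens_per_sec
       * model_flops_per_token(n_params, n_layers, n_heads, head_dim, seq_len)
       / (num_gpus * peak_flops))
print(f"MFU = {mfu:.1%}")                  # ~26% with these assumed numbers

# Note: MFU gives no credit for activation-checkpoint recomputation; hardware
# FLOPs utilization (HFU) would count roughly 8N instead of 6N per token when
# every layer is rematerialized.
```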
-
How about switching to 4-way TP + 6-way FSDP instead? A sketch of that layout follows.
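The sketch below shows, in plain PyTorch primitives rather than ColossalAI's own API (assuming torch >= 2.2), how 24 ranks could be carved into a 6-way FSDP × 4-way TP mesh; the `Block` module and all shapes are placeholders.

```python
# Sketch of a 6-way FSDP x 4-way TP layout on 24 GPUs using plain PyTorch
# primitives (assumes torch >= 2.2); the Block module and all shapes are
# placeholders, and wiring this into ColossalAI itself is not shown.
import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module,
)

# 24 ranks -> 6 sharded-data-parallel groups x 4 tensor-parallel ranks.
mesh = init_device_mesh("cuda", (6, 4), mesh_dim_names=("fsdp", "tp"))

class Block(nn.Module):                     # stand-in for one transformer MLP
    def __init__(self, d=8192):
        super().__init__()
        self.up = nn.Linear(d, 4 * d)
        self.down = nn.Linear(4 * d, d)
    def forward(self, x):
        return self.down(torch.relu(self.up(x)))

block = Block().cuda()
# Split the two matmuls column-/row-wise across the 4-way TP submesh.
block = parallelize_module(
    block, mesh["tp"],
    {"up": ColwiseParallel(), "down": RowwiseParallel()},
)
# Shard parameters, gradients, and optimizer state across the 6-way submesh.
block = FSDP(block, device_mesh=mesh["fsdp"], use_orig_params=True)
```

Launch with e.g. `torchrun --nnodes=3 --nproc_per_node=8 ...` so that exactly 24 ranks join the mesh.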
-
https://github.com/hpcaitech/GPT-Demo |
-
The figures you report ("the efficiencies of several past large models are roughly as follows"): did you measure them all yourself?
-
Yes. Both of these versions exist at the moment. Since most past large models were trained with PP+TP, we have so far run many more test experiments on the PP+TP scheme.
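For readers comparing the two schemes, here is a rough sketch in the legacy ColossalAI config style (`parallel = dict(...)`); the keys follow older documentation and may differ across versions, so treat this as an assumption to verify against the release in use.

```python
# A rough sketch of the two layouts in the legacy ColossalAI config style
# (`parallel = dict(...)`); keys are from older docs and may differ by version.

# Scheme 1: the run described in this thread, 4-way TP x 6-stage PP on 24 GPUs.
parallel = dict(
    pipeline=6,
    tensor=dict(size=4, mode='1d'),
)

# Scheme 2: the suggested alternative, 4-way TP with the remaining factor of 6
# used for sharded data parallelism (ZeRO/FSDP), which is configured through
# the separate ZeRO/FSDP settings rather than the `pipeline` key.
# parallel = dict(
#     tensor=dict(size=4, mode='1d'),
# )
```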