mmap issue in bf16 of gpt-fast #165

Open
yanbing-j opened this issue Apr 28, 2024 · 1 comment
@yanbing-j

gpt-fast uses torch.load with mmap=True to load model checkpoints, which can speed up model load time. However, mmap ends up unused in bf16: in https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247, the model is converted from float16 to bfloat16 when running a bf16 model, and .to() mallocs a new memory area, so the mapped file is not actually used.
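
To illustrate the effect, here is a minimal sketch (not the actual gpt-fast code; "model.pth" is a placeholder path) showing how the dtype conversion defeats the memory mapping:

```python
import torch

# torch.load with mmap=True keeps each tensor's storage backed by the
# checkpoint file on disk instead of reading everything into RAM up front.
state_dict = torch.load("model.pth", mmap=True, map_location="cpu")

for name, tensor in state_dict.items():
    # Converting float16 -> bfloat16 cannot reuse the mapped bytes:
    # .to() allocates fresh memory and copies, so the file-backed
    # storage is dropped and the load degenerates to a full read + copy.
    state_dict[name] = tensor.to(torch.bfloat16)
```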

Meanwhile, for int8/int4, the logic at https://github.com/pytorch-labs/gpt-fast/blob/main/generate.py#L247 does not make sense: an int8 model should not be converted to the bfloat16 data type. int8/int4 currently work only because weight happens not to be a parameter of the int8/int4 modules, so the conversion never touches it.
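
One possible direction (a hypothetical sketch, not a proposed patch) would be to cast only floating-point tensors explicitly, so quantized int8/int4 weights keep their dtype regardless of how they are registered:

```python
import torch

def cast_floating_point(model: torch.nn.Module, dtype: torch.dtype) -> torch.nn.Module:
    # Hypothetical helper: cast only floating-point parameters/buffers,
    # leaving quantized int8/int4 weights at their integer dtype.
    for t in list(model.parameters()) + list(model.buffers()):
        if t.is_floating_point():
            t.data = t.data.to(dtype)
    return model

# Usage sketch:
# model = cast_floating_point(model, torch.bfloat16)
```

This mirrors the documented behavior of nn.Module.to(dtype), which casts only floating-point and complex tensors, but makes the intent explicit rather than relying on how the quantized weights happen to be registered.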

@yanboliang
Contributor

@yanbing-j The goal of gpt-fast is to demonstrate the list of optimizations we did to accelerate inference; model loading is not the major bottleneck, so we didn't do much optimization there. But I do agree with your points, and we would also welcome PRs for these optimizations.
