Fix loading pretrained-mm-projector errors under Deepspeed Zero3. #1250

lockon-n · 2024-03-08T18:20:22Z

With DeepSpeed ZeRO-3 enabled in LLaVA's fine-tuning stage, the model parameters are created as placeholders rather than fully initialized tensors.

As a result, a plain load_state_dict call raises errors when the code tries to load the pretrained mm projector from a checkpoint file such as mm_projector.bin.

This PR fixes the problem by checking whether DeepSpeed ZeRO-3 is active with is_deepspeed_zero3_enabled() from transformers and, if so, wrapping the loading code in deepspeed.zero.GatheredParameters so that the weights are actually loaded.
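For readers who want the shape of the change without opening the diff, here is a minimal sketch of the pattern described above. It is not the exact code from this PR: the helper names `zero3_enabled` and `load_projector_weights` and the `use_zero3` override are hypothetical, and the choice of `modifier_rank=0` is an assumption about how the gathered weights should be broadcast back to the other ranks.

```python
from contextlib import nullcontext


def zero3_enabled() -> bool:
    """Return True if transformers reports an active DeepSpeed ZeRO-3 config."""
    try:
        # The import path has moved between transformers releases; try both.
        try:
            from transformers.integrations import is_deepspeed_zero3_enabled
        except ImportError:
            from transformers.deepspeed import is_deepspeed_zero3_enabled
    except ImportError:
        return False  # helper not importable: treat as "not ZeRO-3"
    return is_deepspeed_zero3_enabled()


def load_projector_weights(projector, state_dict, use_zero3=None):
    """Load state_dict into projector, gathering ZeRO-3 shards first if needed.

    Hypothetical helper: `projector` is any object exposing parameters()
    and load_state_dict(), e.g. the mm projector nn.Module.
    """
    if use_zero3 is None:
        use_zero3 = zero3_enabled()
    if use_zero3:
        import deepspeed  # imported lazily so non-DeepSpeed runs do not need it

        # Assumption: modifier_rank=0 so that rank 0's in-place updates are
        # broadcast to the other ranks when the context exits.
        ctx = deepspeed.zero.GatheredParameters(
            list(projector.parameters()), modifier_rank=0
        )
    else:
        ctx = nullcontext()
    with ctx:
        return projector.load_state_dict(state_dict)
```

Without ZeRO-3 the context manager is a no-op, so the same call site works for both training configurations.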

Fix.

Update llava_arch.py

1487a5a

Fix.
