Fix InternVL2 model sharding #481

pufanyi · 2024-12-27T15:59:02Z

As described here and here.

The code is written this way to avoid errors during multi-GPU inference caused by tensors not being on the same device. Placing the first and last layers of the large language model (LLM) on the same device prevents such errors.
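For illustration, here is a minimal sketch of a device map that follows this idea. It is not the code from this PR: the function name `build_device_map`, the "GPU 0 counts as half a GPU" heuristic, and the module names (`vision_model`, `mlp1`, `language_model.model.layers.N`, and so on) are assumptions and may need adjusting for the actual checkpoint.

```python
import math
from typing import Dict


def build_device_map(num_layers: int, num_gpus: int) -> Dict[str, int]:
    """Spread LLM layers across GPUs, keeping the first and last layer on GPU 0.

    Hypothetical helper: the module names below are assumed, not taken
    from this PR.
    """
    if num_layers < 1 or num_gpus < 1:
        raise ValueError("num_layers and num_gpus must be positive")

    # Assumption: GPU 0 also hosts the vision encoder, so give it
    # roughly half the layer budget of the other GPUs.
    per_gpu = math.ceil(num_layers / (num_gpus - 0.5))
    budget = [per_gpu] * num_gpus
    budget[0] = math.ceil(per_gpu * 0.5)

    device_map: Dict[str, int] = {}
    layer = 0
    for gpu, count in enumerate(budget):
        for _ in range(count):
            if layer >= num_layers:
                break  # do not create entries for layers that do not exist
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1

    # Non-layer modules (vision tower, projector, embeddings, final norm,
    # output head) stay on GPU 0.
    for name in (
        "vision_model",
        "mlp1",
        "language_model.model.embed_tokens",
        "language_model.model.norm",
        "language_model.lm_head",
    ):
        device_map[name] = 0

    # Pin the last LLM layer to the same device as the first one, so the
    # tensors entering and leaving the LLM live on one device.
    device_map[f"language_model.model.layers.{num_layers - 1}"] = 0
    return device_map
```

A map like this would typically be passed as `device_map=` to `from_pretrained` when loading the model; the layer count must match the checkpoint being loaded.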

Fix InternVL2 model sharding

4535e52

pufanyi requested a review from Luodian December 27, 2024 15:59

Luodian approved these changes Dec 28, 2024


Luodian merged commit b9b3c1a into main Dec 28, 2024
2 checks passed

pufanyi deleted the pufanyi/internvl2 branch December 28, 2024 07:13

kcz358 pushed a commit that referenced this pull request Dec 30, 2024

Fix InternVL2 model sharding (#481)

75f4536
