Common Issue Summary 常见问题汇总 #232

Open
czczup opened this issue May 30, 2024 · 1 comment
czczup commented May 30, 2024

Hi everyone,

This is a Common Issue Summary where I will compile the frequently encountered issues. If you notice any omissions, please feel free to help add to the list. Thank you!


czczup commented May 30, 2024

I will summarize common issues here.

1. Multi-GPU Inference - RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!

Issues: #229, #118

Many people have encountered this bug, and we haven't yet found a method that handles all cases. However, there is a workaround: manually assign each of the model's modules to a device.

For example, to deploy this 26B model on two V100 GPUs:

The model has 26B parameters in total, so ideally each card holds 13B. The ViT accounts for 6B, so card 0 needs another 7B from the LLM. That means roughly 1/3 of the 20B LLM (16 of 48 layers) goes on card 0 and the remaining 2/3 on card 1.

In code, it would look like this:

import torch
from transformers import AutoModel

path = 'OpenGVLab/InternVL-Chat-V1-5'  # the 26B checkpoint discussed above

# ViT, projector, embeddings, and the first 16 LLM layers go on GPU 0;
# the remaining 32 layers, the final norm, and the output head go on GPU 1.
device_map = {
    'vision_model': 0,
    'mlp1': 0,
    'language_model.model.tok_embeddings': 0,  # near the first layer of LLM
    'language_model.model.norm': 1,  # near the last layer of LLM
    'language_model.output.weight': 1  # near the last layer of LLM
}
for i in range(16):
    device_map[f'language_model.model.layers.{i}'] = 0
for i in range(16, 48):
    device_map[f'language_model.model.layers.{i}'] = 1
print(device_map)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device_map
).eval()
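More generally, a small helper can build such a map for any layer split. This is a hedged sketch, not part of the InternVL API; the module names are assumed to match the 26B checkpoint above, and num_layers / first_gpu_layers would need adjusting for other checkpoints:

def make_device_map(num_layers=48, first_gpu_layers=16):
    # Hypothetical helper: builds the same two-GPU map as above
    # for a configurable layer split.
    device_map = {
        'vision_model': 0,
        'mlp1': 0,
        'language_model.model.tok_embeddings': 0,
        'language_model.model.norm': 1,
        'language_model.output.weight': 1,
    }
    for i in range(num_layers):
        device_map[f'language_model.model.layers.{i}'] = 0 if i < first_gpu_layers else 1
    return device_map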

2. Multi-Image Inference - When more than two images are passed, the model seems to treat all of the input as a single image. From the code, all image tiles appear to be fed to the model together, without distinguishing between different images. The problem is the same even with lmdeploy.

Issues: #223

The current V1.5 model was not trained on such (interleaved) data. The inference interface can be modified to support it, but the results are unstable.

The June version will include multi-image interleaved training, which should improve performance. The code will also support this feature at that time.
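For reference, this is a hedged sketch of why the current behavior loses the image boundaries: the tiles of both images end up concatenated into one flat batch. pixel_values1 / pixel_values2 stand for the preprocessed tile tensors of each image (e.g. from the load_image helper in the model card), and tokenizer / generation_config are as in the model card; none of these names are defined here:

import torch

# Tiles from both images are concatenated into one batch, so the model
# cannot tell where one image ends and the next begins.
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
question = 'Describe the two images.'
response = model.chat(tokenizer, pixel_values, question, generation_config)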

3. Prompt Format

Issues: #227

TODO

4. Quantization - AWQ / INT4 quantization, low GPU utilization during INT8 model inference

Issues: #209, #210, #193, #167

Thanks to the lmdeploy team for providing AWQ quantization support.

The 4-bit model is available at OpenGVLab/InternVL-Chat-V1-5-AWQ. You can try this one.
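A minimal loading sketch with lmdeploy, assuming a recent lmdeploy version; model_format='awq' tells the TurboMind backend to expect AWQ weights, and the image path is a placeholder:

from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# Serve the 4-bit AWQ checkpoint with the TurboMind backend.
pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ',
                backend_config=TurbomindEngineConfig(model_format='awq'))
image = load_image('image1.jpg')  # placeholder image path
response = pipe(('Describe this image.', image))
print(response.text)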
