
Error after running for a while #28

Open
Leng-bingo opened this issue May 1, 2023 · 4 comments

Comments

@Leng-bingo

  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1458, in forward
    conditional_pixel_values=conditional_pixel_values,
  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1360, in get_conditional_embeddings
    raise ValueError("Make sure to pass as many prompt texts as there are query images")
ValueError: Make sure to pass as many prompt texts as there are query images

It processes 100+ images successfully, then this error appears and the run stops.
The command used is:

python scripts/main_ssa_engine.py --data_dir=data/UCM_Captions --out_dir=output --world_size=4 --save_img --sam --ckpt_path=../../mydata/sam_vit_h_4b8939.pth --light_mode
@chenjixuan20

I ran into the same problem. Has the original poster solved it?

@Yaojun-Lai

I also ran into the same problem. Is there a fix?

@Jiaqi-Chen-00
Collaborator

I suggest adding a breakpoint before this line of code: https://github.com/fudan-zvg/Semantic-Segment-Anything/blob/main/scripts/pipeline.py#L95 . Check the shapes of patch_huge and mask_categories, because the error message indicates a mismatch between the two. Most likely the number of prompt texts is zero or too small. In that case, add preprocessing before line 95 to handle this situation, as sketched below.
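A minimal sketch of such a guard, assuming the loop in pipeline.py exposes the mask_categories list named above (the exact variable names and control flow in the repo may differ):

    # Hypothetical guard placed before the CLIPSeg call around pipeline.py line 95.
    # mask_categories is the per-mask list of candidate class names mentioned above;
    # skip masks that produced no candidates so CLIPSeg never receives zero prompt texts.
    if len(mask_categories) == 0:
        continue  # nothing to classify for this mask; avoids the ValueError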

@xxxwuwq

xxxwuwq commented Jun 19, 2023

I have encountered the same issue. The problem occurs in the clipseg_segmentation function in the Semantic-Segment-Anything/scripts/clipseg.py file. When class_list contains only one class, it arrives as a plain string rather than a list. In addition, with a single class the logits returned by clipseg_model have shape (H, W), which causes a dimension mismatch when F.interpolate rescales them. Both cases need handling. Here is the modified code:

import torch.nn.functional as F


def clipseg_segmentation(image, class_list, clipseg_processor, clipseg_model, rank):
    # A single class may arrive as a bare string; wrap it in a list so the
    # processor receives as many prompt texts as query images.
    if isinstance(class_list, str):
        class_list = [class_list]
    inputs = clipseg_processor(
        text=class_list, images=[image] * len(class_list),
        padding=True, return_tensors="pt").to(rank)
    # Remember the processor's output size, then resize the pixel values to a
    # fixed 512x512 so all inputs share one scale.
    h, w = inputs['pixel_values'].shape[-2:]
    fixed_scale = (512, 512)
    inputs['pixel_values'] = F.interpolate(
        inputs['pixel_values'],
        size=fixed_scale,
        mode='bilinear',
        align_corners=False)
    outputs = clipseg_model(**inputs)
    logits = outputs.logits
    # With a single class the model returns logits of shape (H, W); add the
    # missing class dimension so interpolation always sees (N, C, H, W).
    if logits.ndim == 2:
        logits = logits[None, ...]
    logits = F.interpolate(logits[None], size=(h, w), mode='bilinear', align_corners=False)[0]
    return logits

This modification wraps class_list in a list when it is a plain string, and it keeps the dimensions consistent for F.interpolate by adding the missing class dimension when clipseg_model's logits have shape (H, W).
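For reference, a minimal usage sketch of the patched function on CPU. The checkpoint name CIDAS/clipseg-rd64-refined is the standard Hugging Face CLIPSeg model (the repo may load it differently), and example.jpg is a placeholder:

    from PIL import Image
    from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

    processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
    model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to("cpu")
    image = Image.open("example.jpg")  # placeholder input image

    # A single class passed as a bare string now works thanks to the str -> list guard.
    logits = clipseg_segmentation(image, "building", processor, model, rank="cpu")
    print(logits.shape)  # (1, h, w): one channel per class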
