
Error after running for a while #28

Open
Leng-bingo opened this issue May 1, 2023 · 4 comments

Comments

@Leng-bingo

  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1458, in forward
    conditional_pixel_values=conditional_pixel_values,
  File "/media/admin1/envs/anaconda3/envs/leng_lip/lib/python3.7/site-packages/transformers/models/clipseg/modeling_clipseg.py", line 1360, in get_conditional_embeddings
    raise ValueError("Make sure to pass as many prompt texts as there are query images")
ValueError: Make sure to pass as many prompt texts as there are query images

It processes 100+ images successfully, then this error appears and the run stops.
The command used is:

python scripts/main_ssa_engine.py --data_dir=data/UCM_Captions --out_dir=output --world_size=4 --save_img --sam --ckpt_path=../../mydata/sam_vit_h_4b8939.pth --light_mode
@chenjixuan20

I ran into the same problem. Has the original poster solved it?

@Yaojun-Lai

I also ran into the same problem. Is there a fix?

@Jiaqi-Chen-00
Collaborator

I suggest adding a breakpoint before this line of code: https://github.com/fudan-zvg/Semantic-Segment-Anything/blob/main/scripts/pipeline.py#L95 . Check the shapes of patch_huge and mask_categories, because the error message indicates a mismatch between the two. Most likely the number of prompt texts is zero or too small. In that case, add preprocessing before line 95 to handle this situation, as sketched below.
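A minimal sketch of such a guard, assuming the loop in pipeline.py exposes the mask_categories list named above (the exact variable names and control flow in the repo may differ):

    # Hypothetical guard placed before the CLIPSeg call around pipeline.py line 95.
    # mask_categories is the per-mask list of candidate class names mentioned above;
    # skip masks that produced no candidates so CLIPSeg never receives zero prompt texts.
    if len(mask_categories) == 0:
        continue  # nothing to classify for this mask; avoids the ValueError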

@xxxwuwq

xxxwuwq commented Jun 19, 2023

I have encountered the same issue. The problem occurs in the clipseg_segmentation function in the Semantic-Segment-Anything/scripts/clipseg.py file. When class_list contains only one class, it arrives as a plain string rather than a list. In addition, with a single class the logits returned by clipseg_model have shape (H, W), which causes a dimension mismatch when F.interpolate rescales them. Both cases need handling. Here is the modified code:

import torch.nn.functional as F


def clipseg_segmentation(image, class_list, clipseg_processor, clipseg_model, rank):
    # A single class may arrive as a bare string; wrap it in a list so the
    # processor receives as many prompt texts as query images.
    if isinstance(class_list, str):
        class_list = [class_list]
    inputs = clipseg_processor(
        text=class_list, images=[image] * len(class_list),
        padding=True, return_tensors="pt").to(rank)
    # Remember the processor's output size, then resize the pixel values to a
    # fixed 512x512 so all inputs share one scale.
    h, w = inputs['pixel_values'].shape[-2:]
    fixed_scale = (512, 512)
    inputs['pixel_values'] = F.interpolate(
        inputs['pixel_values'],
        size=fixed_scale,
        mode='bilinear',
        align_corners=False)
    outputs = clipseg_model(**inputs)
    logits = outputs.logits
    # With a single class the model returns logits of shape (H, W); add the
    # missing class dimension so interpolation always sees (N, C, H, W).
    if logits.ndim == 2:
        logits = logits[None, ...]
    logits = F.interpolate(logits[None], size=(h, w), mode='bilinear', align_corners=False)[0]
    return logits

This modification wraps class_list in a list when it is a plain string, and it keeps the dimensions consistent for F.interpolate by adding the missing class dimension when clipseg_model's logits have shape (H, W).
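For reference, a minimal usage sketch of the patched function on CPU. The checkpoint name CIDAS/clipseg-rd64-refined is the standard Hugging Face CLIPSeg model (the repo may load it differently), and example.jpg is a placeholder:

    from PIL import Image
    from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation

    processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
    model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined").to("cpu")
    image = Image.open("example.jpg")  # placeholder input image

    # A single class passed as a bare string now works thanks to the str -> list guard.
    logits = clipseg_segmentation(image, "building", processor, model, rank="cpu")
    print(logits.shape)  # (1, h, w): one channel per class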
