
Sometimes fails to meet pre_nms_topk with only two classes #23

Open
td-anne opened this issue Jul 8, 2023 · 4 comments

td-anne commented Jul 8, 2023

I am running DETA on a data set with only one real class (and one N/A class; in particular, various tensors are n by 2). Some long runs fail with RuntimeError: selected index k out of range at the line below:

pre_nms_inds.append(torch.topk(prop_logits_b.sigmoid() * lvl_mask, pre_nms_topk)[1])

If I understand correctly, topk should only fail like this when the requested k (here pre_nms_topk, which is 1000) is larger than the number of available elements; specifically, I believe this can only happen if the length of lvl_mask is less than 1000. (Perhaps my data augmentation has produced an unreasonably tiny image? I thought they were all rescaled.) I don't fully understand where in the code this occurs, but would it be harmful to trim the k supplied to topk down to the available length?
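
Something like the following is what I have in mind; just a sketch, assuming the masked scores are the (last) dimension that topk currently runs over:

scores = prop_logits_b.sigmoid() * lvl_mask
k = min(pre_nms_topk, scores.shape[-1])  # never request more elements than the tensor holds
pre_nms_inds.append(torch.topk(scores, k)[1])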


td-anne commented Jul 10, 2023

In fact I think I may know what has happened. First, I have set the input image rescaling to at most 800 for the longest side (1333 overflows my GPU RAM when images need to be padded out to 1333x1333). Second, my image augmentation (using albumentations.BBoxSafeRandomCrop) may, rarely, produce one-pixel-wide images. If these are rescaled to produce 800x1 images, then there aren't more than 800 values in lvl_mask. Does this sound plausible?
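
A quick count seems to support this. As a sketch, assuming the usual Deformable-DETR-style feature strides of 8, 16, 32 and 64 (I have not checked DETA's exact strides):

import math

def total_locations(h, w, strides=(8, 16, 32, 64)):
    # one proposal per feature-map location, summed over all levels
    return sum(math.ceil(h / s) * math.ceil(w / s) for s in strides)

print(total_locations(800, 1))    # 100 + 50 + 25 + 13 = 188, far below pre_nms_topk = 1000
print(total_locations(800, 800))  # 13294, plenty

So a degenerate one-pixel-wide crop would easily push the total number of proposals below 1000.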

jozhang97 (Owner) commented

Yes, if you have fewer classes it makes sense to have fewer predictions, so it should be fine to reduce the class-agnostic topk. We tried a couple of values and did not see much of a difference.

Your 800x1 images could also be the problem, though there may be more proposals than you expect since we use multi-level features.

You can also try out checkpointing to avoid GPU OOM.
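
For the checkpointing, something along these lines; this is only an illustrative sketch with placeholder names (layers, src), not a drop-in patch for DETA:

from torch.utils.checkpoint import checkpoint

def forward_with_checkpointing(layers, src):
    # recompute each layer's activations during the backward pass instead of storing them,
    # trading extra compute for lower peak GPU memory
    out = src
    for layer in layers:
        out = checkpoint(layer, out, use_reentrant=False)  # use_reentrant=False is recommended on recent PyTorch
    return out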


td-anne commented Sep 18, 2023

The 800x1 images are, obviously, not of any use, so I don't care what values get returned as long as it doesn't crash. The checkpointing is interesting, though: could the model cope with 1920 by 1080 images? Or does that require changing the structure somewhat? My raw inputs are all 1920 by 1080 and I'm looking for broken wires, which might disappear when downscaled. For the moment I'm more interested in accuracy than speed.

jozhang97 (Owner) commented

I see, that makes sense for high resolution. We typically use larger images during pre-training, so I don't think 1920x1080 should be a problem.
