
Instance Segmentation Mask/Bbox Relation #1784

Open
FrsECM opened this issue Jun 11, 2024 · 7 comments
Labels
bug Something isn't working

Comments

FrsECM commented Jun 11, 2024

Describe the bug

I'm working on an instance segmentation use case with torchvision. In this case, I have:

  • image
  • bboxes
  • masks
  • labels

I've created an augmentation pipeline with albumentations, something like this:
import albumentations as A
import cv2
from albumentations.pytorch import ToTensorV2

def augmentations(instance_item: dict):
    transform = A.Compose([
        A.LongestMaxSize(max_size=MAX_SIZE),
        A.PadIfNeeded(
                min_height=MIN_IMG_HEIGHT,
                min_width=MIN_IMG_WIDTH,
                border_mode=cv2.BORDER_CONSTANT,
                value=0,
                always_apply=True),
        A.HorizontalFlip(),
        A.RandomCrop(MIN_IMG_HEIGHT, MIN_IMG_WIDTH),
        A.ToFloat(max_value=255),
        ToTensorV2()
    ],
        bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'], min_visibility=BBOX_MIN_VISIBILITY),
        is_check_shapes=False
    )
    output = transform(
        image=instance_item['image'],
        masks=instance_item['masks'],
        bboxes=instance_item['boxes'],
        labels=instance_item['labels'])
    return output

I would expect the visibility parameter to be applicable at the instance level: if a bbox is not visible enough, the corresponding mask should be removed as well.

I tried adding "masks" to the bbox_params:

A.BboxParams(format='pascal_voc',label_fields=['labels','masks'],min_visibility=BBOX_MIN_VISIBILITY),

but in that case the augmentations are not applied to the masks.

To Reproduce

To reproduce, you can use the code below:

import albumentations as A
import numpy as np

img = np.zeros((2048, 2048, 3), dtype=np.uint8)

# Bbox format: [x_min, y_min, x_max, y_max]
bboxes = [
    [800, 800, 1200, 1200],  # bbox inside the crop
    [1500, 1500, 1700, 1700] # bbox outside the crop
]
labels = [0,1]

masks = [
    np.zeros((2048, 2048), dtype=np.uint8),
    np.zeros((2048, 2048), dtype=np.uint8)
]
masks[0][800:1200, 800:1200] = 1
masks[1][1500:1700, 1500:1700] = 1

aug = A.Compose(
    [
        A.CenterCrop(1024, 1024)
    ],
    bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'],min_visibility=0.3),
)

# Apply Transformation
augmented = aug(image=img, bboxes=bboxes, masks=masks, labels=labels)

# Get back results
print(len(augmented['bboxes']))
print(len(augmented['labels']))
print(len(augmented['masks']))
# Returns:
# 1
# 1
# 2  -> should be 1

Expected behavior

I would expect the visibility parameter to be applicable at the instance level: if a bbox is not visible enough, the corresponding mask should be removed as well.
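For illustration, the instance-level behavior I'd expect can be sketched in plain NumPy (`filter_instances` is a hypothetical helper, not part of albumentations; `min_visibility` mirrors the `A.BboxParams` parameter):

```python
import numpy as np

def filter_instances(bboxes, masks, labels, crop, min_visibility=0.3):
    """Drop bbox, mask and label TOGETHER when the bbox keeps less than
    min_visibility of its area inside the crop (x_min, y_min, x_max, y_max)."""
    cx0, cy0, cx1, cy1 = crop
    kept_bboxes, kept_masks, kept_labels = [], [], []
    for bbox, mask, label in zip(bboxes, masks, labels):
        x0, y0, x1, y1 = bbox
        inter_w = max(0, min(x1, cx1) - max(x0, cx0))
        inter_h = max(0, min(y1, cy1) - max(y0, cy0))
        visibility = (inter_w * inter_h) / ((x1 - x0) * (y1 - y0))
        if visibility >= min_visibility:
            # Shift the bbox into crop coordinates and crop the mask
            kept_bboxes.append([x0 - cx0, y0 - cy0, x1 - cx0, y1 - cy0])
            kept_masks.append(mask[cy0:cy1, cx0:cx1])
            kept_labels.append(label)
    return kept_bboxes, kept_masks, kept_labels

# Same data as the CenterCrop(1024, 1024) example; the crop keeps (512:1536, 512:1536)
masks = [np.zeros((2048, 2048), dtype=np.uint8) for _ in range(2)]
masks[0][800:1200, 800:1200] = 1
masks[1][1500:1700, 1500:1700] = 1
bboxes = [[800, 800, 1200, 1200], [1500, 1500, 1700, 1700]]
out_bboxes, out_masks, out_labels = filter_instances(
    bboxes, masks, [0, 1], (512, 512, 1536, 1536))
print(len(out_bboxes), len(out_masks), len(out_labels))  # 1 1 1 -- consistent
```

With this, the second instance (only ~3% visible after the crop) is dropped from all three lists at once instead of only from bboxes and labels.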

Actual behavior

If I do not add masks to label_fields, the outputs are inconsistent.
If I do add masks to label_fields, the augmentations are not applied to them.

FrsECM added the bug label on Jun 11, 2024

FrsECM commented Jun 11, 2024

A workaround I've found:

def iris2_training(segmentation_item:dict):
    transform = A.Compose([
        A.LongestMaxSize(max_size=MAX_SIZE),
        A.PadIfNeeded(
                min_height=MIN_IMG_HEIGHT,
                min_width=MIN_IMG_WIDTH,
                border_mode=cv2.BORDER_CONSTANT,
                value=0,
                always_apply=True),
        A.HorizontalFlip(),
        A.RandomCrop(MIN_IMG_HEIGHT,MIN_IMG_WIDTH),
        A.ToFloat(max_value=255),
        ToTensorV2()
    ],
        bbox_params=A.BboxParams(format='pascal_voc',label_fields=['labels','ids'],min_visibility=BBOX_MIN_VISIBILITY),
        is_check_shapes=False
    )
    output = transform(
        image=segmentation_item['image'],
        masks=segmentation_item['masks'],
        bboxes=segmentation_item['boxes'],
        labels=segmentation_item['labels'],
        ids=list(range(len(segmentation_item['labels'])))
    )

    return dict(
        image=output['image'],
        boxes=output['bboxes'],
        labels=output['labels'],
        masks=[output['masks'][i] for i in output['ids']],
        name=segmentation_item['name']
        )

But it should work without this workaround.
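Why the trick works (a toy illustration, no albumentations needed): every entry in label_fields is filtered in lockstep with the bboxes, so the surviving ids index directly into the untouched masks list.

```python
import numpy as np

# Three instances; pretend the bbox visibility filter dropped instance 1.
masks = [np.full((4, 4), k, dtype=np.uint8) for k in range(3)]
ids = [0, 1, 2]            # passed to albumentations as a label field
surviving_ids = [0, 2]     # what the label field looks like after filtering

# Reindex the (still complete) masks list by the surviving ids
kept_masks = [masks[i] for i in surviving_ids]
print([int(m[0, 0]) for m in kept_masks])  # [0, 2]
```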

ternaus (Collaborator) commented Jun 11, 2024

Thanks for the proposed solution!

Yep, we do have this issue: masks, boxes, and keypoints are not bound at the instance level.

#1716

Your approach is the best that I have seen so far for this problem.

simonebonato commented Jul 4, 2024

I also came here with the same issue.

Thanks for the workaround, people :) Although one would expect a library like this to support that by default, or at least offer the option to add it.

ternaus (Collaborator) commented Jul 4, 2024

@simonebonato how much would you be willing to donate to help make this happen?

https://github.com/sponsors/albumentations-team

simonebonato commented:

I can maybe try to solve it myself if I have time.
I suppose the code is already there since it's already working with the labels.


FrsECM commented Jul 4, 2024

Just to keep you aware: if you are doing instance segmentation, you should be careful about using the original bboxes.


Now I just apply the augmentations to the masks and then recompute the bbox coordinates. To me this makes more sense, because this way the bbox matches the final augmented mask.

To do this, you can use pycocotools.

import pycocotools.mask as mask_utils
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> np.ndarray:
    """Convert a mask to a bbox.

    Useful when we apply augmentation on a mask and want a precise bbox
    corresponding to the transformed mask.

    Args:
        mask (np.ndarray): binary instance mask.
    """
    mask_rle = mask_utils.encode(np.asfortranarray(mask > 0))
    bbox_xywh = mask_utils.toBbox(mask_rle)
    # Convert xywh to xyxy
    bbox_xyxy = (bbox_xywh + np.array([0, 0, bbox_xywh[0], bbox_xywh[1]]))
    return bbox_xyxy

...
transform_output = transform(
        image=segmentation_item['image'],
        masks=segmentation_item['masks']
    )
transform_output['boxes'] = [mask_to_bbox(mask) for mask in transform_output['masks']]

ternaus (Collaborator) commented Jul 4, 2024

Yep, how to rotate bounding boxes so that they stay tight is an open question. Recomputing the boxes from the masks at the end was always the way to go.

We do have a function for it:

def bbox_from_mask(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Create bounding box from binary mask (fast version)

    Args:
        mask (numpy.ndarray): binary mask.

    Returns:
        tuple: A bounding box tuple `(x_min, y_min, x_max, y_max)`.

    """
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return x_min, y_min, x_max + 1, y_max + 1
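For completeness, a quick standalone check of this approach on the CenterCrop example from the issue (the helper is restated here, with results cast to plain ints, so the snippet runs on its own):

```python
import numpy as np

def bbox_from_mask(mask: np.ndarray) -> tuple:
    """Same logic as the helper above, restated so this sketch is standalone."""
    rows = np.any(mask, axis=1)
    if not rows.any():
        return -1, -1, -1, -1
    cols = np.any(mask, axis=0)
    y_min, y_max = np.where(rows)[0][[0, -1]]
    x_min, x_max = np.where(cols)[0][[0, -1]]
    return int(x_min), int(y_min), int(x_max) + 1, int(y_max) + 1

# Recompute a tight box from the cropped mask instead of transforming the bbox
mask = np.zeros((2048, 2048), dtype=np.uint8)
mask[800:1200, 800:1200] = 1
cropped = mask[512:1536, 512:1536]   # what CenterCrop(1024, 1024) keeps
print(bbox_from_mask(cropped))       # (288, 288, 688, 688)
```

The recomputed box is exactly the shifted original box here, but after rotations or other non-axis-aligned transforms it stays tight around the mask, which transformed boxes do not.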
