How to use YOLOv11s-seg with supervision and ONNX Runtime? #1789
Replies: 30 comments 6 replies
-
here is how I am making predictions:

```python
# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11(best_weights_path, conf_thres=0.2, iou_thres=0.3)

img = cv2.imread("/content/download (1).jpeg")

# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)
```

boxes

```
array([[ 274.24, 185.68, 958.67, 689.4],
       [ 244.84, 252.61, 830.42, 883.34]], dtype=float32)
```

scores

```
array([ 0.8895, 0.86876], dtype=float32)
```

class_ids

```
array([2, 2], dtype=int32)
```

masks

```
array([[[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]],

       [[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]]], dtype=uint8)
```
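A quick sanity check on those masks (a minimal sketch, assuming `masks` is the array printed above): if a mask covers essentially the whole image, the mask decoding is probably off rather than the model itself.

```python
import numpy as np

# Print shape, value range, and how much of the image each mask covers
for i, m in enumerate(np.asarray(masks)):
    coverage = (m > 0).mean()
    print(f"mask {i}: shape={m.shape}, unique={np.unique(m)}, coverage={coverage:.2%}")
```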
-
I have checked the predicted class ids and probabilities; they are correct, so those parts are working fine. Here is how it should look: below are the direct prediction results with Ultralytics.
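For comparison, a minimal sketch of how such a direct Ultralytics prediction can be produced on the same image (paths and thresholds reused from the snippet above):

```python
from ultralytics import YOLO

# Run the fine-tuned PyTorch weights directly through Ultralytics as a reference
model = YOLO(f"{saved_model_results_path}/train/weights/best.pt")
results = model("/content/download (1).jpeg", conf=0.2, iou=0.3)
results[0].show()  # or results[0].plot() to get the annotated image as an array
```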
-
@pranta-barua007 hello, let me quickly check the mask case in my Colab
-
@pranta-barua007 also, if you don't mind, can you share your export parameters and model with me as well?
-
@onuralpszr thanks for the quick reply! Should I share it here?
-
If sharing here is a problem, you can share it to my e-mail "[email protected]" via Google Drive
-
@onuralpszr please do check, I have shared it
-
The export parameters as well, please?
-
@onuralpszr can you please explain? Here is how I am exporting:

```python
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/disease__instance_segmented/data.yaml",
)  # creates 'best.onnx'
```

I got the params from here: https://docs.ultralytics.com/modes/export/#arguments
-
I got what I needed, all good :)
-
@pranta-barua007 can you also upload the original picture you used?
-
@pranta-barua007 one "train" data picture would also be great, for testing purposes as well
-
ok @onuralpszr, uploading to the shared folder
-
@onuralpszr I have shared some information. The train data is currently not available; can you please check whether the shared resources help?
-
I have tried exporting the model in 3 different ways and inspected the difference in output shape (my default is OPTION 3).

without DYNAMIC and NMS -- OPTION 1

```python
from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/without_nms_and_dynamic/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
    format="onnx",
    data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.pt' with
  input shape (1, 3, 640, 640) BCHW
  and output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 2.6s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx' (38.7 MB)

Export complete (4.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx
```

with DYNAMIC and NMS -- OPTION 2

```python
from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/with_dynamic_and_nms/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    dynamic=True,
    data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.pt' with
  input shape (1, 3, 640, 640) BCHW
  and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 35.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx' (38.6 MB)

Export complete (42.2s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx
```

without DYNAMIC -- OPTION 3 (DEFAULT)

```python
from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
  input shape (1, 3, 640, 640) BCHW
  and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 4.4s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (6.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```
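A quick way to confirm which of these variants a given `best.onnx` actually is (a minimal sketch using onnxruntime session metadata):

```python
import onnxruntime as ort

# Print the input/output names and shapes of the exported graph
sess = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
for i in sess.get_inputs():
    print("input :", i.name, i.shape)
for o in sess.get_outputs():
    print("output:", o.name, o.shape)  # (1, 300, 38) with nms=True, (1, 40, 8400) without
```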
-
@onuralpszr

```python
import cv2
import numpy as np
import onnxruntime
import supervision as sv


class YOLOv11:
def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
self.conf_threshold = conf_thres
self.iou_threshold = iou_thres
self.initialize_model(path)
def __call__(self, image):
return self.detect_objects(image)
def initialize_model(self, path):
self.session = onnxruntime.InferenceSession(
path, providers=onnxruntime.get_available_providers()
)
self.get_input_details()
self.get_output_details()
def detect_objects(self, image):
# Save original image dimensions
self.img_height, self.img_width = image.shape[:2]
# Prepare input (resize, normalize, etc.)
input_tensor = self.prepare_input(image)
# Run inference
outputs = self.inference(input_tensor)
# Process outputs into boxes, scores, class IDs, and masks
boxes, scores, class_ids, masks = self.process_output(outputs)
return boxes, scores, class_ids, masks
def prepare_input(self, image):
# Convert BGR to RGB and resize to model input size (e.g. 640x640)
input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
input_img = cv2.resize(input_img, (self.input_width, self.input_height))
# Normalize to [0, 1]
input_img = input_img / 255.0
# Change data layout from HWC to CHW
input_img = input_img.transpose(2, 0, 1)
# Add batch dimension: [1, C, H, W]
input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
return input_tensor
def inference(self, input_tensor):
outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
return outputs
def process_output(self, outputs):
"""
Model outputs:
- outputs[0]: shape (1, 300, 38)
indices 0-3: bounding box (assumed [x1, y1, x2, y2] in 640×640 space)
index 4: confidence score
index 5: class id
indices 6-37: segmentation coefficients (32 values)
        - outputs[1]: shape (1, 32, 160, 160) -> mask prototypes
"""
# Remove batch dimension from detections (results in (300, 38))
predictions = np.squeeze(outputs[0], axis=0)
        mask_protos = outputs[1]  # shape: (1, 32, 160, 160)
# Filter detections based on confidence (index 4)
conf_scores = predictions[:, 4]
valid = conf_scores > self.conf_threshold
predictions = predictions[valid]
scores = conf_scores[valid]
if len(scores) == 0:
return [], [], [], []
# Extract bounding boxes (assumed already in [x1, y1, x2, y2] format)
boxes = self.extract_boxes(predictions)
# Extract class ids (index 5)
class_ids = predictions[:, 5].astype(np.int32)
# Extract segmentation masks using coefficients (indices 6-37)
masks = self.extract_masks(predictions, mask_protos)
return boxes, scores, class_ids, masks
def extract_boxes(self, predictions):
# Get the first 4 values; these are assumed to be [x1, y1, x2, y2] in 640×640 space.
boxes = predictions[:, :4]
# If the original image size differs from the model input size,
# rescale boxes from (self.input_width, self.input_height) to (self.img_width, self.img_height)
if (self.img_width != self.input_width) or (self.img_height != self.input_height):
boxes = self.rescale_boxes_corner_format(boxes)
return boxes
def rescale_boxes_corner_format(self, boxes):
# Calculate scaling factors from model input size to original image size
scale_x = float(self.img_width) / self.input_width
scale_y = float(self.img_height) / self.input_height
boxes[:, [0, 2]] *= scale_x # x1, x2
boxes[:, [1, 3]] *= scale_y # y1, y2
return boxes
def extract_masks(self, predictions, mask_protos):
"""
Compute segmentation masks:
- For each detection, use the 32 segmentation coefficients (indices 6-37)
to compute a weighted sum over the first 32 channels of the mask prototypes.
        - The mask prototypes have shape (1, 32, 160, 160); we use all 32 channels.
"""
# Extract segmentation coefficients (shape: (num_detections, 32))
seg_coeffs = predictions[:, 6:38]
# Use the first 32 channels from mask prototypes; remove batch dimension → (32, 160, 160)
mask_protos = mask_protos[0, :32, :, :]
# Compute masks as a weighted sum of mask prototypes for each detection
masks = np.einsum('nc,chw->nhw', seg_coeffs, mask_protos)
# Apply sigmoid to obtain probabilities between 0 and 1
masks = 1.0 / (1.0 + np.exp(-masks))
# Binarize masks with a threshold of 0.5
masks = masks > 0.5
# Resize each mask from 160x160 (mask prototype resolution) to the original image dimensions
final_masks = []
for mask in masks:
mask_uint8 = mask.astype(np.uint8) * 255
mask_resized = cv2.resize(mask_uint8,
(self.img_width, self.img_height),
interpolation=cv2.INTER_NEAREST)
final_masks.append(mask_resized)
final_masks = np.array(final_masks)
return final_masks
def get_input_details(self):
model_inputs = self.session.get_inputs()
self.input_names = [inp.name for inp in model_inputs]
self.input_shape = model_inputs[0].shape # typically [1, 3, 640, 640]
self.input_height = self.input_shape[2]
self.input_width = self.input_shape[3]
def get_output_details(self):
model_outputs = self.session.get_outputs()
        self.output_names = [out.name for out in model_outputs]
```

```python
# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11(best_weights_path, conf_thres=0.4, iou_thres=0.4)
img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)

# Wrap the raw outputs in a supervision Detections object for inspection/annotation
# (masks here are uint8 arrays with values 0/255, as the printout below shows)
detections = sv.Detections(
    xyxy=boxes,
    confidence=scores,
    class_id=class_ids,
    mask=masks,
)

# 1) Print bounding box coordinates to confirm they're in-image
print("Bounding boxes:\n", detections.xyxy)
# 2) If you have masks, print their shape and check if non-empty
if detections.mask is not None and len(detections.mask) > 0:
print("Mask shape:", detections.mask.shape)
print("Mask unique values:", np.unique(detections.mask[0])) # e.g., [0, 255]
# 3) Create annotators with explicit colors/thickness
mask_annotator = sv.MaskAnnotator(
# By default, it uses random colors. You can force a single color if desired:
color=sv.Color.GREEN
)
box_annotator = sv.BoxAnnotator(
thickness=2 # thicker line
)
label_annotator = sv.LabelAnnotator(
text_scale=0.7,
text_thickness=2
)
# 4) Draw them in the recommended order: masks → boxes → labels
annotated_image = img.copy()
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)
# 5) Display the result
sv.plot_image(annotated_image)
cv2.imwrite("debug_output.jpg", annotated_image)
```

Output:

```
Bounding boxes:
[[ 619.21 436.77 686.61 506.29]
[ 536.22 568.54 582.89 634.71]]
Mask shape: (2, 900, 1200)
Mask unique values: [ 0 255]
True
```
-
I also made an update on the Colab to make the mask work as well. Can you check the same Colab please?
-
@onuralpszr the Colab on #1626?
-
yes
-
@onuralpszr I am getting an error on this line: `boolean_mask = masks.astype(bool)`

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-92-d88d72348bb6> in <cell line: 0>()
----> 1 boolean_mask = masks.astype(bool)

AttributeError: 'list' object has no attribute 'astype'
```
-
Convert masks to an np.array instead of a list, but I presume you got an empty mask list?
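A minimal sketch of that fix, with a guard for the empty case:

```python
import numpy as np

# Coerce the returned masks to an array; an empty result means nothing passed the threshold
masks = np.asarray(masks)
if masks.size == 0:
    print("no detections above the confidence threshold, nothing to annotate")
else:
    boolean_mask = masks.astype(bool)
```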
-
I am converting this to a discussion.
-
@onuralpszr I am getting everything empty now: box, mask, class_ids, scores are all [] !!
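A minimal debugging sketch, assuming the YOLOv11 wrapper from earlier in this thread: look at the raw nms=True head before the confidence filter, to see whether the scores simply fall below the threshold.

```python
import numpy as np

# Run preprocessing + the ONNX session manually and inspect the raw (300, 38) detections
raw = detector.inference(detector.prepare_input(img))
preds = np.squeeze(raw[0], axis=0)
print("score range:", preds[:, 4].min(), preds[:, 4].max())
print("rows above threshold:", int((preds[:, 4] > detector.conf_threshold).sum()))
```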
-
just to be clear, I am exporting my model like this:

```python
from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)
```

OUTPUT

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 5.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (7.5s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```

I have highlighted the I/O shapes above with ⚠️⚠️⚠️.
-
@onuralpszr UPDATE: working ✅ IF exporting ONNX without `nms=True` (raw head export):

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 3.1s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (5.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```
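With this raw export, output0 is the undecoded (1, 40, 8400) head, so the post-processing has to split it and run NMS itself. A minimal decoding sketch, assuming `outputs` comes from `session.run` on this export and the 4 classes mentioned elsewhere in this thread (40 = 4 box values + 4 class scores + 32 mask coefficients):

```python
import numpy as np
import cv2

preds = np.squeeze(outputs[0], axis=0).T   # (8400, 40): one row per candidate box
boxes_cxcywh = preds[:, :4]                # cx, cy, w, h in the 640x640 input space
class_scores = preds[:, 4:8]               # 4 classes in this model
mask_coeffs  = preds[:, 8:]                # 32 mask coefficients

scores    = class_scores.max(axis=1)
class_ids = class_scores.argmax(axis=1)

# cv2.dnn.NMSBoxes expects top-left x, y, w, h, so shift the box centers
boxes_xywh = boxes_cxcywh.copy()
boxes_xywh[:, 0] -= boxes_xywh[:, 2] / 2
boxes_xywh[:, 1] -= boxes_xywh[:, 3] / 2

keep = np.array(cv2.dnn.NMSBoxes(boxes_xywh.tolist(), scores.tolist(), 0.4, 0.5)).flatten()
boxes_xywh, scores, class_ids, mask_coeffs = (
    boxes_xywh[keep], scores[keep], class_ids[keep], mask_coeffs[keep]
)
```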
-
with:
input shape (1, 3, 640, 640) BCHW and

without:
input shape (1, 3, 640, 640) BCHW and

does applying
-
@onuralpszr I have figured out how to do it with ✅ NMS enabled when exporting to ONNX format.

Export:

```python
from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)
```

```
Ultralytics 8.3.76 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 113 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 9.0s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (11.8s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:  yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```

YOLO11s-seg with nms exported ONNX:

```python
import cv2
import numpy as np
import onnxruntime
import math
import time
import supervision as sv
def sigmoid(x):
return 1 / (1 + np.exp(-x))
class YOLOv11nms:
def __init__(self, path, conf_thres=0.4, num_masks=32):
"""
Args:
path (str): Path to the exported ONNX model.
conf_thres (float): Confidence threshold for filtering detections.
num_masks (int): Number of mask coefficients (should match export, e.g., 32).
"""
self.conf_threshold = conf_thres
self.num_masks = num_masks
self.initialize_model(path)
def initialize_model(self, path):
# Create ONNX Runtime session with GPU (if available) or CPU.
self.session = onnxruntime.InferenceSession(
path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)
self.get_input_details()
self.get_output_details()
def get_input_details(self):
model_inputs = self.session.get_inputs()
self.input_names = [inp.name for inp in model_inputs]
self.input_shape = model_inputs[0].shape # Expected shape: (1, 3, 640, 640)
self.input_height = self.input_shape[2]
self.input_width = self.input_shape[3]
def get_output_details(self):
model_outputs = self.session.get_outputs()
self.output_names = [out.name for out in model_outputs]
def prepare_input(self, image):
# Record the original image dimensions.
self.img_height, self.img_width = image.shape[:2]
# Convert BGR (OpenCV format) to RGB.
img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Resize to the model’s input size (e.g., 640x640).
img = cv2.resize(img, (self.input_width, self.input_height))
# Normalize pixel values to [0, 1].
img = img.astype(np.float32) / 255.0
# Convert from HWC to CHW format.
img = img.transpose(2, 0, 1)
# Add batch dimension: shape becomes (1, 3, 640, 640).
input_tensor = np.expand_dims(img, axis=0)
return input_tensor
def inference(self, input_tensor):
outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
return outputs
def segment_objects(self, image):
"""
Processes an image and returns:
- boxes: Bounding boxes (rescaled to original image coordinates).
- scores: Confidence scores.
- class_ids: Detected class indices.
- masks: Binary segmentation masks (aligned with the original image).
"""
# Preprocess the image.
input_tensor = self.prepare_input(image)
outputs = self.inference(input_tensor)
# Process detection output.
# Detection output shape is (1, 300, 38) (post-NMS & transposed).
detections = np.squeeze(outputs[0], axis=0) # Now shape: (300, 38)
# Filter out detections below the confidence threshold.
valid_mask = detections[:, 4] > self.conf_threshold
detections = detections[valid_mask]
if detections.shape[0] == 0:
return np.array([]), np.array([]), np.array([]), np.array([])
# Extract detection results.
# boxes_model: boxes in model input coordinates (e.g., in a 640x640 space)
boxes_model = detections[:, :4] # Format: (x1, y1, x2, y2)
scores = detections[:, 4]
class_ids = detections[:, 5].astype(np.int64)
mask_coeffs = detections[:, 6:] # 32 mask coefficients
# Rescale boxes for final drawing on the original image.
boxes_draw = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(self.img_height, self.img_width)
)
# Process the mask output using the boxes in model coordinates.
masks = self.process_mask_output(mask_coeffs, outputs[1], boxes_model)
return boxes_draw, scores, class_ids, masks
def process_mask_output(self, mask_coeffs, mask_feature_map, boxes_model):
"""
Generates segmentation masks for each detection.
Args:
mask_coeffs (np.ndarray): (N, 32) mask coefficients for N detections.
mask_feature_map (np.ndarray): Output mask feature map with shape (1, 32, 160, 160).
boxes_model (np.ndarray): Bounding boxes in model input coordinates.
Returns:
mask_maps (np.ndarray): Binary masks for each detection, with shape
(N, original_img_height, original_img_width).
"""
# Squeeze the mask feature map: (1, 32, 160, 160) -> (32, 160, 160)
mask_feature_map = np.squeeze(mask_feature_map, axis=0)
# Reshape to (32, 25600) where 25600 = 160 x 160.
mask_feature_map_reshaped = mask_feature_map.reshape(self.num_masks, -1)
# Combine mask coefficients with the mask feature map.
# Resulting shape: (N, 25600) → then reshape to (N, 160, 160)
masks = sigmoid(np.dot(mask_coeffs, mask_feature_map_reshaped))
masks = masks.reshape(-1, mask_feature_map.shape[1], mask_feature_map.shape[2])
# Get mask feature map dimensions.
mask_h, mask_w = mask_feature_map.shape[1], mask_feature_map.shape[2]
# Rescale boxes from model coordinates (e.g., 640x640) to mask feature map coordinates (e.g., 160x160).
scale_boxes = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(mask_h, mask_w)
)
# Also, compute boxes in original image coordinates for placing the mask.
boxes_draw = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(self.img_height, self.img_width)
)
# Create an empty array for final masks with the same size as the original image.
mask_maps = np.zeros((boxes_model.shape[0], self.img_height, self.img_width), dtype=np.uint8)
# Determine blur size based on the ratio between the original image and the mask feature map.
blur_size = (
max(1, int(self.img_width / mask_w)),
max(1, int(self.img_height / mask_h))
)
for i in range(boxes_model.shape[0]):
# Get the detection box in mask feature map coordinates.
sx1, sy1, sx2, sy2 = scale_boxes[i]
sx1, sy1, sx2, sy2 = int(np.floor(sx1)), int(np.floor(sy1)), int(np.ceil(sx2)), int(np.ceil(sy2))
# Get the corresponding box in the original image.
ox1, oy1, ox2, oy2 = boxes_draw[i]
ox1, oy1, ox2, oy2 = int(np.floor(ox1)), int(np.floor(oy1)), int(np.ceil(ox2)), int(np.ceil(oy2))
# Crop the predicted mask region from the raw mask.
cropped_mask = masks[i][sy1:sy2, sx1:sx2]
if cropped_mask.size == 0 or (ox2 - ox1) <= 0 or (oy2 - oy1) <= 0:
continue
# Resize the cropped mask to the size of the detection box in the original image.
resized_mask = cv2.resize(cropped_mask, (ox2 - ox1, oy2 - oy1), interpolation=cv2.INTER_CUBIC)
# Apply a slight blur to smooth the mask edges.
resized_mask = cv2.blur(resized_mask, blur_size)
# Threshold the mask to obtain a binary mask.
bin_mask = (resized_mask > 0.5).astype(np.uint8)
# Place the binary mask into the correct location on the full mask.
mask_maps[i, oy1:oy2, ox1:ox2] = bin_mask
return mask_maps
@staticmethod
def rescale_boxes(boxes, input_shape, target_shape):
"""
Rescales boxes from one coordinate space to another.
Args:
boxes (np.ndarray): Array of boxes (N, 4) with format [x1, y1, x2, y2].
input_shape (tuple): (height, width) of the current coordinate space.
target_shape (tuple): (height, width) of the target coordinate space.
Returns:
np.ndarray: Scaled boxes of shape (N, 4).
"""
in_h, in_w = input_shape
tgt_h, tgt_w = target_shape
scale = np.array([tgt_w / in_w, tgt_h / in_h, tgt_w / in_w, tgt_h / in_h])
return boxes * scale
def __call__(self, image):
# This allows you to call the instance directly, e.g.:
# boxes, scores, class_ids, masks = detector(image)
        return self.segment_objects(image)
```

Usage:

```python
# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11nms(best_weights_path, conf_thres=0.4)
img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)
boolean_mask = masks.astype(bool)
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
mask_annotator = sv.MaskAnnotator()
detections = sv.Detections(xyxy=boxes, confidence=scores, class_id=class_ids,mask=boolean_mask)
detections = detections.with_nms(threshold=0.5)
annotate = box_annotator.annotate(scene=img.copy(), detections=detections)
annotate = label_annotator.annotate(scene=annotate, detections=detections)
annotate = mask_annotator.annotate(scene=annotate, detections=detections)
sv.plot_image(annotate)
```

OUTPUT
-
@onuralpszr I am actually doing this for use on the web, specifically with JS and later React Native. Is this lib (https://www.npmjs.com/package/supervision) going to be released any time soon?
-
@onuralpszr I am making the app in Next.js directly using JS/TypeScript. Almost there; I need some help fixing the masks, as some masks are correct and some aren't. It would be very kind if I could get some assistance.
-
dear @onuralpszr, I saw a similar case on #1626 and tried some customization for my own segmentation use case, but it doesn't seem to be working properly.
here is how I am exporting my model with ultralytics
which outputs this in the console
I have 4 classes in my model
as I applied nms, my output0 is already transposed, I think,
where the first 4 values are the bbox, the 5th is the prob, the 6th is the class id, and the remaining 32 are the mask coefficients; the 300 means the model will detect up to 300 results. Educate me if my interpretation is wrong?
here is my implementation
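For reference, that is the same layout the working nms=True code above assumes. A minimal slicing sketch (assuming `output0` is the first tensor returned by the ONNX session, shaped (1, 300, 38)):

```python
import numpy as np

preds = np.squeeze(output0, axis=0)   # (300, 38): up to 300 post-NMS detections
boxes     = preds[:, 0:4]             # x1, y1, x2, y2 in the 640x640 input space
scores    = preds[:, 4]               # confidence
class_ids = preds[:, 5].astype(int)   # class index
coeffs    = preds[:, 6:38]            # 32 mask coefficients, combined with the (1, 32, 160, 160) protos
```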