[QUESTION] Python osd_into() won't accept batch elements #237

Open
peterrrrob97 opened this issue Mar 9, 2025 · 0 comments
Labels
question Further information is requested


What is your question?

I am trying to overlay lines on video frames efficiently using osd_into(), but I cannot get the function to accept my elements argument.

Docker container (NVIDIA CUDA 11.8 with Ubuntu 22.04)
NVIDIA RTX a5000 ada

Workflow:

1. VPF decoding
2. YOLO inference
3. Results processing
4. CV-CUDA post-processing overlay (osd_into)
   - PyTorch tensor of frames -> cvcuda tensor
   - List of lists of cvcuda.Line -> cvcuda.Elements
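
The last workflow step implies one inner list per frame, in the same order as the frames in the batch tensor. A minimal stand-alone sketch of that nesting follows. It uses a hypothetical `FakeLine` dataclass in place of `cvcuda.Line` so that it runs without a GPU, and the helper name `group_lines_by_frame` is likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class FakeLine:
    """Hypothetical stand-in for cvcuda.Line; only the endpoints are modelled."""
    pos0: Tuple[int, int]
    pos1: Tuple[int, int]


def group_lines_by_frame(
    frame_ids: Sequence[int],
    lines_by_frame: Dict[int, List[FakeLine]],
) -> List[List[FakeLine]]:
    """Return one inner list per frame, ordered like frame_ids.

    Frames without an entry get an empty list, so the outer length
    always equals the batch size.
    """
    return [list(lines_by_frame.get(frame_id, [])) for frame_id in frame_ids]


frame_ids = [243, 245, 247]
lines_by_frame = {
    243: [FakeLine((0, 0), (100, 100)), FakeLine((0, 100), (100, 0))],
    247: [FakeLine((10, 10), (50, 10))],
}
batch_elements = group_lines_by_frame(frame_ids, lines_by_frame)
print([len(frame) for frame in batch_elements])  # [2, 0, 1]
```

Keeping an empty inner list for frames without lines means the outer list length matches the batch dimension of the frame tensor.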

Here is the method that assembles the frame tensor and the elements and passes them to osd_into():


def _process_with_cvcuda(self):
    """Process batches from the queue and overlay regression lines using CVCUDA OSD."""
    try:
        # Initialize PyCUDA Stream
        self.cuda_ctx = cuda.Device(self.gpu_id).make_context()
        logging.info(f"Activated CUDA context on device {self.gpu_id} for processing thread.")
        cuda_stream = cuda.Stream()

        while self.running:
            try:
                # Step 1: Get a batch from the shared queue
                batch_data = self.shared_queue.get(timeout=1)

                if batch_data is None:
                    logging.info("CVCUDA processor: Received EOF marker. Stopping.")
                    break

                # Step 2: Unpack batch data
                batch_tensor, frame_ids = batch_data
                batch_size = len(frame_ids)

                # Ensure the tensor is uint8 before conversion
                batch_tensor = batch_tensor.to(torch.uint8)

                # Log Tensor Information
                logging.info(f"Processing Batch - Frame Count: {batch_size}")
                logging.info(f"   - Shape: {batch_tensor.shape}")
                logging.info(f"   - Dtype: {batch_tensor.dtype}")
                logging.info(f"   - Device: {batch_tensor.device}")
                logging.info(f"   - Memory Pointer: {batch_tensor.data_ptr()}")

                # Convert PyTorch tensor to NVCV tensor
                frames_nvcv = cvcuda.as_tensor(batch_tensor)

                # Log Converted Tensor Info
                logging.info(f"Converted frames_nvcv Tensor: {frames_nvcv}")
                logging.info(f"   - Shape: {frames_nvcv.shape}")
                logging.info(f"   - Dtype: {frames_nvcv.dtype}")

                # Step 3: Fetch & Package `cvcuda.Line` Objects for Each Frame
                batch_elements = []
                for idx, frame_id in enumerate(frame_ids):
                    regression_lines = self.gpu_regression_results.get(frame_id, [])

                    if not isinstance(regression_lines, list):
                        logging.warning(f"Frame {frame_id}: Expected list, got {type(regression_lines)}. Replacing with empty list.")
                        regression_lines = []

                    if not all(isinstance(line, cvcuda.Line) for line in regression_lines):
                        logging.error(f"Frame {frame_id}: Some elements are not cvcuda.Line objects. Using empty list.")
                        regression_lines = []

                    # Append as a nested list to ensure correct format
                    batch_elements.append(regression_lines)
                    #logging.info(f"Frame {frame_id}: Added {len(regression_lines)} line(s)")

                # Log batch elements structure
                logging.info(f"Batch Elements Length: {len(batch_elements)}")
                for i, frame_lines in enumerate(batch_elements[:3]):  # Log first 3 frames
                    logging.info(f"Frame {frame_ids[i]}: Contains {len(frame_lines)} lines.")
                    #for line in frame_lines[:3]:  # Log first 3 lines
                        #logging.info(f"  - Line Object: {line}")

                # Construct cvcuda.Elements object
                elements = cvcuda.Elements(batch_elements)
                logging.info(f"Attributes of elements: {dir(elements)}")

                logging.info("Successfully created cvcuda.Elements object.")

                # Extract Capsule from Elements
                capsule_elements = elements.capsule() if hasattr(elements, "capsule") else elements
                logging.info(f"Extracted Capsule Type: {type(capsule_elements)}")

                # Ensure CUDA stream is synchronized before OSD
                cuda_stream.synchronize()

                # Apply OSD Overlay
                cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
                logging.info("Successfully applied OSD overlay.")

                # Convert back to PyTorch tensor
                result_tensor = torch.tensor(frames_nvcv.cuda(), dtype=torch.uint8)

                # Put processed batch in output queue
                self.output_queue.put((result_tensor, frame_ids))
                logging.info(f"CVCUDA processor: Processed batch with frame IDs: {frame_ids}")

            except queue.Empty:
                continue
            except Exception as e:
                logging.error(f"Error in CVCUDA processing: {e}")
                import traceback
                traceback.print_exc()

    except Exception as e:
        logging.error(f"CVCUDA processor initialization error: {e}")
        import traceback
        traceback.print_exc()
    finally:
        logging.info("CVCUDA processing thread exiting")
        self.output_queue.put(None)
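
One thing worth checking separately from the elements argument: the log output reports the batch shape as (20, 3, 1088, 1920), which is channels-first (NCHW). The OSD operator appears to be documented for channels-last input (NHWC or HWC) with 3 or 4 channels. That is an assumption to verify against the installed CV-CUDA version, not something confirmed in this thread. The sketch below only shows the axis reordering on the shape tuple; `permute_shape` is a hypothetical helper, not a CV-CUDA function.

```python
from typing import Sequence, Tuple

# Axis order that turns (N, C, H, W) into (N, H, W, C).
NCHW_TO_NHWC = (0, 2, 3, 1)


def permute_shape(shape: Sequence[int], order: Sequence[int]) -> Tuple[int, ...]:
    """Return shape with its axes rearranged according to order."""
    if sorted(order) != list(range(len(shape))):
        raise ValueError(
            f"order {tuple(order)} is not a permutation of {len(shape)} axes"
        )
    return tuple(shape[axis] for axis in order)


nchw_shape = (20, 3, 1088, 1920)  # shape reported in the log
nhwc_shape = permute_shape(nchw_shape, NCHW_TO_NHWC)
print(nhwc_shape)  # (20, 1088, 1920, 3)
```

In PyTorch the same reordering would be `batch_tensor.permute(0, 2, 3, 1).contiguous()`, after which the layout can be passed explicitly, for example `cvcuda.as_tensor(nhwc_tensor, "NHWC")` (`nhwc_tensor` is a placeholder name). Note that `contiguous()` makes a copy, so an in-place overlay would then modify the copy rather than the original batch.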

This is the logging output, including the error. What is the correct input structure for the elements argument?

2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Successfully created cvcuda.Elements object.
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Extracted Capsule Type: <class 'cvcuda.Elements'>
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Error in CVCUDA processing: osd_into(): incompatible function arguments. The following argument types are supported:
1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
Traceback (most recent call last):
  File "/workspace/Core_Module_v2/Picasso.py", line 289, in _process_with_cvcuda
    cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
TypeError: osd_into(): incompatible function arguments. The following argument types are supported:
1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Processing Batch - Frame Count: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: torch.Size([20, 3, 1088, 1920])
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: torch.uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Device: cuda:0
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Memory Pointer: 52506394624
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Converted frames_nvcv Tensor: <nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: (20, 3, 1088, 1920)
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Batch Elements Length: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 243: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 245: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 247: Contains 2 lines.
2025-03-09 03:38:28,032 [Thread-10 (_process_with_cvcuda)] Attributes of elements: ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_pybind11_conduit_v1_']
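
The "Invoked with" line shows `stream=139993385369472`, a plain integer, while the supported signature lists `Optional[nvcv.cuda.Stream]`. pybind11 rejects the whole call if any single argument fails to convert, so the integer stream handle is enough on its own to raise this TypeError, whether or not `elements` is also a problem. The sketch below mimics that check in pure Python. The `StandIn*` classes and `mismatched_arguments` are hypothetical illustrations, not part of CV-CUDA or pybind11.

```python
from typing import Any, Dict, List, Tuple


class StandInTensor:
    """Hypothetical stand-in for nvcv.Tensor."""


class StandInElements:
    """Hypothetical stand-in for cvcuda.Elements."""


class StandInStream:
    """Hypothetical stand-in for nvcv.cuda.Stream."""


# Accepted types per keyword, modelled on the signature in the error message.
EXPECTED: Dict[str, Tuple[type, ...]] = {
    "dst": (StandInTensor,),
    "src": (StandInTensor,),
    "elements": (StandInElements,),
    "stream": (StandInStream, type(None)),  # Optional[...] also allows None
}


def mismatched_arguments(
    kwargs: Dict[str, Any],
    expected: Dict[str, Tuple[type, ...]],
) -> List[str]:
    """List every supplied keyword whose value is not an accepted type."""
    problems = []
    for name, allowed in expected.items():
        if name in kwargs and not isinstance(kwargs[name], allowed):
            problems.append(f"{name}: got {type(kwargs[name]).__name__}")
    return problems


call_kwargs = {
    "dst": StandInTensor(),
    "src": StandInTensor(),
    "elements": StandInElements(),
    "stream": 139993385369472,  # raw handle, as in the log
}
print(mismatched_arguments(call_kwargs, EXPECTED))  # ['stream: got int']
```

In the real call, the thing to try would be passing a CV-CUDA stream object rather than `cuda_stream.handle`, for example one created with `cvcuda.Stream()`, assuming that constructor is available in the installed version.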

@peterrrrob97 peterrrrob97 added the question Further information is requested label Mar 9, 2025