-
👋 Hello @egleorl1, thank you for sharing your interesting work with YOLOv8 🚀! Your project involving keypoint tracking of a robotic arm and human pose estimation sounds fascinating and impactful. While we can't provide a step-by-step solution here, I'd be happy to point you in the right direction and summarize the tools and resources available to you.

If this is a ❓ Custom Workflow Question, please provide additional details where possible, such as datasets, logs, or any relevant scripts you've already tried, so we can better understand your setup. For code-heavy topics, sharing a minimum reproducible example (MRE) helps us assist you more effectively.

If you're trying to implement 3D pose estimation using multi-view geometry and YOLO, make sure your installation is up to date before experimenting: pip install -U ultralytics

For real-time community conversations and feedback, I recommend joining our Discord 🎧. You can also explore our Subreddit or Discourse Forum for longer-form discussions; sharing your setup and asking for integration tips there might bring insights from others tackling similar problems. You're also encouraged to test YOLO in verified cloud environments, which are preconfigured with dependencies to help accelerate experimentation. Our CI tests also validate YOLO's 🟢 stability across various operating systems (Ultralytics CI) to ensure seamless usage for your development.

Lastly, this is an automated response 🤖, but rest assured that one of our engineers will review your discussion soon to provide further insights. Thank you for being part of the Ultralytics community!
-
@egleorl1 thank you for your question regarding integrating YOLO pose estimation with multi-view geometry for 3D keypoint reconstruction. While YOLO models provide accurate 2D keypoint detection, transitioning to 3D requires triangulation using a synchronized multi-camera setup and the corresponding camera calibration data (intrinsics and extrinsics). You'll need to:

1. Calibrate each camera's intrinsics and the extrinsics between the two views (for example, with a checkerboard pattern).
2. Run your YOLO pose models on synchronized frames from both cameras to obtain 2D keypoints per view.
3. Match corresponding keypoints across the two views and triangulate them into 3D coordinates.
For implementation, refer to OpenCV's documentation on camera calibration and triangulation. While Ultralytics does not directly support 3D reconstruction, combining YOLO's 2D outputs with such tools is a well-established approach. Let me know if you need further clarification.
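To make the triangulation step concrete, here is a minimal NumPy sketch of the classic direct linear transform (DLT); OpenCV's cv2.triangulatePoints performs the equivalent computation. The names triangulate_point, P1, and P2 are illustrative, and the projection matrices must come from your own stereo calibration; the 2D points would come from each camera's YOLO pose output (e.g. results[0].keypoints.xy).

```python
import numpy as np

def triangulate_point(P1, P2, pt1, pt2):
    """Triangulate one 3D point from two calibrated views via DLT.

    P1, P2: 3x4 projection matrices (K @ [R | t]) from stereo calibration.
    pt1, pt2: matching (x, y) pixel coordinates of the same keypoint,
        as detected by a YOLO pose model in each synchronized frame.
    """
    # Each view contributes two linear constraints on the homogeneous point X
    A = np.array([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector
    # associated with the smallest singular value
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize to (x, y, z)
```

Looping this over the N keypoints detected in each synchronized frame pair, and collecting the results per timestamp, yields the 3D time series for both the robot and the human.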
-
@glenn-jocher thank you very much for your answer!
-
Sounds great! 😊 Could you please clarify why the ROS system is necessary for integrating YOLO with depth images? I'm quite new to computer vision and still having trouble visualizing the entire process. The final output I'm hoping for is to use my custom YOLO pose model on video captured by the RealSense, which, as you mentioned, synchronizes RGB and depth information. Then, I would like to save the coordinates extracted by YOLO (e.g., keypoints_normalized = r.keypoints.xyn.tolist()) and integrate them with the depth map to obtain z values for each keypoint. Is there an example of code I can customize for this purpose?
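A rough sketch of the depth lookup described above, assuming the depth frame has already been captured and aligned to the color stream (with a RealSense this is typically done via pyrealsense2's rs.align) and converted to a NumPy array. The names keypoints_to_xyz and depth_scale are illustrative, not Ultralytics API; the keypoint list is the r.keypoints.xyn output mentioned above for a single detected person or object.

```python
import numpy as np

def keypoints_to_xyz(keypoints_normalized, depth_map, depth_scale=1.0):
    """Attach a z value from an aligned depth map to each normalized keypoint.

    keypoints_normalized: list of [x, y] pairs in [0, 1], e.g. one entry of
        r.keypoints.xyn.tolist() from a YOLO pose result.
    depth_map: HxW array of raw depth readings aligned to the color image.
    depth_scale: factor converting raw depth units to meters.
    """
    h, w = depth_map.shape
    xyz = []
    for xn, yn in keypoints_normalized:
        # Denormalize to pixel coordinates, clamped to the image bounds
        px = min(int(xn * w), w - 1)
        py = min(int(yn * h), h - 1)
        z = float(depth_map[py, px]) * depth_scale
        xyz.append((xn * w, yn * h, z))
    return xyz
```

In practice, sampling a small neighborhood around (py, px) and taking the median is more robust than a single pixel, since depth maps often contain holes and noise at object edges.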
-
Hi! I trained a model for keypoint tracking of a robotic arm using YOLOv8 for pose estimation. I combined my custom-trained model with another model for human pose estimation (since in my setup, a human collaborates with a robotic arm during an assembly process).
I would like to extract the time series data of the movements of both the robot and the human to evaluate their coordination. However, using YOLO, I can only extract 2D information (the xy coordinates of each keypoint). From the related issues I've read, a good way to add depth (z) information to each keypoint, transitioning from 2D to 3D, is multi-view geometry and triangulation.
I do indeed have access to two cameras and would like to ask for your help in understanding how to best integrate YOLO pose estimation with multi-view geometry to obtain xyz information for each keypoint. A workflow explanation and code examples would be greatly appreciated :)