Dispatcher deadlock workaround #10517

v-lopez · 2022-05-16T08:43:35Z

Details on #10482

Stopping _device_watcher before unregistering callbacks.

maloel · 2023-01-02T20:31:13Z

src/device.cpp

@@ -195,6 +195,7 @@ device::device(std::shared_ptr<context> ctx,

 device::~device()
 {
+    _context->stop_device_watcher();


Seems like it'd work, sure -- but why stop the context picking up new devices?

@v-lopez Can you suggest a better solution to this issue?

Haven't looked into the issue further. But I've been using this since then and I haven't appreciated any side effects.

@v-lopez
First, thanks for the PR, we appreciate it.
However, after reading the origin thread I am a little confused, as it seems to mix ROS (initial_reset) and plain libRS where the fix was made.
Can you provide a way for this deadlock to be reproduced with librealsense, either in C++ or Python? Once I understand how to reproduce I can add it to our unit-tests and see if I can solve it. Otherwise we'll likely direct it to our ROS guy to try and reproduce and see if the problem is there.

Milnaye · 2023-06-08T08:41:31Z

Hi,
I have the same issue using two cameras, ROS and the initial_reset, I can reproduce it consistently by launching the ROS node before plugging the camera. This fix resolve the issue but prevent recovery of unplugged cameras... Adding a delay in the ROS wrapper just after the hardware reset seems to resolve the issue, but it's not a real fix.
Does someone has a better solution ?

Arun-Prasad-V · 2023-06-13T11:08:44Z

Hi @Milnaye , could you please give more details about your test scenario?

I have the same issue using two cameras, ROS and the initial_reset, I can reproduce it consistently by launching the ROS node before plugging the camera. This fix resolve the issue but prevent recovery of unplugged cameras...

By this I assume, you have one camera always connected and the another camera is connected after calling the launch command. Please correct me if I'm wrong.

For which camera are you trying to do 'initial_reset'? The one which is already connected or the one you are connecting after calling launch command or for both?

Milnaye · 2023-06-13T13:14:07Z

Hi @Milnaye , could you please give more details about your test scenario?

I have the same issue using two cameras, ROS and the initial_reset, I can reproduce it consistently by launching the ROS node before plugging the camera. This fix resolve the issue but prevent recovery of unplugged cameras...

By this I assume, you have one camera always connected and the another camera is connected after calling the launch command. Please correct me if I'm wrong.

For which camera are you trying to do 'initial_reset'? The one which is already connected or the one you are connecting after calling launch command or for both?

Hi @Arun-Prasad-V,

After reading again what i wrote, i can understand that I wasn't clear at all...

On our robot, the two cameras stay connected all the time, the issue happens only on the second camera checked by the ros node.
To explain more, on the robot we use two ros nodes, we force the serial number of the two cameras so we know where they are, the ros nodes loop between the cameras and always start checking the "camera A" then "camera B". The issue happens only with the node "B", after resetting "camera B". We do an initial reset on both cameras.

I'm able to consistently reproduce the issue on a computer with only one camera, by lauching the ros node before plugging the camera. It happen only with initial_reset to true.

I hope it makes more sense now :)

Arun-Prasad-V · 2023-06-13T16:05:19Z

Hi @Milnaye,

Thanks for the details. If I understand correctly, you have explained two scenarios where the issue is seen:

Two cameras always connected. Enable initial_reset on both cameras. Issue happens with second camera
Single camera dynamically connected after launching the ros node. Issue happens with initial_reset
- Is the second camera still connected here?

Could you please share

the commands used? I assume you are launching both cameras with single command.
the results? maybe a log or screenshot for both scenarios

Also, please share the below details:

Required Info
Camera Model	{ R200 / F200 / SR300 / ZR300 / D400 }
Firmware Version	(Open RealSense Viewer --> Click info)
Operating System & Version	{Win (8.1/10) / Linux (Ubuntu 14/16/17) / MacOS
Kernel Version (Linux Only)	(e.g. 4.14.13)
Platform	PC/Raspberry Pi/ NVIDIA Jetson / etc..
SDK Version	{ legacy / 2.. }
ROS platform	Humble/Foxy/etc..
Language	{C/C#/labview/nodejs/opencv/pcl/python/unity }
Segment	{Robot/Smartphone/VR/AR/others }

Milnaye · 2023-06-15T15:04:45Z

Hi @Arun-Prasad-V,

Your description of both scenario are corrects. For the scenario 2, the second camera is not connected.

Required Info	Scenario 1
Camera Model	Two D455
Firmware Version	5.14.0.0
Operating System & Version	custom Linux
Kernel Version (Linux Only)	4.9.253
Platform	NVIDIA Jetson nano
SDK Version	version 2.53.1
ROS platform	Humble
Language	C and python
Segment	Robot

On scenario 1, we use a python launch file which load the cameras settings and some node parameters. Both camera node are defined separately in this launch file but except the serial number and the usb port, the way we launch them is similar.

camera_advanced_settings.json.txt
camera_node.log.txt
node_param.yaml.txt

This is a error log quite old, but it's still the same. node-26 correspond to "camera_left" and node-27 to "camera_right".
You can see that, after the "Resseting device..." line, the node 27 is stuck.

Required Info	Scenario 2
Camera Model	One D455
Firmware Version	5.14.0.0
Operating System & Version	Ubuntu 22.04
Kernel Version (Linux Only)	5.19.0
Platform	PC
SDK Version	happen with both 2.53.1 and 2.54.1
ROS platform	Humble
Language	C and python
Segment	Robot

On scenario 2, I use the command
ros2 run realsense2_camera realsense2_camera_node --ros-args -p initial_reset:=true
with a fresh build of librealsense sdk and realsense-ros.

The result log file is similar

Arun-Prasad-V · 2023-06-20T15:04:15Z

@Milnaye , Thanks for the details. I'm able to reproduce scenario_1 with and without ROS2 wrapper, which also matches with @v-lopez 's observation. Will work on the solution and update.

Arun-Prasad-V · 2023-11-03T11:40:32Z

@Milnaye , @v-lopez , This issue has been fixed in the 'development' branch. Please refer #10482 (comment).

Please try and let us know if you need any further help.

Workaround for IntelRealSense#10482

a9af3ea

Nir-Az requested a review from maloel September 8, 2022 18:49

Nir-Az requested review from maloel and removed request for maloel January 2, 2023 18:52

maloel reviewed Jan 2, 2023

View reviewed changes

Milnaye mentioned this pull request Aug 10, 2023

Avoid device_watcher deadlock after reset #11933

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dispatcher deadlock workaround #10517

Dispatcher deadlock workaround #10517

v-lopez commented May 16, 2022

maloel Jan 2, 2023

Nir-Az Feb 19, 2023

v-lopez Feb 19, 2023

maloel Feb 20, 2023

Milnaye commented Jun 8, 2023

Arun-Prasad-V commented Jun 13, 2023

Milnaye commented Jun 13, 2023

Arun-Prasad-V commented Jun 13, 2023

Milnaye commented Jun 15, 2023

Arun-Prasad-V commented Jun 20, 2023

Arun-Prasad-V commented Nov 3, 2023

Dispatcher deadlock workaround #10517

Are you sure you want to change the base?

Dispatcher deadlock workaround #10517

Conversation

v-lopez commented May 16, 2022

maloel Jan 2, 2023

Choose a reason for hiding this comment

Nir-Az Feb 19, 2023

Choose a reason for hiding this comment

v-lopez Feb 19, 2023

Choose a reason for hiding this comment

maloel Feb 20, 2023

Choose a reason for hiding this comment

Milnaye commented Jun 8, 2023

Arun-Prasad-V commented Jun 13, 2023

Milnaye commented Jun 13, 2023

Arun-Prasad-V commented Jun 13, 2023

Milnaye commented Jun 15, 2023

Arun-Prasad-V commented Jun 20, 2023

Arun-Prasad-V commented Nov 3, 2023