Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault at reconfigure of AdmittanceController #1248

Merged
merged 4 commits into from
Aug 22, 2024

Conversation

firesurfer
Copy link
Contributor

@firesurfer firesurfer commented Aug 15, 2024

This PR fixes #1181.

I tested it on iron and rebased it on rolling afterwards.

This PR simply avoid recreating the kinematics plugin, but just reinitializes it - which seems to work.

Copy link
Member

@saikishor saikishor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look ok, but I'm not sure if this is the right way to fix the seg fault issue you are having

@firesurfer
Copy link
Contributor Author

What would you suggest as an alternative ?

Copy link

codecov bot commented Aug 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 80.41%. Comparing base (4ab22a5) to head (e915412).
Report is 1 commits behind head on master.

Files Patch % Lines
...ude/admittance_controller/admittance_rule_impl.hpp 0.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1248      +/-   ##
==========================================
- Coverage   80.43%   80.41%   -0.02%     
==========================================
  Files         105      105              
  Lines        9353     9355       +2     
  Branches      818      819       +1     
==========================================
  Hits         7523     7523              
- Misses       1556     1557       +1     
- Partials      274      275       +1     
Flag Coverage Δ
unittests 80.41% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...ude/admittance_controller/admittance_rule_impl.hpp 92.66% <0.00%> (-0.86%) ⬇️

... and 2 files with indirect coverage changes

@saikishor
Copy link
Member

What would you suggest as an alternative ?

We need to introspect more, because your fix is just masking the real issue. What do you say?

@firesurfer
Copy link
Contributor Author

My guess to what happens there is:

We basically delete and recreate the instance of the kinematics plugin. It might be possible that at the same time there is an access happening to the reference we have to the dynamic library - but it is already deleted. Given that it does not happen every time is a strong indicator that probably multiple threads are accessing the instance of the kinematics plugin at the same time.

I think not reloading the plugin if it is already loaded is just fine.

@saikishor
Copy link
Member

My guess to what happens there is:

We basically delete and recreate the instance of the kinematics plugin. It might be possible that at the same time there is an access happening to the reference we have to the dynamic library - but it is already deleted. Given that it does not happen every time is a strong indicator that probably multiple threads are accessing the instance of the kinematics plugin at the same time.

I think not reloading the plugin if it is already loaded is just fine.

If the controller is not active, the update loop is not run at all, in that case there is only one thread that is calling the configure method is the non-RT one. That's why, I'm very curious to know why only on reconfigure. Can you try resetting the kinematics_ pointer with .reset() before resetting the loader.

@firesurfer
Copy link
Contributor Author

firesurfer commented Aug 19, 2024

I guess you were right!

I just changed to code to:

      if(kinematics_loader_ ){
        kinematics_.reset();
      }
        kinematics_loader_ =
          std::make_shared<pluginlib::ClassLoader<kinematics_interface::KinematicsInterface>>(
            parameters_.kinematics.plugin_package, "kinematics_interface::KinematicsInterface");
        kinematics_ = std::unique_ptr<kinematics_interface::KinematicsInterface>(
          kinematics_loader_->createUnmanagedInstance(parameters_.kinematics.plugin_name));
      
      if (!kinematics_->initialize(
            node->get_node_parameters_interface(), parameters_.kinematics.tip))
      {
        return controller_interface::return_type::ERROR;
      }

and no segfault. But it is nevertheless very interesting that this is causing an issue.

Edit: I think we can still keep the original fix I made as there is no real reason to actually reload the plugin rather than just reinitializing it.

@firesurfer
Copy link
Contributor Author

@saikishor So what do you think. Shall we keep the fix as it currently or do you suggest a different approach ?

@saikishor
Copy link
Member

@saikishor So what do you think. Shall we keep the fix as it currently or do you suggest a different approach ?

@firesurfer have you taken a look at the pluginlib documentation?. I recommend doing so and do a proper fix.

As we are talking about reconfiguring, it could be that the user wants to change the plugin type and it would be only possible at reconfigure instance, else he has to unload and load + configure the controller again

@firesurfer
Copy link
Contributor Author

firesurfer commented Aug 21, 2024

@saikishor I just pushed the adaption I showed in my previous comment.

I think this could be a proper solution as it makes sure that the KinematicsInterface is destroyed before we destroy and recreate the ClassLoader. Additionally as we also recreate the loader it is also possible to change the plugin type now.

I did take a look at the ClassLoader headerfile but as far as I can tell there is no information on how to properly re-create a ClassLoader or about the destruction order

But what I could imagine what happens there is:

When we destroy the class loader the linux dynamic library loader may unload/destroy the instance of the dynamic library https://linux.die.net/man/3/dlopen:
The function dlclose() decrements the reference count on the dynamic library handle handle. If the reference count drops to zero and no other loaded libraries use symbols in it, then the dynamic library is unloaded.
The kernel btw. does not have to unload the library. link to stackoverflow

If we then destroy the interface handle it calls the destructor of the KinematicsInterface, which results in a segfault in case the library is already unloaded. I would guess that the timing between the calling dlclose and the library actually being unloaded may not be deterministic.

But as I said this is just a guess.

Copy link
Member

@bmagyar bmagyar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is an elegant solution to the problem you encountered

@firesurfer
Copy link
Contributor Author

The failing ci issues are as far as I can tell due to an unrelated test.

Copy link
Contributor

@olivier-stasse olivier-stasse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (after the dlopen discussions).

@firesurfer
Copy link
Contributor Author

@saikishor Is it fine for you to merge it like this / does my explanation sound reasonable ?

@bmagyar bmagyar enabled auto-merge (squash) August 22, 2024 12:13
@bmagyar bmagyar disabled auto-merge August 22, 2024 14:02
@bmagyar bmagyar merged commit 31f7fbe into ros-controls:master Aug 22, 2024
19 checks passed
@bmagyar bmagyar added the backport-humble This label should be used by maintaines only! Label triggers PR backport to ROS2 humble. label Aug 22, 2024
@bmagyar bmagyar added the backport-iron This label should be used by maintaines only! Label triggers PR backport to ROS2 Iron. label Aug 22, 2024
mergify bot pushed a commit that referenced this pull request Aug 22, 2024
mergify bot pushed a commit that referenced this pull request Aug 22, 2024
RobertWilbrandt pushed a commit to RobertWilbrandt/ros2_controllers that referenced this pull request Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-humble This label should be used by maintaines only! Label triggers PR backport to ROS2 humble. backport-iron This label should be used by maintaines only! Label triggers PR backport to ROS2 Iron.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Iron] - Admittance controller regulary crashes with segfault on activation / configuration
4 participants