Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added additional flag to fix race condition in TransformListener #518

Open
wants to merge 3 commits into
base: galactic
Choose a base branch
from

Conversation

zygfrydw
Copy link

Fixed race condition in TransformListener which happens when TransformListener destructor is called before dedicated thread reaches executor->spin()

The issue is reported here

The alternative solution is to add thread-safe executor:

class SingleTreadedThreadSafeExecutor : public rclcpp::executors::SingleThreadedExecutor{
public:
  SingleTreadedThreadSafeExecutor() : is_terminated_(false)
  {

  }

  void spin_thread_safe(){
    if (spinning.exchange(true)) {
      throw std::runtime_error("spin() called while already spinning");
    }
    RCLCPP_SCOPE_EXIT(this->spinning.store(false); );
    while (rclcpp::ok(this->context_) && !is_terminated_ && spinning.load()) {
      rclcpp::AnyExecutable any_executable;
      if (get_next_executable(any_executable)) {
        execute_any_executable(any_executable);
      }
    }
  }
  void shutdown(){
    is_terminated_ = true;
    this->cancel();
  }

protected:
  std::atomic_bool is_terminated_;
};


void TransformListener::initThread(
  rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node_base_interface)
{
  auto executor = std::make_shared<SingleTreadedThreadSafeExecutor>();

  // This lambda is required because `std::thread` cannot infer the correct
  // rclcpp::spin, since there are more than one versions of it (overloaded).
  // see: http://stackoverflow.com/a/27389714/671658
  // I (wjwwood) chose to use the lamda rather than the static cast solution.
  auto run_func =
    [executor](rclcpp::node_interfaces::NodeBaseInterface::SharedPtr node_base_interface) {
      executor->add_node(node_base_interface);
      executor->spin();
      executor->remove_node(node_base_interface);
    };
  dedicated_listener_thread_ = thread_ptr(
    new std::thread(run_func, node_base_interface),
    [executor](std::thread * t) {
      executor->shutdown();
      t->join();
      delete t;
      // TODO(tfoote) reenable callback queue processing
      // tf_message_callback_queue_.callAvailable(ros::WallDuration(0.01));
    });
  // Tell the buffer we have a dedicated thread to enable timeouts
  buffer_.setUsingDedicatedThread(true);
}

@zygfrydw zygfrydw force-pushed the fix-race-in-transfrom-listener-destructor branch from 72e5d8b to 933609a Compare April 14, 2022 15:05
@zygfrydw zygfrydw force-pushed the fix-race-in-transfrom-listener-destructor branch from 39b3eaf to 5326e5a Compare April 15, 2022 07:58
@zygfrydw
Copy link
Author

If you like to include a unit test that checks the path on which this bug occurs, I have implemented an appropriate test here.
However, my test requires extending TransformListener constructor with a thread factory, allowing to inject test code to thread construction.

I have excluded the change from this PR because it may be too intrusive for a public API of TransformListener. If you find it useful, I will include the changes in this PR.

@clalancette clalancette self-requested a review July 28, 2022 13:15
@clalancette clalancette self-assigned this Jul 28, 2022
@CursedRock17
Copy link
Contributor

I found this PR when delving into both the respective issue and the essentially identical #73, but I belive this was fixed in #442 with an updated destructor:

  if (spin_thread_) {
    executor_->cancel();
    dedicated_listener_thread_->join();
  }

which seems to prevent steps 6 and 7

Dedicated thread at the beginning of executor->spin() sets executor.spinning = true
Dedicated thread is spinning waiting for messages or cancel which will not happen because main thread is blocked in t->join();

The current implementation currently works for me as there's no longer a spinning node, nor any "loose" threads.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants