tb3_simulation remains idle with no movement after long hours of running #1955

Closed
rightbot-abhinav opened this issue Aug 18, 2020 · 25 comments

@rightbot-abhinav

rightbot-abhinav commented Aug 18, 2020

Bug report

Required Info:

Operating System: Ubuntu 18.04
ROS2 Version: Foxy (built from source)
Version or commit hash:
DDS implementation:

PLATFORM INFORMATION
system : Linux
platform info : Linux-4.9.140-tegra-aarch64-with-Ubuntu-18.04-bionic
release : 4.9.140-tegra
processor : aarch64
RMW MIDDLEWARE
middleware name : rmw_cyclonedds_cpp

ROS 2 INFORMATION
distribution name : foxy
distribution type : ros2
distribution status : active
release platforms : {'ubuntu': ['focal']}

Gazebo 11, installed from binaries

Steps to reproduce issue
  1. ros2 launch nav2_bringup tb3_simulation_launch.py
  2. Leave the simulation idle for some time (around 30 minutes).
  3. Try to send a navigation goal (waypoint).

Expected behavior
nav2_bringup should keep running without any issues, even after it has been left idle for a while.

Actual behavior
After running ros2 launch nav2_bringup tb3_simulation_launch.py and leaving it idle, the planner_server process exits, so the goals that are set are never reached and the robot remains idle.

Additional information
[planner_server-9] [ERROR] [1597850629.255140273] [rcl]: Failed to get trigger guard condition in jump callback
[planner_server-9] [INFO] [1597850631.461968287] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 255.201 for reason 'Unknown'
[ERROR] [planner_server-9]: process has died [pid 14670, exit code -7, cmd '/home/warry/navigation2_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpzu66z_32 -r /tf:=tf -r /tf_static:=tf_static'].
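For reference: `ros2 launch` runs nodes through Python's asyncio subprocess machinery, which follows the POSIX convention that a negative return code -N means the child was killed by signal N. So `exit code -7` most likely means the planner_server was killed by signal 7 rather than exiting on its own. A minimal sketch for decoding such codes, assuming a Linux host (signal numbers are platform-specific; on Linux x86-64 and aarch64, 7 is SIGBUS). The helper name `describe_exit_code` is hypothetical:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a subprocess-style exit code into a readable description.

    Assumes the POSIX convention used by Python's subprocess/asyncio:
    a negative return code -N means the child was killed by signal N.
    """
    if code >= 0:
        return f"exited normally with status {code}"
    signum = -code
    try:
        # signal.Signals maps a number to this platform's signal name.
        name = signal.Signals(signum).name
    except ValueError:
        # Number not defined as a signal on this platform.
        name = "an unknown signal"
    return f"killed by {name} (signal {signum})"


if __name__ == "__main__":
    # The planner_server log above reports "exit code -7".
    print(describe_exit_code(-7))
```

SIGBUS is typically raised for invalid memory accesses, such as touching a memory-mapped region past the end of its backing file; it says nothing about where in the process that happened, so a backtrace is still needed.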

@rightbot-abhinav
Author

rightbot-abhinav commented Aug 18, 2020

Images of the current scenario are attached here.

@SteveMacenski

rightbot-abhinav changed the title from "controller_server is crashing" to "planner_server is crashing" on Aug 18, 2020
@rightbot-abhinav
Author

I tried generating a backtrace, but the gdb prefix does not give me any gdb session prompt after the crash. I'm now trying @facontidavide's suggestion of using backward-cpp. @SteveMacenski

@rightbot-abhinav
Author

WARNING: Be aware that are nodes in the graph that share an exact name, this can have unintended side effects.
/amcl
/amcl_rclcpp_node
/bt_navigator
/bt_navigator_rclcpp_node
/controller_server
/controller_server_rclcpp_node
/gazebo
/global_costmap/global_costmap
/global_costmap/global_costmap_rclcpp_node
/global_costmap_client
/intel_realsense_r200_depth_driver
/lifecycle_manager_localization
/lifecycle_manager_localization_service_client
/lifecycle_manager_navigation
/lifecycle_manager_navigation_service_client
/local_costmap/local_costmap
/local_costmap/local_costmap_rclcpp_node
/local_costmap_client
/map_server
/planner_server
/planner_server_rclcpp_node
/recoveries_server
/recoveries_server_rclcpp_node
/robot_state_publisher
/rviz2
/rviz2
/rviz2
/rviz2
/transform_listener_impl_556a338680
/transform_listener_impl_556a338680
/transform_listener_impl_556a39bfb0
/transform_listener_impl_5594c4a1e0
/transform_listener_impl_7f6802bb60
/transform_listener_impl_7f6c02c280
/transform_listener_impl_7f94019210
/turtlebot3_diff_drive
/turtlebot3_imu
/turtlebot3_joint_state
/turtlebot3_laserscan
/waypoint_follower
/waypoint_follower

Is this normal, @SteveMacenski?
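With a node list this long, duplicates are easy to miss by eye. A minimal sketch that reports repeated names, assuming the output of `ros2 node list` has been captured as a string (the function `duplicate_node_names` is hypothetical, not part of any ROS 2 tool):

```python
from __future__ import annotations

from collections import Counter


def duplicate_node_names(node_list_output: str) -> dict[str, int]:
    """Return node names that appear more than once, mapped to their counts.

    Expects the text printed by `ros2 node list` (one fully qualified node
    name per line). Lines that do not start with '/', such as the WARNING
    header, are skipped.
    """
    names = [
        line.strip()
        for line in node_list_output.splitlines()
        if line.strip().startswith("/")
    ]
    counts = Counter(names)
    # Counter keeps first-seen order, so duplicates are reported in list order.
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # A short excerpt of the node list posted above.
    excerpt = """\
WARNING: Be aware that are nodes in the graph that share an exact name, this can have unintended side effects.
/planner_server
/rviz2
/rviz2
/waypoint_follower
/waypoint_follower
"""
    print(duplicate_node_names(excerpt))
```

Run against the full list above, it would flag /rviz2 (4 times), /transform_listener_impl_556a338680 (twice) and /waypoint_follower (twice).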

@mikeferguson
Contributor

WARNING: Be aware that are nodes in the graph that share an exact name, this can have unintended side effects.

This is fixed in 606ca7a, but it hasn't been backported to Foxy yet.

@rightbot-abhinav
Author

rightbot-abhinav commented Aug 18, 2020

@mikeferguson, are there any issues when it is ported to Foxy? I'll check it out. When is the next update planned? Also, were the changes made only for the waypoint follower?

@mikeferguson
Contributor

There should be no issues; I'm mainly developing off the foxy branch and tested it there.

@rightbot-abhinav
Author

/rviz2
/rviz2
/rviz2
/rviz2
@mikeferguson, did you run into this as well?

@mikeferguson
Contributor

/rviz2
/rviz2
/rviz2
/rviz2

No, that seems like a different issue. Does rviz2 not start up with a randomized name?

@rightbot-abhinav
Author

Now it seems weird, @mikeferguson.

@SteveMacenski
Member

SteveMacenski commented Aug 18, 2020

@rightbot-abhinav This is probably an issue local to your setup, especially given the other tickets you've been filing. I'd suggest slowing down and thinking things through more before reacting. Changing a ticket from "Server A is crashing" to "Server B is crashing" points to an underlying problem you should think about more, rather than just changing the title.

Please get a backtrace; I provide a tutorial on how to do it: https://navigation.ros.org/tutorials/docs/get_backtrace.html. We can't do anything with a crash report that others can't reproduce. My hunch is that it's the same issue you had in your last ticket, with DDS issues around message filters. You're also running Foxy from source on 18.04, which is unsupported. Foxy targets 20.04, and the versions of critical libraries might not be compatible.

@rightbot-abhinav
Author

@SteveMacenski, thanks for the info. Sure, I will take this into consideration.

@JaimeMartin

JaimeMartin commented Aug 19, 2020

@rightbot-abhinav ,

Could you please confirm whether everything works after checking everything (please make sure you are using the latest Foxy sources) and changing back to Fast DDS?

@rightbot-abhinav
Author

@JaimeMartin, the earlier crash was solved just by changing the middleware to Cyclone DDS (#1950), but the current issue persists. As @SteveMacenski said, it might be something local to my system/build and not linked to anything else.

@rightbot-abhinav
Author

This is very similar to #1889: it is stable initially, but leaving it idle for some time causes the following:
[planner_server-9] [ERROR] [1597850629.255140273] [rcl]: Failed to get trigger guard condition in jump callback
[planner_server-9] [INFO] [1597850631.461968287] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 255.201 for reason 'Unknown'
[ERROR] [planner_server-9]: process has died [pid 14670, exit code -7, cmd '/home/warry/navigation2_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpzu66z_32 -r /tf:=tf -r /tf_static:=tf_static'].
@Michael-Equi, do you have any more info (fixes, debugging results, etc.)?

@Michael-Equi
Contributor

I haven't really had time to dig much further into it. I just added a restart button to my system as a temporary remedy, but it is still an issue for me.

@mikeferguson
Contributor

[planner_server-9] [INFO] [1597850631.461968287] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 255.201 for reason 'Unknown'
[ERROR] [planner_server-9]: process has died [pid 14670, exit code -7, cmd '/home/warry/navigation2_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpzu66z_32 -r /tf:=tf -r /tf_static:=tf_static'].

Do you know if these two messages are close together? Like, does the message filter drop and then you crash?

@rightbot-abhinav
Author

@mikeferguson, I get a stream of message filter drops and then the crash occurs, like this:
[planner_server-9] [INFO] [1597851477.965252004] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 1317.601 for reason 'Unknown'
[planner_server-9] [INFO] [1597851482.026140091] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 1321.406 for reason 'Unknown'
[controller_server-8] [INFO] [1597851486.118044253] [local_costmap.local_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 1340.001 for reason 'Unknown'
[planner_server-9] [INFO] [1597851496.318764925] [global_costmap.global_costmap_rclcpp_node]: Message Filter dropping message: frame 'base_scan' at time 1322.801 for reason 'Unknown'
[ERROR] [planner_server-9]: process has died [pid 14670, exit code -7, cmd '/home/warry/navigation2_ws/install/nav2_planner/lib/nav2_planner/planner_server --ros-args -r __node:=planner_server --params-file /tmp/tmpzu66z_32 -r /tf:=tf -r /tf_static:=tf_static'].
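To check how close together these messages are, the wall-clock timestamps can be pulled out of the log. A minimal sketch, assuming the console format shown above (the regex, `Drop`, `parse_drops` and `wall_gaps` are illustrative names, not from any ROS 2 package). The launch "process has died" line carries no timestamp, so this only measures the spacing between the drop messages themselves:

```python
import re
from typing import List, NamedTuple

# Matches the console format seen in this thread, e.g.
# [planner_server-9] [INFO] [1597851477.965252004] [global_costmap...]:
#   Message Filter dropping message: frame 'base_scan' at time 1317.601 ...
DROP_RE = re.compile(
    r"\[(?P<process>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<wall>\d+\.\d+)\] "
    r"\[(?P<logger>[^\]]+)\]: Message Filter dropping message: "
    r"frame '(?P<frame>[^']+)' at time (?P<stamp>\d+\.\d+)"
)


class Drop(NamedTuple):
    process: str
    logger: str
    wall: float   # wall-clock time at which the line was logged, in seconds
    stamp: float  # header stamp of the dropped scan (simulation time here)


def parse_drops(log_text: str) -> List[Drop]:
    """Extract every 'Message Filter dropping message' line from a log."""
    drops = []
    for line in log_text.splitlines():
        m = DROP_RE.search(line)
        if m:
            drops.append(
                Drop(m["process"], m["logger"], float(m["wall"]), float(m["stamp"]))
            )
    return drops


def wall_gaps(drops: List[Drop]) -> List[float]:
    """Wall-clock seconds between consecutive drops, rounded to milliseconds."""
    return [round(b.wall - a.wall, 3) for a, b in zip(drops, drops[1:])]
```

Fed the five log lines posted above, `parse_drops` finds four drops and `wall_gaps` returns `[4.061, 4.092, 10.201]`. The `wall` and `stamp` fields come from different clocks (wall clock versus what appears to be Gazebo simulation time), so the sketch never subtracts one from the other.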

@SteveMacenski
Member

@mikeferguson, I think the key line is:

[planner_server-9] [ERROR] [1597850629.255140273] [rcl]: Failed to get trigger guard condition in jump callback

Any time I see rcl errors, it's not a good situation. As for the next line:

Message Filter dropping message: frame 'base_scan' at time 255.201 for reason 'Unknown'

I think that is the generic exception handling in the message filters for the laser scans coming into the costmap in the planner server. My guess is that it is actually catching the exception thrown by rcl or something, and then it crashes. We definitely need a traceback, @rightbot-abhinav.

@rightbot-abhinav
Author

@SteveMacenski, I will try to post the traceback. In the meantime, I cloned the system onto a supported OS (Focal), and the same issue occurs. Let me know if you want to open the issue.

@SteveMacenski
Member

SteveMacenski commented Aug 20, 2020

I'm not sure what the best plan is here: this isn't something anyone else can reproduce, so there's no good next step unless we have a backtrace to work from.

@rightbot-abhinav
Author

@SteveMacenski, valid point; I will add the backtrace. It would be great if you could shed some light on the following:

  1. I'm not getting a backtrace prompt. I've also seen that you raised an issue for this; is it in progress?
  2. What hardware are the releases tested against?

@SteveMacenski
Member

Please see our tutorial which has instructions for this https://navigation.ros.org/tutorials/docs/get_backtrace.html

@rightbot-abhinav
Author

This issue has been resolved. Thanks a ton, @daisukes, for the fix in ros2/rclcpp#1266; the behavior observed there was very similar. The stack has been running overnight without any errors :)
It would be great if this could be added to the foxy branch as well, since the official repos list points to that branch rather than master.
There is still a potential leak for which a fix has been released but not yet backported to Foxy:
Issue: ros2/geometry2#281
Fix: ros2/geometry2@fb931a4

A problem still persists, for which an issue has been raised and a workaround has also been given by @daisukes. Thanks a ton :)
ros2/rviz#574
But @daisukes, you mentioned that the latest release (Foxy) solved the memory leak causing rviz to crash. Did you test it against both Fast-RTPS and Cyclone DDS, or only with Fast-RTPS? Without the workaround, it still crashes randomly for me, and it is difficult to reproduce.

@Michael-Equi, I have noticed that you have the same issues I was facing. It would be great if you could integrate the above commits into your code base and test them out!
#1889

Thanks, @SteveMacenski. As you rightly mentioned, it had nothing to do with the nav stack: I had removed the Navfn and DWB plugins and it still occurred. Thanks for your help!

rightbot-abhinav changed the title from "planner_server is crashing" to "tb3_simulation remains idle with no movement after long hours of running" on Aug 29, 2020
@SteveMacenski
Member

Would be great if this could be added to the the foxy branch as well as the official repos list invokes this branch rather than the master .

What is "this"? That doesn't seem to relate to this project's branches.

@rightbot-abhinav
Author

rightbot-abhinav commented Aug 31, 2020

What is "this"? That doesn't seem to relate to this project's branches.

Would be great if this could be added to the the foxy branch as well as the official repos list invokes this branch rather than the master .

Yeah, this is not related to this project's branches; I was wrong to mention it here. It's a fix on the master branch of the rclcpp repo and has not been ported to the foxy branch. I incorporated the changes and it solved the issue for me. Thanks!

Link: ros2/rclcpp#1266

5 participants