-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AMCL change map / first_map_only = false #2931
Comments
Does this happen deterministically or randomly? Does this behavior continue with either Galactic or Rolling? For context: If its the case that its been fixed, then we can find where it was fixed and see if we can't backport to Foxy. If its not been fixed, then it lets us know we can test this on more recent distributions to help you debug this issue and fix it across the board. @padhupradheep if you're interested in a project to work on, this would be helpful! |
Thank you Sir for the support. I just installed Galactic and the corresponding branch of amcl. At first sight, the problem seems to be not present anymore. What is interesting is that the rest of the system and all the other nodes are still running in Foxy, this makes me think that the problem is specifically in the amcl package. I can see that lot has changed between the two versions of amcl, starting from dependencies. |
Sure, I would take a look at it. I suppose, there are still a lot of companies using Foxy. We (Neobotix) were planning to completely stop supporting Foxy, because we(Nav2 community) are at least like 100's of kilometers forward for the feature upgrades that were / is being made in Nav2.
@mrmara Thanks for the update, I'll also give it a check... |
After more testing I was able to reproduce the same issue also in galactic. It is not clear to me why this happens less frequently, roughly after changing map 6 or 7 times. Still getting segmentation fault in the method map_calc_range (). |
On initial thoughts, it sounds like a race condition.. let me get back.. |
If this can be useful here there is another backtrace taken on galactic:
|
Maybe something isn't being properly locked or reset on map changes? First thing I'd do is look how the other threaded-functions are protecting resources and seeing if the map change callback is doing that. The next thing I'd do is check the ROS 1 version and see if there's some resetting of variables not happening on map changes here that was happening there that was lost in porting (under the assumption ROS 1 doesn't have the same issue). If those 2 low-hanging fruit directions don't pan out, then you need to start actually looking at the code of the function crashing / what the change map callback does and try to find the link of where things are going wrong. Some logging and compiling with You can safely ignore everything before |
Hi @hardesh, thanks for your contribution.
Thanks @SteveMacenski for the hints, I am going through them. |
I have been testing amcl map change feature with ROS Noetic and the bug is not present there. There are some differences in the way memory related to map is managed since particle filter and odometry is not resetted, maybe due to this issue.
Let me attach some logs: amcl backtraceox: 0.248028, Thread 13 "amcl" received signal SIGSEGV, Segmentation fault. map_server log[INFO] [1651581252.913153218] [map_server]: Handling LoadMap request[INFO] [map_io]: Loading yaml file: /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map1.yaml [DEBUG] [map_io]: resolution: 0.1 [DEBUG] [map_io]: origin[0]: 7.3 [DEBUG] [map_io]: origin[1]: 0 [DEBUG] [map_io]: origin[2]: 0 [DEBUG] [map_io]: free_thresh: 0.1 [DEBUG] [map_io]: occupied_thresh: 0.65 [DEBUG] [map_io]: mode: trinary [DEBUG] [map_io]: negate: 0 [INFO] [map_io]: Loading image_file: /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map1.png [DEBUG] [map_io]: Read map /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map1.png: 1110 X 1049 map @ 0.1 m/cell [INFO] [1651581257.312541929] [map_server]: Handling LoadMap request [INFO] [map_io]: Loading yaml file: /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map2.yaml [DEBUG] [map_io]: resolution: 0.1 [DEBUG] [map_io]: origin[0]: 42.1 [DEBUG] [map_io]: origin[1]: 0 [DEBUG] [map_io]: origin[2]: 0 [DEBUG] [map_io]: free_thresh: 0.1 [DEBUG] [map_io]: occupied_thresh: 0.35 [DEBUG] [map_io]: mode: trinary [DEBUG] [map_io]: negate: 0 [INFO] [map_io]: Loading image_file: /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map2.png [DEBUG] [map_io]: Read map /home/mrmara/alba_dev_ws/install/ros_pkg/share/ros_pkg/maps/map2.png: 762 X 599 map @ 0.1 m/cell |
Maybe, as @padhupradheep pointed out, this can be a race problem. When a new map message arrives the following actions are performed:
But if a new laserscan arrives in the meantime the new map message is being managed the following lines of code inside map_range.c provoke a segmentation fault:
This can be a possible reason. In order to confirm it, I am now trying to block new laserscans processing while processing new map message is completed. |
By adding the following line of code:
at the beginning of the callback |
Seems rational, I did find actually that MANY places where the Adding that lock is the right answer, it should be fine. But actually there should be many places that mutex is added, so if you look at the ROS 1 version, add it to the start of all of the functions there that had it. I'm not sure why people removed them here |
I think this issue is not present on the |
Thank you for your feedback @hardesh! I compared the main and galactic branches and there are a few modifications on amcl but none of them is related to the segmentation fault in map change event. The only related modification is the following line of code:
Which by the way is fixing another problem (not being able to overwrite first_map_only_ parameter). |
After comparing ROS Noetic and main branch of nav2_amcl I added the lock on
|
Thanks for the amazing work @mrmara ! Could you hand in a PR with all the changes ? |
@padhupradheep sure, the code is ready, may I ask you some hints on how to proceed? Is there a procedure to push the branch? |
Make all the changes from the main branch in your fork and then you need to do a Pull Request, there you just need to fill out some basic information, once you are done we wait for the CI to run the release build and test. Once that is done, then someone will review it (mostly @SteveMacenski). Make sure you check for the linters, before pushing. It is easy, just do We also have this nice documentation: https://navigation.ros.org/contribute/index.html |
Thanks |
Thanks everyone for jumping in and helping on this! I really appreciate the support / help! |
Bug report
Required Info:
Steps to reproduce issue
set first_map_only to false
load a newmap with map server
Ensure that amcl accepted and loaded the new map (log to appear: Received a %d X %d map @ %.3f m/pix) and that it is receiving laserscan data
Expected behavior
AMCL should load the new map and start localizing in it
Actual behavior
AMCL crashes
Additional information
Backtrace:
The text was updated successfully, but these errors were encountered: