Skip to content
This repository has been archived by the owner on Oct 29, 2018. It is now read-only.

Gazebo Visibility ROS Plugin Segfault #10

Open
fairlight1337 opened this issue Jul 21, 2016 · 14 comments
Open

Gazebo Visibility ROS Plugin Segfault #10

fairlight1337 opened this issue Jul 21, 2016 · 14 comments
Assignees
Labels

Comments

@fairlight1337
Copy link
Contributor

fairlight1337 commented Jul 21, 2016

When calling the /gazebo_visibility_ros/QueryGazeboVisibility service, I get the following output on the console:

[ INFO] [1469102895.667786945, 548.339000000]: Got QueryGazeboVisibility service request.
[ INFO] [1469102895.667999165, 548.339000000]: Model name seems to make sense.
[ INFO] [1469102895.668393153, 548.339000000]: Got a grid of points to test.
[ INFO] [1469102895.678479563, 548.349000000]: Done with raytrace.
Segmentation fault (core dumped)
[gazebo-4] process has died [pid 9816, exit code 139, cmd /opt/ros/indigo/lib/gazebo_ros/gzserver -e ode /home/winkler/ros/new_cram/ros_top/src/longterm_fetch_and_place/ltfnp_gazebo/worlds/empty_world_with_plugins.world __name:=gazebo __log:=/home/winkler/.ros/log/4a369898-4f3a-11e6-b32e-50465d9e262d/gazebo-4.log].
log file: /home/winkler/.ros/log/4a369898-4f3a-11e6-b32e-50465d9e262d/gazebo-4*.log

@mpomarlan, any ideas? I'm querying with the values you suggested in the process module, only changing the model name (of which I know it exists).

I can reproduce the error.

@fairlight1337
Copy link
Contributor Author

I put some debug output in there around here like so:

[...]
std::cout << 1 << std::endl;
ROS_INFO("Done with raytrace.");
std::cout << 2 << std::endl;

response.visible = false;
std::cout << 3 << std::endl;
if(request.threshold <= (okPoints*1.0)/(maxK*1.0))
    response.visible = true;

std::cout << 4 << std::endl;
return true;
[...]

which in turn resulted in this output:

[ INFO] [1469104108.807908525, 139.898000000]: Got QueryGazeboVisibility service request.
[ INFO] [1469104108.808081213, 139.898000000]: Model name seems to make sense.
[ INFO] [1469104108.808162475, 139.898000000]: Got a grid of points to test.
1
[ INFO] [1469104108.811476384, 139.898000000]: Done with raytrace.
2
3
4

This time it happened after 6 tries:

[ INFO] [1469104109.883566168, 140.788000000]: Got QueryGazeboVisibility service request.
[ INFO] [1469104109.883736945, 140.788000000]: Model name seems to make sense.
[ INFO] [1469104109.883848783, 140.788000000]: Got a grid of points to test.
1
[ INFO] [1469104109.896919863, 140.799000000]: Done with raytrace.
2
3
4
Segmentation fault (core dumped)
[gazebo-4] process has died [pid 27529, exit code 139, cmd /opt/ros/indigo/lib/gazebo_ros/gzserver -e ode /home/winkler/ros/new_cram/ros_top/src/longterm_fetch_and_place/ltfnp_gazebo/worlds/empty_world_with_plugins.world __name:=gazebo __log:=/home/winkler/.ros/log/365e3728-4f3e-11e6-8358-50465d9e262d/gazebo-4.log].
log file: /home/winkler/.ros/log/365e3728-4f3e-11e6-8358-50465d9e262d/gazebo-4*.log

Could this be a non-initialized pointer, some faulty member variable, or some race condition? I'll look for instructions on how to run gzserver in gdb once I find the time, but the debug output didn't turn up any evidence yet.

fairlight1337 added a commit that referenced this issue Jul 21, 2016
…for perception

The call to the visibility checker plugin is integrated into the perception pipeline for Gazebo now, but is currently inactive (see #10).
@mpomarlan
Copy link
Contributor

I've got no idea, because when I try the plugin, I don't get a segfault.

For repeatability, the way I called the plugin to test it is:

(roslisp:call-service "/gazebo_visibility_ros/QueryGazeboVisibility" "gazebo_visibility_ros/QueryGazeboVisibility"
:name "pr2"
:camera_pose (roslisp:make-message "geometry_msgs/Point" :x 0 :y 0 :z 10)
:camera_fwd (roslisp:make-message "geometry_msgs/Point" :x 0 :y 0 :z -1)
:camera_up (roslisp:make-message "geometry_msgs/Point" :x 1 :y 0 :z 0)
:focal_distance 1
:width 1
:height 1
:max_distance 12
:threshold 0.1)

which gave the result visibile : 1 as expected.

@fairlight1337
Copy link
Contributor Author

Okay, thanks. I'll try it with that call tomorrow. If that works, I'll try the same from roslisp.

I'll keep you in the loop about the gdb thing.

On 21.07.2016, at 15:06, mpomarlan [email protected] wrote:

I've got no idea, because when I try the plugin, I don't get a segfault.

For repeatability, the way I called the plugin to test it is:

(roslisp:call-service "/gazebo_visibility_ros/QueryGazeboVisibility" "gazebo_visibility_ros/QueryGazeboVisibility"
:name "pr2"
:camera_pose (roslisp:make-message "geometry_msgs/Point" :x 0 :y 0 :z 10)
:camera_fwd (roslisp:make-message "geometry_msgs/Point" :x 0 :y 0 :z -1)
:camera_up (roslisp:make-message "geometry_msgs/Point" :x 1 :y 0 :z 0)
:focal_distance 1
:width 1
:height 1
:max_distance 12
:threshold 0.1)

which gave the result visibile : 1 as expected.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@mpomarlan
Copy link
Contributor

I now tried the filter-models-by-field-of-view function, and on my system gazebo still runs.

@fairlight1337
Copy link
Contributor Author

I had that after a couple of tries. Have you tried with the PR2 as object to see, or something else?

On 21.07.2016, at 15:15, mpomarlan [email protected] wrote:

I now tried the filter-models-by-field-of-view function, and on my system gazebo still runs.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub, or mute the thread.

@mpomarlan
Copy link
Contributor

The PR2 as object to see. I can't see it from the PR2 itself though, which seems plausible.

@fairlight1337
Copy link
Contributor Author

For reference, here's the other Gazebo plugin we talked about:

It was developed explicitly for Gazebo 2.2.3 (which is the version I'm running). So maybe there are initialization differences between that version and the one you are using.

@fairlight1337
Copy link
Contributor Author

I ran another couple tests, partially commenting out code from the plugin, recompiling, and seeing whether the segfault is still there.

Turns out this line is (or at least is related to) the culprit:

gazebo::physics::RayShapePtr ray = boost::dynamic_pointer_cast<gazebo::physics::RayShape>(engine->CreateShape("ray\
", gazebo::physics::CollisionPtr()));

I looked up both involved classes up in the Gazebo 2.2.3 Code Documentation:

One does not seem to be a sub-class of the other (specifically, in this case, I guess CollisionShape should be somewhat a parent class of RayShape). They both seem to have a common ancestor class, though (gazebo::physics::Base).

I haven't checked full compatibility of the two classes, but this dynamic cast, or some destructor-functionality called afterwars, seems to be problematic.

The actual code that now resides in the doQueryGazeboVisibility member function during my tests is:

bool doQueryGazeboVisibility(physics::WorldPtr world, sdf::ElementPtr sdf, gazebo_visibility_ros::QueryGazeboVisibility::Request &request, gazebo_visibility_ros::QueryGazeboVisibility::Response &response) {
    ROS_INFO("Got QueryGazeboVisibility service request.");
    physics::ModelPtr model = world->GetModel(request.name);
    if((!model.get()))
    {
        ROS_INFO("Model name not loaded.");
        response.visible = false;
        return true;
    }
    ROS_INFO("Model name seems to make sense, and points to a model of type %d with %d children.", model->GetType(), model->GetChildCount());

    math::Box bbox = getModelBox(model);

    gazebo::physics::PhysicsEnginePtr engine = world->GetPhysicsEngine();
    engine->InitForThread();

    gazebo::physics::RayShapePtr ray = boost::dynamic_pointer_cast<gazebo::physics::RayShape>(engine->CreateShape("ray", gazebo::physics::CollisionPtr()));

    ROS_INFO("Done with raytrace.");

    return true;
}

And this is the output I get:

[ INFO] [1469436211.442413136, 23.777000000]: Got QueryGazeboVisibility service request.
[ INFO] [1469436211.442564214, 23.777000000]: Model name seems to make sense, and points to a model of type 3 with 1 children.
[ INFO] [1469436211.442641962, 23.777000000]: Done with raytrace.
Segmentation fault (core dumped)
[gazebo-3] process has died [pid 21534, exit code 139, cmd /opt/ros/indigo/lib/gazebo_ros/gzserver -e ode /home/winkler/ros/new_cram/ros_top/src/longterm_fetch_and_place/ltfnp_gazebo/worlds/empty_world_with_plugins.world __name:=gazebo __log:=/home/winkler/.ros/log/67e1fc70-5241-11e6-828b-50465d9e262d/gazebo-3.log].
log file: /home/winkler/.ros/log/67e1fc70-5241-11e6-828b-50465d9e262d/gazebo-3*.log

The segmentation fault doesn't come up when the dynamic_pointer_cast line is commented out.

@fairlight1337
Copy link
Contributor Author

I just tested your fix-to-be from 93fa6dc. Sadly, to no avail; the behavior stays the same.

I'll gdb the shat out of this now.

@fairlight1337
Copy link
Contributor Author

From this site I just learned that you can set certain command line parameters for Gazebo when using roslaunch, including debug which will run gzserver within gdb. So exactly what we need.

Turns out that this parameter is being set in the ltfnp_gazebo launch file:

  <include file="$(find gazebo_ros)/launch/empty_world.launch">
    [...]
    <arg name="debug" value="false"/>
  </include>

So setting that to true allows to debug with gdb. Here's the backtrace:

[ INFO] [1469520316.803993641, 22.316000000]: Got QueryGazeboVisibility service request.
[ INFO] [1469520316.804152409, 22.316000000]: Model name seems to make sense, and points to a model of type 3 with 1 children.
[ INFO] [1469520316.804253991, 22.316000000]: Model bounding box 1.450000 0.350004 0.950000 0.100012 0.070016 0.200001
[ INFO] [1469520316.804331777, 22.316000000]: Got a grid of points to test.
[ INFO] [1469520316.818903733, 22.328000000]: Done with raytrace.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff43fff700 (LWP 18298)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007fffeca6d015 in ?? () from /usr/lib/x86_64-linux-gnu/libgazebo_ode.so.2
#2  0x00007fffeca6d5e4 in dxHashSpace::collide(void*, void (*)(void*, dxGeom*, dxGeom*)) () from /usr/lib/x86_64-linux-gnu/libgazebo_ode.so.2
#3  0x00007ffff24e82b9 in gazebo::physics::ODEPhysics::UpdateCollision() () from /usr/lib/x86_64-linux-gnu/libgazebo_physics_ode.so.2
#4  0x00007ffff738feb8 in gazebo::physics::World::Update() () from /usr/lib/x86_64-linux-gnu/libgazebo_physics.so.2
#5  0x00007ffff739ae17 in gazebo::physics::World::Step() () from /usr/lib/x86_64-linux-gnu/libgazebo_physics.so.2
#6  0x00007ffff739b335 in gazebo::physics::World::RunLoop() () from /usr/lib/x86_64-linux-gnu/libgazebo_physics.so.2
#7  0x00007ffff5be8a4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#8  0x00007ffff6a1b182 in start_thread (arg=0x7fff43fff700) at pthread_create.c:312
#9  0x00007ffff4f8647d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) 

It looks like this is a collision issue after all, not being ros::spin() related. Any ideas on this, @mpomarlan?

@fairlight1337
Copy link
Contributor Author

Alright, I tried the new fix-attempt branch from your fork. There is good news and bad news.

First, the good news: The segmentation fault is gone. I tried it a couple times with dozens of request per trial; not a single crash. So that's good.

Now the bad news: It hangs. After about 30 calls (in a row if you prefer, but one call takes around 0.3 seconds so that shouldn't matter) the plugin doesn't answer queries anymore. This just affects that plugin, all other functionality still works fine.

When trying to repeatedly calling it afterwards, it looks like this in the debug output:

[ INFO] [1470386101.915959447, 37.192000000]: Got a grid of points to test.
[ INFO] [1470386101.936171011, 37.199000000]: Done with raytrace.
[ INFO] [1470386101.938524780, 37.200000000]: Got QueryGazeboVisibility service request.
[ INFO] [1470386101.952397042, 37.209000000]: Model name seems to make sense, and points to a model of type 3 with 1 children.
[ INFO] [1470386101.952500186, 37.209000000]: Model bounding box 1.349991 1.000006 0.950000 0.100001 0.070002 0.200001
[ INFO] [1470386101.952577528, 37.209000000]: Got a grid of points to test.
[ INFO] [1470386101.952783677, 37.209000000]: Done with raytrace.
[ INFO] [1470386101.955551413, 37.211000000]: Got QueryGazeboVisibility service request.
[ INFO] [1470386101.966030682, 37.219000000]: Model name seems to make sense, and points to a model of type 3 with 1 children.
[ INFO] [1470386101.966196607, 37.219000000]: Model bounding box 1.400000 0.500000 0.861319 0.261697 0.261697 0.022638
[ INFO] [1470386101.966270937, 37.219000000]: Got a grid of points to test.
[ INFO] [1470386101.993250754, 37.233000000]: Done with raytrace.
[ INFO] [1470386102.025963763, 37.251000000]: Got QueryGazeboVisibility service request.
[ INFO] [1470386323.235008262, 37.252000000]: Got QueryGazeboVisibility service request.
[ INFO] [1470386327.067720135, 37.252000000]: Got QueryGazeboVisibility service request.
[ INFO] [1470386329.833956918, 37.252000000]: Got QueryGazeboVisibility service request.

Which is odd, as the service request apparently came in but isn't processed. The next processed line in the gazebo_visibility_ros.cpp file is

physics::ModelPtr model = world->GetModel(request.name);

Any idea why this could hang at all?

@fairlight1337
Copy link
Contributor Author

By the way, for the record, I'm calling it with this code:

(top-level
  (with-designators ((obj :object nil))
    (loop for i from 1 to 100 do
      (format t "~a~%" i)
      (gazebo-perception-pm::find-with-designator obj))))

With filtered-model-names enabled in gazebo_perception_process_module.

@fairlight1337
Copy link
Contributor Author

Okay, with the new fix attempt from https://github.com/mpomarlan/cram_gazebo/tree/fix-attempt it definitely doesn't segfault anymore. I ran it a couple hundred times; no hangs, no segfaults, and always a clean return state.

But now, I don't get any objects back; are you returning all objects that are in the field of view? Or just the ones specifically named?

@fairlight1337
Copy link
Contributor Author

So here is what my PR2 sees (plus the scene; excuse the messiness):
selection_001

Upon querying the visibility, I get thes return results for my objects:

RedMetalPlate0: [QUERYGAZEBOVISIBILITY-RESPONSE
                   VISIBLE:
                     0]
RedMetalBowl0: [QUERYGAZEBOVISIBILITY-RESPONSE
                  VISIBLE:
                    0]
RedMetalCup0: [QUERYGAZEBOVISIBILITY-RESPONSE
                 VISIBLE:
                   0]
Milk0: [QUERYGAZEBOVISIBILITY-RESPONSE
          VISIBLE:
            0]
RedMetalPlate1: [QUERYGAZEBOVISIBILITY-RESPONSE
                   VISIBLE:
                     0]
RedMetalBowl1: [QUERYGAZEBOVISIBILITY-RESPONSE
                  VISIBLE:
                    0]
RedMetalCup1: [QUERYGAZEBOVISIBILITY-RESPONSE
                 VISIBLE:
                   0]
Milk1: [QUERYGAZEBOVISIBILITY-RESPONSE
          VISIBLE:
            0]

I see that per default, the camera pose is calculated from

(cl-transforms-stamped:lookup-transform
  (ensure-tf-listener) "odom_combined" "head_tilt_link"
  :timeout 2.0)

Is this correct? I did that here, according to your example: https://github.com/cram2/cram_gazebo/blob/master/gazebo_perception_process_module/src/process-module.lisp#L118

Thanks

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

2 participants