-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increased Failure Rate of Gazebo Service Calls During Long Simulations #564
Comments
Thanks for the detailed investigation into this issue. We're taking a look at the problem. However, it looks like this is a duplicate of #562. If so, do you mind closing one of them? |
Is there any change you could share the code you use to reproduce this issue? Thanks! |
Yes, sure. These folder contains the files that I used for the environment: This is the bash script that I used to start/stop the simulation and move the robot and the objects: #!/bin/bash
START_TIME=$(date +%s)
log() {
CURRENT_TIME=$(date +%s)
ELAPSED=$((CURRENT_TIME - START_TIME))
HOURS=$((ELAPSED / 3600))
MINUTES=$(((ELAPSED % 3600) / 60))
SECONDS=$((ELAPSED % 60))
printf "[$(date +"%H:%M:%S")][%02d:%02d:%02d] %s\n" "$HOURS" "$MINUTES" "$SECONDS" "$1"
}
# SIMPLE SCRIPT TO MOVE CONTINUOSLY OBJECTS AND ROBOT IN SIMULATION
#simulation_id=0
COUNTER=0 # counter for the call to ign services
#export ROS_DOMAIN_ID=$simulation_id && export IGN_PARTITION=$simulation_id
# export IGN_VERBOSE=1 # print more debugging info
#export IGN_IP=127.0.0.1 # FIXME not sure if it fixes the problem of the export Exception sending a multicast message:Network is unreachable
# this command should not be necessary
fastdds shm clean # https://github.com/eProsima/Fast-DDS/issues/2003#issuecomment-1160245640
log ""
while true; do
#ros2 param set /ann_controller genome_id test
#ros2 param set /ann_controller update_genome_id True
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Start simulation"
ign service -s /world/arena/control --reqtype ignition.msgs.WorldControl --reptype ignition.msgs.Boolean --timeout 5000 --req 'pause: true'
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn turtlebot"
# ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'turtlebot4' position: { x: 1, y: 1, z: 1 } orientation: { x: 0, y: 0, z: 0, w: 1}" --timeout 2000
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req 'name: "turtlebot4" position: { x: 1.0, y: 1.0, z: 0.2 } orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }' --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object1"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object1' position: { x: -1.5, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object2"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object2' position: { x: -1.2, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object3"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object3' position: { x: -0.8, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object4"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object4' position: { x: 0.2, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object5"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object5' position: { x: 0.8, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object6"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object6' position: { x: 1.1, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object7"
ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object7' position: { x: 1.3, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Stop simulation"
ign service -s /world/arena/control --reqtype ignition.msgs.WorldControl --reptype ignition.msgs.Boolean --timeout 5000 --req 'pause: false'
sleep 10
done I didn't post the codes to simulate the TurtleBot 4 robot since there are various folders to download for the simulation. |
@caguero just a follow-up. I also tried the setup where I move only the position of the objects without launching anything related to the Turtlebot 4, and the problem still arises. I also tried moving only the position of the Turtlebot 4 without moving the objects and, also in this case, the same problem occurs. |
Thanks, that's useful information. |
@paoloelle have you tried it with a simpler robot, say an |
@azeey I didn't try with a simpler robot, anyway as I mentioned here
I didn't spawn any robot at all in the test where I move only the objects. |
@azeey I tried with the |
@caguero @azeey I also tried another approach by writing two plugins that don't use services to modify the status of the simulation (paused/running) or the models' position:
I tested these new plugins in the same environment provided here. The code to automatize the pause/resume commands and to move the objects is listed below. #!/bin/bash
START_TIME=$(date +%s)
log() {
CURRENT_TIME=$(date +%s)
ELAPSED=$((CURRENT_TIME - START_TIME))
HOURS=$((ELAPSED / 3600))
MINUTES=$(((ELAPSED % 3600) / 60))
SECONDS=$((ELAPSED % 60))
printf "[$(date +"%H:%M:%S")][%02d:%02d:%02d] %s\n" "$HOURS" "$MINUTES" "$SECONDS" "$1"
}
# SIMPLE SCRIPT TO MOVE CONTINUOSLY OBJECTS AND ROBOT IN SIMULATION
#simulation_id=0
COUNTER=0 # counter for the call to ign services
#export ROS_DOMAIN_ID=$simulation_id && export IGN_PARTITION=$simulation_id
# export IGN_VERBOSE=1 # print more debugging info
#export IGN_IP=127.0.0.1 # FIXME not sure if it fixes the problem of the export Exception sending a multicast message:Network is unreachable
# this command should not be necessary
fastdds shm clean # https://github.com/eProsima/Fast-DDS/issues/2003#issuecomment-1160245640
log ""
while true; do
#ros2 param set /ann_controller genome_id test
#ros2 param set /ann_controller update_genome_id True
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Stop simulation"
#ign service -s /world/arena/control --reqtype ignition.msgs.WorldControl --reptype ignition.msgs.Boolean --timeout 5000 --req 'pause: true'
ign topic -t /pause_resume/control -m ignition.msgs.Boolean -p "data: true"
# COUNTER=$((COUNTER + 1))
# log "call $COUNTER"
# log "[DEBUG] Respawn turtlebot"
# # ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'turtlebot4' position: { x: 1, y: 1, z: 1 } orientation: { x: 0, y: 0, z: 0, w: 1}" --timeout 2000
# ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req 'name: "turtlebot4" position: { x: 1.0, y: 1.0, z: 0.2 } orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }' --timeout 5000
# sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object1"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object1' position: { x: -1.5, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object1/pose_cmd -m ignition.msgs.Pose -p "position: {x: -1.5, y: 3.0, z: 2.0}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object2"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object2' position: { x: -1.2, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object2/pose_cmd -m ignition.msgs.Pose -p "position: {x: -1.2, y: 3.0, z: 2.0}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object3"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object3' position: { x: -0.8, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object3/pose_cmd -m ignition.msgs.Pose -p "position: {x: -0.8, y: 3.0, z: 2}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object4"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object4' position: { x: 0.2, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object4/pose_cmd -m ignition.msgs.Pose -p "position: {x: 0.2, y: 3.0, z: 2}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object5"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object5' position: { x: 0.8, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object5/pose_cmd -m ignition.msgs.Pose -p "position: {x: 0.8, y: 3.0, z: 2}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object6"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object6' position: { x: 1.1, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object6/pose_cmd -m ignition.msgs.Pose -p "position: {x: 1.1, y: 3.0, z: 2}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Respawn object7"
#ign service -s /world/arena/set_pose --reqtype ignition.msgs.Pose --reptype ignition.msgs.Boolean --req "name: 'object7' position: { x: 1.3, y: 3, z: 2} orientation: { x: 0.0, y: 0.0, z: 0.0, w: 1.0 }" --timeout 5000
ign topic -t /object7/pose_cmd -m ignition.msgs.Pose -p "position: {x: 1.3, y: 3.0, z: 2}, orientation: {w: 1.0, x: 0.0, y: 0.0, z: 0.0}"
sleep 1
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
log "[DEBUG] Start simulation"
#ign service -s /world/arena/control --reqtype ignition.msgs.WorldControl --reptype ignition.msgs.Boolean --timeout 5000 --req 'pause: false'
ign topic -t /pause_resume/control -m ignition.msgs.Boolean -p "data: false"
sleep 10
done The problem is that, even in this case, the simulation stops working after a while (I don’t have detailed results yet). I guess the problem might lie in something unrelated to the service call. |
Thanks for all the work on this issue @paoloelle . Do you think you could try to reproduce this issue using an example that only uses |
@caguero thanks for the reply. As mentioned here
I just moved the position of the objects using service calls and without spawning any robot. If there is other test that I can run please let me know. |
What I mean is an example that doesn't require to load Gazebo. |
OK, I got it. I will work on it. Anyway, I tried to simulate using a new version of these two plugins that I implemented. The one named PauseResume stops the simulation and allows it to run for specified time intervals (without using any information from outside the simulation). The same applies to the MoveModel plugin, where the position of the objects is modified every time the simulation is stopped. This is a modified version of the same plugins mentioned here. So, my guess is that once I start using some form of communication (e.g., topic publishing or a service call), something breaks at some point. |
@caguero could you please provide me a link with an example or something that can help me implement this test? |
Probably the easiest thing is to look at the examples directory: https://github.com/gazebosim/gz-transport/tree/gz-transport14/example |
I just implemented a modified version of the /*
* Copyright (C) 2014 Open Source Robotics Foundation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
*/
#include <iostream>
#include <ignition/msgs.hh>
#include <gz/transport.hh>
//////////////////////////////////////////////////
int main(int argc, char **argv)
{
// Create a transport node.
gz::transport::Node node;
// Prepare the input parameters.
gz::msgs::StringMsg req;
req.set_data("HELLO");
gz::msgs::StringMsg rep;
bool result;
unsigned int timeout = 5000;
int requestCount = 0;
while (true)
{
// Request the "/echo" service.
bool executed = node.Request("/echo", req, timeout, rep, result);
if (executed)
{
if (result)
std::cout << "Response: [" << rep.data() << "]" << std::endl;
else
std::cout << "Service call failed" << std::endl;
}
else
std::cerr << "Service call timed out" << std::endl;
// Increment the request counter.
requestCount++;
std::cout << "Number of requests made: " << requestCount << std::endl;
// Add a small delay to avoid spamming the service.
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
}
} After >17000 service requests I don't see any errors in general nor errors like Could there be a performance difference between calling a service from the terminal using a |
Follow up of the previous answer. If I try to call the service from the following #!/bin/bash
INTERVAL=1
START_TIME=$(date +%s)
log() {
CURRENT_TIME=$(date +%s)
ELAPSED=$((CURRENT_TIME - START_TIME))
HOURS=$((ELAPSED / 3600))
MINUTES=$(((ELAPSED % 3600) / 60))
SECONDS=$((ELAPSED % 60))
printf "[$(date +"%H:%M:%S")][%02d:%02d:%02d] %s\n" "$HOURS" "$MINUTES" "$SECONDS" "$1"
}
COUNTER=0 # counter for the call to ign services
while true; do
COUNTER=$((COUNTER + 1))
log "call $COUNTER"
# Call ign service
ign service -s /echo --reqtype ignition.msgs.StringMsg --reptype ignition.msgs.StringMsg --req 'data: "Hello from CLI"' --timeout 5000
sleep $INTERVAL
done I can notice that the |
Testing it right now... |
I was able to reproduce it on Gazebo Ionic. After a few hundreds I start seeing from time to time:
|
Could be somehow related to #243? More specifically to your answer @caguero? In the documentation, the description related to the parameters and for Might be that after too many service call there is some buffer that is full and drops data? |
Anyway is interesting the fact that if I run this code, or a slightly modified version of it, from the terminal everything works fine. If I execute the same code from a |
The issue could be in the creation of the communication node when a service is called. In the sense that, in this file, I create the node only once and then iterate the service calls within the while loop, whereas in other cases, by using this script, I create the communication node every time. Could it be something reasonable? |
@paoloelle I'm having somewhat same issue. I have two service which resets the gazebo world as well as resets the pose of my vehicle. I am accessing this service repeatedly for my RL thing. However after some episodes the gazebo simulation stops/ breaks abruptly. |
@ChinmayMundane my problem is that after a certain point, the message |
Environment
Description
I'm running an evolutionary algorithm using ROS 2 Humble and Gazebo Fortress (as I said before, I also tested it with ROS 2 Jazzy and Gazebo Harmonic via Apptainer containerization). To do this, I need to call Gazebo services to stop/start the simulation and move various objects in the environment, including my robot. What I noticed is that after calling these commands, the simulations' performance slows down, especially because the service call often fails. However, the real-time factor doesn't decrease.
To test this undesired behavior, I implemented a simple simulation scenario, without any evolutionary algorithm on top of it. This simulation consists of just stopping the simulation, moving the robot and the objects, and restarting the simulation for 10 seconds. An example is shown in the video below.
simple_sim-2024-12-20_09.56.47.mp4
I let this simple simulation run for 24 hours and plotted the related results.
These show the total number of service calls for each hour of simulation, the service calls that didn't give back a successful answer, and the percentage of service calls that gave a successful answer. Please ignore the hour 25.
The text was updated successfully, but these errors were encountered: