Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

publish clock after delay is over and disable delay on next loops #1861

Open
wants to merge 3 commits into
base: rolling
Choose a base branch
from

Conversation

nicolaloi
Copy link
Contributor

@nicolaloi nicolaloi commented Nov 21, 2024

clock_publish_timer is now initialized with autostart = false, so it doesn't immediately start publishing the /clock topic. It is "activated" later with clock_publish_timer->reset() at the beginning of the playback thread, ensuring that the (optional) user-defined delay period has passed

Additionally, the clock_ now starts in a paused state if a delay time is specified by the user. It resumes after the delay period has ended.

@nicolaloi nicolaloi requested a review from a team as a code owner November 21, 2024 23:12
@nicolaloi nicolaloi requested review from MichaelOrlov and hidmic and removed request for a team November 21, 2024 23:12
Copy link
Contributor

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with green CI.

@MichaelOrlov could you have 2nd review?

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolaloi Thanks for the PR. Overall looks good among one undefined behavior / race introducing with the

            if (!play_options_.start_paused) {
              clock_->resume();
            }

in Play method.

Comment on lines 492 to 494
if (!play_options_.start_paused) {
clock_->resume();
}
Copy link
Contributor

@MichaelOrlov MichaelOrlov Nov 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can predict that after some time someone will complain and raise a new issue that when using a player with --loop and --start-paused delay options and sending the pause service call request to the player somewhere at the end of the playback the request is ignored and player still will continue to playback from the beginning.
And we will have a hard time to debug this race condition, because of this resume call.
cc: @fujitatomoya @hayato-m126

@nicolaloi
Copy link
Contributor Author

nicolaloi commented Nov 23, 2024

Ah yes, the clock will resume automatically at the next loop regardless of user requests.

Then to not overcomplicate things, I would suggest removing the "pause clock until delay is over" feature from this PR, and keeping the "publish clock after the delay is over" feature only, which is the actual fix for issue #1858. Pausing the clock is not required to fix this specific issue since anyway the clock_publish_timer_ was starting to publish after the clock was resumed.

I was pausing the clock as a comprehensive "inner" change for the Player class when in delayed status, but it is not required for the "outer" behavior of the timed published clock. If this "pause clock until delay is over" feature is desired, it can be a separate PR.

@nicolaloi nicolaloi changed the title publish clock after the delay is over and pause clock until delay is over publish clock after the delay is over Nov 23, 2024
@nicolaloi
Copy link
Contributor Author

I was pausing the clock as a comprehensive "inner" change for the Player class when in delayed status, but it is not required for the "outer" behavior of the timed published clock. If this "pause clock until delay is over" feature is desired, it can be a separate PR.

Removing

if (!play_options_.start_paused) {
       clock_->resume();
}

to keep the "pause clock until delay is over" feature only, but if it is not okay I can reintegrate and try to fix this "pause clock until delay is over" feature in this PR too.

Extra change: now before sleeping for the delay period, the clock_publish_timer_->cancel() is called. This does not affect the first standard loop (since clock_publish_timer_ has autostart=false), however for the subsequent (optional) loops this prevents the clock from being published until the player exits again the subsequent delayed periods (to be consistent with the first delay period).

@fujitatomoya
Copy link
Contributor

keeping the "publish clock after the delay is over" feature only, which is the actual fix for issue #1858

i prefer this approach, one thing at a time. and easier to backport to downstream distros.

and the question here is that clock needs to be paused in delay duration? i am not quite sure about what state delay is...
in the implementation, it is playing state in PlayerImpl::play during duration period, but we cannot pause the sleep duration at all.

Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolaloi Thanks for addressing the comments from the review.
It looks good to me in the current form. However, I would move the delay out of the playback loop. IMO, there is no value in making a delay in the loop after we have finished the playback round.
Delay needs for the purporse to be able to wait for other nodes (subscribers and other dependend nodes in the pipeline) to start and be discoverable. When we have run the first playback round all other nodes still up un running. Thereofre, there is no need to make a delay on a consequent rounds of replay.

@nicolaloi
Copy link
Contributor Author

nicolaloi commented Nov 24, 2024

Okay, I will incorporate the change to the delay loop behavior in this PR.

Need to invoke clock_->jump(starting_time_) before the loop, before starting the clock_publish_timer, but I am not sure if locking the mutex during the call is needed also in this pre-loop case.

@nicolaloi nicolaloi changed the title publish clock after the delay is over publish clock after delay is over and disable delay on next loops Nov 24, 2024
Copy link
Contributor

@MichaelOrlov MichaelOrlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolaloi Thanks for the updates.
I think now it would be safe to init clock in pause mode in constructor and resume it after the delay. i.e.

 clock_ = std::make_unique<rosbag2_cpp::TimeControllerClock>(
      starting_time_, std::chrono::steady_clock::now,
      std::chrono::milliseconds{100},
      play_options_.start_paused || play_options_.delay > rclcpp::Duration(0, 0));

Also may I ask you in future do a separate commit after a round of review instead of push force the branch with the the same changed commit?
We are squashing all commits before merging to the target branch after review anyway automatically. However, the separate commits allow to keep history of changes during review.

Comment on lines +491 to +494
{
std::lock_guard<std::mutex> lk(reader_mutex_);
clock_->jump(starting_time_);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have already do clock_->jump(..) just a few lines down. On line 502.
This one is redundant and should be removed IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied the jump before the loop because in this commit it needs to be called before starting the clock_publisher_timer to avoid publishing a non-monotonic clock. But yes if I revert to pausing the clock until the delay is over, then it is not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichaelOrlov I think there could be issues with pausing the clock and resuming it after the delay. If the user commands a direct "pause" of the clock (via keyboard control or service) during the delay period, the clock will resume automatically after the delay regardless of the user's intent (#1861 (comment)). Another linked issue is if I remove the clock_->jump call before starting clock_publisher_timer, and the user resumes the clock during the delay period; then when the delay ends and the clock_publisher_timer starts, the clock publisher could publish "future" timestamps before going back to the starting_time_ timestamp with the clock_->jump call inside the loop.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rosbag2 play --clock option ignores delay
3 participants