Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor fixup to Gentle Close PR: Wait until the worker closes itself, before force-closing it in response to the closed pipes. #155

Open
wants to merge 2 commits into
base: npr-gentle-close-workers
Choose a base branch
from

Conversation

NHDaly
Copy link
Member

@NHDaly NHDaly commented Apr 19, 2024

Merges into #151.

When we request a worker to gracefully close, the worker closes its pipes and then shutsdown.

Without this small commit, the coordinator would detect the closed pipes, and call terminate! in response.

This change tells the coordinator: "Hey, don't worry, we asked it to shut down! Just chill out, and wait for it to go to sleep peacefully, my friend."

BUT due to JuliaLang/julia#54145, we have to introduce a less-reliable sleep(5) on macos. :/
That should be long enough to cover it though, and even if this is too slow, all that will happen is the worker will die with SIG15 instead of exiting 0.

@NHDaly NHDaly changed the base branch from main to npr-gentle-close-workers April 19, 2024 16:31
if w.gracefully_closed
@static if !Sys.isapple()
# NOTE: THIS CAUSES A HANG ON MACOS! We can remove this if-check once
# the julia issue is resolved:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

link to Julia issue #?

Copy link
Collaborator

@quinnj quinnj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; it would be great to sync these changes to ConcurrentUtilities.jl as well!

@quinnj
Copy link
Collaborator

quinnj commented Apr 20, 2024

Hmmm, macos 1.9/1.10 failures are both on the test retrying failing testitem tests, which do involve having workers die and re-trying the tests, so possibly related? Probably worth trying to reproduce locally to confirm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants