Fix issue 17037 - std.concurrency has random segfaults #5004
Thanks for your pull request and interest in making D better, @WalterWaldron! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Testing this PR locally: if you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "master + phobos#5004"
4766211 to 3bd0258
std/concurrency.d
Outdated
Thread.sleep(dur!("msecs")( 10 ));
else
dosleep = true;
GC.collect;
A comment would be nice to explain why doing a collection helps here as it's not immediately obvious.
std.concurrency is designed around having a global variable (scheduler) set once at the outset. There is no synchronization for this variable and the implementation does not appear to support changing it on the fly.
However we need to test with both implementations of Scheduler: ThreadScheduler and FiberScheduler.
This function waits until it is the only thread before modifying scheduler (i.e. it's a mutual exclusion hack).
Collection helps because threads can wait on the finalization action of other threads (e.g. waiting for OwnerTerminated exceptions initiated by static ~this).
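For readers without the diff at hand, here is a minimal sketch of the kind of helper described above. It is not the code from this PR: the name changeSchedulerSketch is hypothetical, the use of Thread.getAll() to detect other live threads is an assumption, and the sleep/collect loop simply mirrors the lines quoted in the review.

```d
import core.memory : GC;
import core.thread : Thread;
import core.time : msecs;
import std.concurrency : FiberScheduler, Scheduler, ThreadScheduler, scheduler;

/// Hypothetical helper: block until the calling thread is the only one
/// known to the runtime, then replace the global scheduler.
void changeSchedulerSketch(Scheduler s)
{
    bool sleepFirst = false;
    // Assumption: Thread.getAll() lists every live thread, including this one.
    while (Thread.getAll().length > 1)
    {
        if (sleepFirst)
            Thread.sleep(10.msecs); // give other threads time to finish
        else
            sleepFirst = true;
        GC.collect(); // mirrors the reviewed diff
    }
    // No other thread is left to observe the unsynchronized write.
    scheduler = s;
}

void main()
{
    changeSchedulerSketch(new ThreadScheduler);
    assert(cast(ThreadScheduler) scheduler !is null);

    changeSchedulerSketch(new FiberScheduler);
    assert(cast(FiberScheduler) scheduler !is null);

    scheduler = null; // restore the default before exiting
}
```

The loop only gives mutual exclusion in a test setting where no new threads are started while it runs; it is not a general-purpose synchronization scheme.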
Thanks for the explanation, I actually meant a comment in the source. I suggest something like // wait for all other threads to terminate, using GC.collect to trigger finalizers which may terminate threads (e.g. OwnerTerminated or LinkTerminated) at the top of the loop.
I know, I was giving the explanation in the interim, until I update the PR.
Updated according to feedback.
@wilzbach in the same vein as #5515 (comment) why hasn't the bot suggested a reviewer?
@MartinNowak ping please! This has been open for 7 months!
Same answer as in #5515 (comment) (we turned the feature off due to too much noise), but #5573 looks very promising.
This has been all green for a while now, which I think should be a pretty good indicator that it at least shouldn't break anything, and if it does we can revert it. Unfortunately Martin is a pretty busy guy so it's hard to say when he'll get to this.
It can't break code because it only modifies the unittests. My changes are:
What I was getting at is that any unittest build would fail if one of the tests was broken. It's pretty common for people to run the full test suite for Phobos locally, and a broken test would also break the auto tester.
I don't see any problems with this. I'll leave this open for two or three more days and merge if no one has any more comments.
How exactly does the race condition manifest?
Add changeScheduler inside version(unittest) block: This function is a hack to make changing the scheduler in unittests more sane.
From a first look, I'd say we should make the tests more sane instead.
std/concurrency.d
Outdated
Thread.sleep(dur!("msecs")( 10 ));
else
sleepFirst = true;
GC.collect;
That looks frightening. Do we really only send Owner/LinkTerminated messages when the thread object gets collected? Sounds horribly unreliable, as the other peer might hold some (implicit) reference to the thread.
If so we should add some onThreadExit hook to core.thread or wrap the thread function with some scope(exit) guard.
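To make the second suggestion concrete, here is a rough sketch of wrapping a thread function in a scope(exit) guard. The helper name withExitHook and its onExit callback are hypothetical and not part of core.thread or std.concurrency; the sketch only shows that the guard fires when the thread body ends, whether it returns or throws, without any dependence on garbage collection.

```d
import core.thread : Thread;

/// Hypothetical helper: returns a delegate that runs threadBody and then
/// always calls onExit, even if threadBody throws.
void delegate() withExitHook(void delegate() threadBody, void delegate() onExit)
{
    return delegate void() {
        scope (exit) onExit(); // runs on normal return and on exceptions
        threadBody();
    };
}

void main()
{
    bool notified = false;

    auto t = new Thread(withExitHook(
        delegate void() { throw new Exception("worker failed"); },
        delegate void() { notified = true; }));
    t.start();
    t.join(false); // false: do not rethrow the worker's exception here

    // The hook ran as soon as the thread body ended.
    assert(notified);
}
```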
Do we really only send Owner/LinkTerminated messages when the thread object gets collected?
They get sent when the module destructor is run for threads, and via scope(exit) for fibers, so the comment I added must be wrong.
I'll have to look at the code again (it's been so long since I made this PR) to see whether this was just a hack to force bad tests to hang (instead of failing randomly), or whether it was necessary.
changeScheduler(new ThreadScheduler);
scheduler.spawn(testdg);
assert(receiveOnly!bool());
Somewhat unclear: is this really the last life-signal of the thread being spawned?
Yes, I made it so that the test result (failure/success) was communicated back to the main thread instead of relying on exceptions being re-thrown (like it had been previously.)
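As an illustration of that pattern (not the actual test from this PR), the sketch below has the spawned thread send its pass/fail result to its owner as a bool, and the owner assert on it. The runCheck function and its even-number check are placeholders, and it uses std.concurrency.spawn directly where the PR goes through scheduler.spawn.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;
import std.exception : enforce;

/// Placeholder test body: reports success or failure to the owner thread
/// as a message instead of letting an exception escape the thread.
void runCheck(int value)
{
    bool ok;
    try
    {
        enforce(value % 2 == 0, "expected an even value");
        ok = true;
    }
    catch (Exception)
    {
        ok = false;
    }
    ownerTid.send(ok); // last action of the spawned thread
}

void main()
{
    spawn(&runCheck, 4);
    assert(receiveOnly!bool()); // the check passed in the worker

    spawn(&runCheck, 3);
    assert(!receiveOnly!bool()); // the failure is reported, not thrown
}
```

Because the owner blocks in receiveOnly until the message arrives, a worker that dies without sending makes the test hang or fail at that point instead of failing somewhere unrelated.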
Typically it was failing like this:
The problem is that the code being tested references the global variable ( |
I've removed the |
I think it's still necessary to serialize changing the scheduler, however I don't think the
|
Is this still "frightening"? More review needed?
https://issues.dlang.org/show_bug.cgi?id=17037