
Restore mutex without compromising performance, on Windows #104

Merged (3 commits) Nov 19, 2023
Conversation

@temeddix (Contributor) commented Nov 18, 2023

As @dnadlinger mentioned in this post, I decided to check whether performance can stay fine while still using the mutex and semaphore, using the code he provided.

This change doesn't make the code identical to the pre-#101 code. It slightly modifies the _poll method.

The main points:

  • Restore mutex on Windows for safety against race conditions
  • Do not compromise performance
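The pattern behind these two points can be sketched in a platform-independent way. This is a minimal illustration, not qasync's actual code: `threading.Lock` and `queue.Queue` stand in for `QtCore.QMutex` and the Windows IOCP, and all names here (`PollSketch`, `_poll`, `_events`, `results`) are made up for the sketch. The idea is that the blocking wait for completion events happens outside the lock, and only the shared-state update is serialized:

```python
import threading
from queue import Empty, Queue


class PollSketch:
    """Block for completion events OUTSIDE the lock, then update
    shared state while holding it: the lock guards the bookkeeping,
    not the (potentially long) wait itself."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for QtCore.QMutex
        self._events = Queue()         # stands in for the IOCP queue
        self.results = []              # shared state guarded by _lock

    def _poll(self, timeout=0.01):
        while True:
            try:
                # The blocking wait runs with the lock released, so
                # other threads can touch shared state meanwhile.
                status = self._events.get(timeout=timeout)
            except Empty:
                break
            timeout = 0  # drain remaining events without blocking
            with self._lock:
                # Only the shared-state update is serialized.
                self.results.append(status)


p = PollSketch()
for i in range(3):
    p._events.put(i)
p._poll()
print(p.results)  # → [0, 1, 2]
```

This mirrors why performance is unaffected: contention can only occur during the short bookkeeping step, never during the wait for I/O completions.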

Please focus on the Median duration.

Before #101:
[benchmark screenshot]

Before this change (after #101):
[benchmark screenshot]

After this change:
[benchmark screenshot]

The numbers (allowing for small measurement error) show that restoring the mutex this way doesn't negatively affect performance, while ensuring the same safety qasync had before #101. Please tell me if this change would be okay, @hosaka. I'm open to changes and will reply as fast as I can.

@temeddix (Contributor, Author)

@dnadlinger Please tell me if you spot anything to modify more :)

@hosaka (Collaborator) commented Nov 18, 2023

I went through the comments on #101; it looks like that was an oversight on our part. @dnadlinger, do you have a suggestion for a unit test that can trigger a race condition if no locking is present?

@temeddix (Contributor, Author) commented Nov 18, 2023


On Windows, with #101, this test failed without this if statement. That's why I exempted Windows from this test, treating it as a platform limitation.

With #104, this test is working again. IMHO, we might not need to add an additional test after all.

@hosaka (Collaborator)

That is true, but test_regression_bug13 does not explicitly test for the race condition, only for a socket connection. I think it would be better to have a test that covers the locking mechanism, effectively creating a race condition and verifying that it does not occur (i.e., with the mutex implementation in place).

@dnadlinger (Contributor)

A test case with a good chance of reproducing the issue would probably involve I/O operations that complete very quickly. It would by necessity be nondeterministic, though; perhaps you could exchange many short bursts of data over a local socket pair to increase the chance of hitting it.
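Such a stress test might look roughly like the following. This is a hedged sketch against plain asyncio rather than qasync (so it runs without Qt installed); the burst count, payload size, and the `burst_exchange` name are arbitrary assumptions, and in an actual qasync test the same coroutine would be driven by a QEventLoop instead of `asyncio.run`:

```python
import asyncio


async def burst_exchange(n_bursts=200, payload=b"x" * 64):
    # Echo many tiny messages over a localhost connection: frequent,
    # quickly-completing I/O operations maximize the chance of
    # hitting a _poll race when no locking is present.
    async def handle(reader, writer):
        for _ in range(n_bursts):
            data = await reader.readexactly(len(payload))
            writer.write(data)  # echo each burst straight back
        await writer.drain()
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    received = 0
    for _ in range(n_bursts):
        writer.write(payload)
        await writer.drain()
        received += len(await reader.readexactly(len(payload)))

    writer.close()
    server.close()
    await server.wait_closed()
    return received


print(asyncio.run(burst_exchange()))  # → 12800 (= 200 * 64)
```

A missing lock would typically show up here as intermittent hangs, lost completions, or corrupted bookkeeping after many iterations, so such a test would be run repeatedly rather than once.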

The test case that was ignored in #101 is probably a good start. By the way, it sure seems like any reference to asyncio's add_reader/add_writer in the discussion there was a red herring, as the methods in question are actually defined in _QEventLoop and build on Qt functionality.

@dnadlinger (Contributor)

You might also not want to copy over my commit with the work-in-progress FIXME messages wholesale (and without attribution, I might add), but rather cleanly revert the changes from #101 in one commit, and then in a second commit add the few-line change moving the GetQueuedCompletionStatus() call outside the loop (perhaps including a comment as to why this works). This way, it should be easier for somebody in the future to trace what the actual change was. I can prepare a PR with just this change and a more detailed explanatory comment if you'd prefer.

On a related note, I still haven't looked into the other issues indicated in the FIXME comments, and I'm unlikely to have the bandwidth for this soon, so if you wanted to spend some time investigating, it would certainly be helpful.

@temeddix (Contributor, Author) commented Nov 18, 2023

I reverted the old commit, created a new one, and force-pushed with Git. Now the Git history will be easier to read.

I don't have much time to inspect those FIXME comments right now either, so restoring the mutexes was the best I could do.

@hosaka (Collaborator) commented Nov 19, 2023

You've left all the FIXME comments in, word for word, from the commit that @dnadlinger linked. Can you please submit 2 commits: one reverting #101, as you already did, and another moving the mutex inside the while loop, after the GetQueuedCompletionStatus() call.

@temeddix (Contributor, Author)

Sorry if I misunderstood, but are you asking for those 2 commits without the FIXME comments?

Comment on lines 143 to 149:

```python
while True:
    status = _overlapped.GetQueuedCompletionStatus(self._iocp, ms)
    if status is None:
        break
    ms = 0

    with QtCore.QMutexLocker(self._lock):
```
@hosaka (Collaborator)
Only keep this in the 2nd commit please

@temeddix (Contributor, Author)

Done :)

@hosaka merged commit 99dc18b into CabbageDevelopment:master on Nov 19, 2023. 36 checks passed.