PersistentDict: fixes/workarounds for #227 #228
4e3a2e5 to 04e59a8
Needs more testing on the beers, but otherwise ready for a first look @inducer
@@ -460,7 +460,8 @@ def __init__(self, identifier: str,

         # isolation_level=None: enable autocommit mode
         # https://www.sqlite.org/lang_transaction.html#implicit_versus_explicit_transactions
-        self.conn = sqlite3.connect(self.filename, isolation_level=None)
+        self.conn = sqlite3.connect(self.filename, isolation_level=None,
+                                    timeout=60)
If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?
Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\
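To make the question concrete, here is a minimal sketch of what the `timeout` argument of `sqlite3.connect` does when another connection holds the write lock. It is not taken from pytools: the table name `dict`, the helper `time_blocked_write`, and the short timeout are assumptions chosen for illustration only.

```python
import os
import sqlite3
import tempfile
import time


def time_blocked_write(timeout: float) -> tuple[bool, float]:
    """Try to write while another connection holds the write lock.

    Returns (raised, elapsed): whether the write failed, and after how long.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        filename = os.path.join(tmpdir, "demo.sqlite")

        # isolation_level=None: autocommit mode, as in the diff above
        holder = sqlite3.connect(filename, isolation_level=None)
        holder.execute("CREATE TABLE dict (key TEXT PRIMARY KEY, value TEXT)")
        # Take the write lock and keep it, simulating a busy writer.
        holder.execute("BEGIN IMMEDIATE")
        holder.execute("INSERT INTO dict VALUES ('a', '1')")

        waiter = sqlite3.connect(filename, isolation_level=None,
                                 timeout=timeout)
        start = time.perf_counter()
        try:
            waiter.execute("INSERT INTO dict VALUES ('b', '2')")
            raised = False
        except sqlite3.OperationalError:
            # Typically "database is locked": the wait ran out.
            raised = True
        elapsed = time.perf_counter() - start

        holder.execute("ROLLBACK")
        waiter.close()
        holder.close()
    return raised, elapsed
```

Calling `time_blocked_write(0.2)` should report a failed write after roughly the requested wait; with `timeout=60` the same write would simply wait up to a minute for the lock to be released before giving up.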
> If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?
Yes, this is my understanding as well.
> Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\
We aren't sure how this dictionary will be used downstream, it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:
pytools/pytools/persistent_dict.py
Lines 135 to 136 in f084669
    # Exit after 60 seconds if not able to acquire lock
    exit_attempts = int(60/wait_time_seconds)
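For readers unfamiliar with that pattern, the following is a hypothetical sketch of a lock-file retry loop built around the `exit_attempts` computation quoted above. Only that one expression comes from the referenced lines; the function names, the use of `os.O_EXCL`, and the error type are assumptions and may differ from what the previous pytools implementation actually did.

```python
import os
import time


def acquire_lock(lock_file: str, wait_time_seconds: float = 1.0,
                 max_wait_seconds: float = 60.0) -> int:
    """Create lock_file exclusively, retrying until max_wait_seconds pass."""
    # Give up after roughly max_wait_seconds if the lock is never released.
    exit_attempts = int(max_wait_seconds / wait_time_seconds)
    for _ in range(exit_attempts):
        try:
            # O_EXCL makes creation fail if the file already exists.
            return os.open(lock_file,
                           os.O_CREAT | os.O_WRONLY | os.O_EXCL)
        except FileExistsError:
            # Someone else holds the lock: wait, then try again.
            time.sleep(wait_time_seconds)
    raise RuntimeError(f"could not acquire lock '{lock_file}' "
                       f"after {exit_attempts} attempts")


def release_lock(fd: int, lock_file: str) -> None:
    """Close and remove the lock file so other processes can proceed."""
    os.close(fd)
    os.unlink(lock_file)
```

With the defaults, a caller waits about 60 seconds in total before failing, which is the behavior the `timeout=60` argument to `sqlite3.connect` is meant to restore.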
> We aren't sure how this dictionary will be used downstream,
That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?
> it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:
Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?
> > We aren't sure how this dictionary will be used downstream,
> That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?
We are still trying to figure out the reason for the slowness observed in #227. So far, this seems to only affect the beers + @inducer's laptop, which is why I didn't see this issue earlier (I had tested on my Macs, Lassen, as well as GitHub CI, which are all 2 orders of magnitude faster than the beers in these tests).
> > it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:
> Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?
It can be a matter of the right configuration, but people have reported thousands of reads/writes per second with sqlite (see e.g. https://www.reddit.com/r/golang/comments/16xswxd/comment/k34ppfo/)
Note that:
- WAL mode doesn't seem to make a huge difference for the slow tests
- It does not seem to be a concurrency issue: the tests are slow even when running just a single test (with single writes/reads)
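One way to check both notes on a given machine is to time a single write followed by a single read, with and without WAL mode. The sketch below is an assumption about how such a measurement could look, not the test that was actually run; the table layout and the helper name `time_single_roundtrip` are made up for illustration.

```python
import os
import sqlite3
import tempfile
import time


def time_single_roundtrip(use_wal: bool) -> tuple[str, float]:
    """Time one INSERT plus one SELECT; return (journal_mode, seconds)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        conn = sqlite3.connect(os.path.join(tmpdir, "demo.sqlite"),
                               isolation_level=None, timeout=60)
        mode = "WAL" if use_wal else "DELETE"
        # The pragma returns the journal mode that is now in effect.
        journal_mode = conn.execute(
            f"PRAGMA journal_mode={mode}").fetchone()[0]
        conn.execute("CREATE TABLE dict (key TEXT PRIMARY KEY, value TEXT)")

        start = time.perf_counter()
        conn.execute("INSERT INTO dict VALUES (?, ?)", ("key", "value"))
        value = conn.execute("SELECT value FROM dict WHERE key = ?",
                             ("key",)).fetchone()[0]
        elapsed = time.perf_counter() - start
        conn.close()

    assert value == "value"
    return journal_mode, elapsed
```

Comparing `time_single_roundtrip(True)` with `time_single_roundtrip(False)` on a slow machine and on a fast one would show whether the journal mode matters, or whether the cost sits elsewhere (for example in the filesystem's sync behavior).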
Co-authored-by: Alex Fikl <[email protected]>
ad0dca5 to 9515943
Should address issues in #227.