PersistentDict: fixes/workarounds for #227 #228

matthiasdiener · 2024-06-01T16:32:32Z

Should address issues in #227.

matthiasdiener · 2024-06-01T16:40:21Z

Needs more testing on the beers, but otherwise ready for a first look @inducer

alexfikl · 2024-06-02T16:51:56Z

pytools/persistent_dict.py

@@ -460,7 +460,8 @@ def __init__(self, identifier: str,

        # isolation_level=None: enable autocommit mode
        # https://www.sqlite.org/lang_transaction.html#implicit_versus_explicit_transactions
-        self.conn = sqlite3.connect(self.filename, isolation_level=None)
+        self.conn = sqlite3.connect(self.filename, isolation_level=None,
+                                    timeout=60)


If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?

Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\

> If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?

Yes, this is my understanding as well.
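For reference, a minimal sketch of what the `timeout` argument does. This is not code from this PR: the function name, file name, and table layout are made up for illustration. One connection holds the write lock while a second one tries to take it, and the second one should wait roughly `timeout` seconds before `sqlite3` raises `OperationalError` ("database is locked").

```python
import os
import sqlite3
import tempfile
import time


def seconds_until_locked_error(timeout: float) -> float:
    """Return how long a second connection waits before giving up on a lock.

    Hypothetical helper for illustration only; not part of pytools.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sqlite")

        # isolation_level=None: autocommit mode, so transactions are explicit.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE demo (key TEXT PRIMARY KEY, value TEXT)")
        # Take the write lock and keep holding it.
        holder.execute("BEGIN IMMEDIATE")

        waiter = sqlite3.connect(path, isolation_level=None, timeout=timeout)
        start = time.monotonic()
        try:
            waiter.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            # The lock was not released within `timeout` seconds.
            return time.monotonic() - start
        else:
            raise AssertionError("lock was unexpectedly available")
        finally:
            # Close both connections before the temporary directory is removed.
            waiter.close()
            holder.execute("ROLLBACK")
            holder.close()


if __name__ == "__main__":
    for timeout in (0.0, 0.5):
        waited = seconds_until_locked_error(timeout)
        print(f"timeout={timeout}: gave up after {waited:.2f}s")
```

So a larger timeout only changes how long a blocked process waits; it does not make the lock holder any faster.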

> Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\

We aren't sure how this dictionary will be used downstream; it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:

pytools/pytools/persistent_dict.py

Lines 135 to 136 in f084669

# Exit after 60 seconds if not able to acquire lock

exit_attempts = int(60/wait_time_seconds)
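As a rough sketch of the pattern those two lines belong to (a polling lock with a fixed time budget), something like the following. This is not the previous pytools code: the function names, the default `wait_time_seconds` value, and the `give_up_after_seconds` parameter are assumptions for illustration.

```python
import os
import tempfile
import time


def acquire_lock_file(lock_file: str,
                      wait_time_seconds: float = 0.05,
                      give_up_after_seconds: float = 60.0) -> int:
    """Poll for an exclusive lock file; hypothetical helper, not pytools API."""
    # Same arithmetic as the quoted lines: a time budget becomes an attempt count.
    exit_attempts = int(give_up_after_seconds / wait_time_seconds)

    attempts = 0
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists,
            # so only one process can hold the lock at a time.
            return os.open(lock_file, os.O_CREAT | os.O_WRONLY | os.O_EXCL)
        except FileExistsError:
            pass

        attempts += 1
        if attempts > exit_attempts:
            raise TimeoutError(
                f"could not acquire '{lock_file}' "
                f"within {give_up_after_seconds} seconds")
        time.sleep(wait_time_seconds)


def release_lock_file(fd: int, lock_file: str) -> None:
    """Close and delete the lock file so that other processes can proceed."""
    os.close(fd)
    os.unlink(lock_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        lock = os.path.join(tmpdir, "lock")
        fd = acquire_lock_file(lock)
        try:
            # A second attempt with a small budget gives up quickly.
            acquire_lock_file(lock, give_up_after_seconds=0.2)
        except TimeoutError as exc:
            print(exc)
        release_lock_file(fd, lock)
```

With `timeout=60` on the `sqlite3` connection, the waiting is done by SQLite's busy handler instead of a hand-written loop like this one.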

> We aren't sure how this dictionary will be used downstream,

That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?

> it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:

Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?

> > We aren't sure how this dictionary will be used downstream,
>
> That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?

We are still trying to figure out the reason for the slowness observed in #227. So far, this seems to only affect the beers + @inducer's laptop, which is why I didn't see this issue earlier (I had tested on my Macs, Lassen, as well as GitHub CI, which are all 2 orders of magnitude faster than the beers in these tests).

> > it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:
>
> Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?

It can be a matter of finding the right configuration, but people have reported thousands of reads/writes per second with sqlite (see e.g. https://www.reddit.com/r/golang/comments/16xswxd/comment/k34ppfo/).

Note that:

- WAL mode doesn't seem to make a huge difference for the slow tests.
- It doesn't seem to be a concurrency issue: the tests are slow even when running just a single test (with single writes/reads).
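For anyone who wants to reproduce the single-write/read observation on a slow machine, a minimal sketch follows. The function name, file names, and table layout are assumptions and do not reflect PersistentDict's actual schema. It switches the journal mode with `PRAGMA journal_mode` and times one store plus one fetch.

```python
import os
import sqlite3
import tempfile
import time


def time_single_store_and_fetch(path: str, journal_mode: str) -> tuple[str, float]:
    """Time one write and one read; return (effective journal mode, seconds).

    Hypothetical helper for illustration only; not part of pytools.
    """
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        # The PRAGMA returns the journal mode that is actually in effect.
        (mode,) = conn.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()
        conn.execute(
            "CREATE TABLE IF NOT EXISTS demo (key TEXT PRIMARY KEY, value TEXT)")

        start = time.perf_counter()
        conn.execute("INSERT OR REPLACE INTO demo VALUES (?, ?)", ("k", "v"))
        row = conn.execute(
            "SELECT value FROM demo WHERE key = ?", ("k",)).fetchone()
        elapsed = time.perf_counter() - start

        assert row == ("v",)
        return mode, elapsed
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        for requested in ("delete", "wal"):
            path = os.path.join(tmpdir, f"{requested}.sqlite")
            mode, elapsed = time_single_store_and_fetch(path, requested)
            print(f"journal_mode={mode}: {elapsed * 1000:.2f} ms")
```

If both modes are similarly slow for a single store/fetch pair, that would be consistent with the second bullet: the cost is per operation, not lock contention.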

pytools/persistent_dict.py

Co-authored-by: Alex Fikl <[email protected]>

matthiasdiener · 2024-06-10T19:41:48Z

Closing since #231 and #229 have been merged.

fixes/workarounds for inducer#227

182fee7

matthiasdiener changed the title ~~fixes/workarounds for #227~~ PersistentDict: fixes/workarounds for #227 Jun 1, 2024

smarter WOPD.store

04e59a8

matthiasdiener force-pushed the sqlite-fixes branch from 4e3a2e5 to 04e59a8 Compare June 1, 2024 16:34

matthiasdiener marked this pull request as ready for review June 1, 2024 16:40

alexfikl reviewed Jun 2, 2024


use constant

9515943

Co-authored-by: Alex Fikl <[email protected]>

matthiasdiener force-pushed the sqlite-fixes branch from ad0dca5 to 9515943 Compare June 2, 2024 17:02

matthiasdiener marked this pull request as draft June 4, 2024 21:43

matthiasdiener closed this Jun 10, 2024

matthiasdiener deleted the sqlite-fixes branch June 10, 2024 19:41
