PersistentDict: fixes/workarounds for #227 #228

Closed
wants to merge 3 commits into from
25 changes: 18 additions & 7 deletions pytools/persistent_dict.py
@@ -460,7 +460,8 @@ def __init__(self, identifier: str,

         # isolation_level=None: enable autocommit mode
         # https://www.sqlite.org/lang_transaction.html#implicit_versus_explicit_transactions
-        self.conn = sqlite3.connect(self.filename, isolation_level=None)
+        self.conn = sqlite3.connect(self.filename, isolation_level=None,
+                                    timeout=60)
Contributor:

If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?

Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\
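To make the timeout semantics under discussion concrete, here is a minimal, self-contained sketch (not part of the PR) using only the standard-library `sqlite3` module. One connection takes the write lock with `BEGIN IMMEDIATE` and holds it; a second connection's `INSERT` then waits up to its `timeout` (in seconds; Python's default is 5.0) before failing with `sqlite3.OperationalError`. The helper `time_blocked_write` and the table `t` are illustrative names, not pytools code.

```python
import os
import sqlite3
import tempfile
import time
from typing import Tuple


def time_blocked_write(timeout: float) -> Tuple[bool, float]:
    """Return (was_blocked, seconds_waited) for a write against a locked DB.

    Illustrative helper, not part of pytools.
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

    # Connection A: autocommit mode (as in the PR), then take and hold the write lock.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE t (x INTEGER)")
    holder.execute("BEGIN IMMEDIATE")

    # Connection B: same file, with an explicit busy timeout in seconds.
    waiter = sqlite3.connect(path, isolation_level=None, timeout=timeout)

    start = time.monotonic()
    try:
        waiter.execute("INSERT INTO t VALUES (1)")
        blocked = False
    except sqlite3.OperationalError as e:
        # Raised once the timeout expires while A still holds the lock.
        blocked = "locked" in str(e)
    waited = time.monotonic() - start

    holder.execute("ROLLBACK")
    holder.close()
    waiter.close()
    return blocked, waited


if __name__ == "__main__":
    for t in (0.2, 0.5):
        blocked, waited = time_blocked_write(t)
        print(f"timeout={t}s: blocked={blocked}, waited {waited:.2f}s")
```

With `timeout=60`, as in the PR, the same blocked write would wait up to a minute rather than five seconds before giving up: the setting changes how long a blocked writer waits, not whether it is blocked.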

Contributor Author:

> If I understand correctly, the default timeout is 5s and this increases it to 60s. In a concurrent setting, that just means that it'll wait longer for some other process to let go of the lock, right?

Yes, this is my understanding as well.

> Why does that happen at all (i.e. 5s seems like plenty of time)? Are many processes continuously writing to the cache? This mostly feels like a workaround, but I haven't debugged things, so not sure :\

We aren't sure how this dictionary will be used downstream; it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:

# Exit after 60 seconds if not able to acquire lock
exit_attempts = int(60/wait_time_seconds)
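For comparison, a bounded polling loop of the kind the quoted snippet suggests might look like the sketch below. This is a hypothetical reconstruction, not the previous pytools code: the lock-file mechanism (`os.open` with `O_CREAT | O_EXCL`), the function names, and the default `wait_time_seconds` are all assumptions; only the `60/wait_time_seconds` attempt count comes from the snippet.

```python
import os
import tempfile
import time


def acquire_lock_file(path: str, total_seconds: float = 60,
                      wait_time_seconds: float = 0.1) -> int:
    """Hypothetical bounded polling lock; returns an open file descriptor."""
    # Give up after total_seconds, mirroring int(60/wait_time_seconds) attempts.
    exit_attempts = int(total_seconds / wait_time_seconds)
    for _ in range(exit_attempts):
        try:
            # O_EXCL makes creation fail if the lock file already exists,
            # i.e. if some other process currently holds the lock.
            return os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL)
        except FileExistsError:
            time.sleep(wait_time_seconds)
    raise TimeoutError(f"could not acquire {path} within {total_seconds} s")


def release_lock_file(path: str, fd: int) -> None:
    """Close and delete the lock file so the next waiter can create it."""
    os.close(fd)
    os.unlink(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "pdict.lock")
    fd = acquire_lock_file(lock_path)
    try:
        acquire_lock_file(lock_path, total_seconds=0.5)  # already held
    except TimeoutError as e:
        print("second acquire:", e)
    release_lock_file(lock_path, fd)
```

Both approaches bound the wait at 60 seconds; passing `timeout=60` to `sqlite3.connect` delegates the retrying to SQLite's busy handler instead of polling in Python.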

Contributor:

> We aren't sure how this dictionary will be used downstream,

That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?

> it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:

Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?

Contributor Author (@matthiasdiener, Jun 2, 2024):

>> We aren't sure how this dictionary will be used downstream,

> That's fair, but we mostly know how pyopencl uses it. Why does it fail there in the tests? My understanding is that the tests run with pytest-xdist with -n 4, so it shouldn't have that many concurrent accesses. Does it?

We are still trying to figure out the reason for the slowness observed in #227. So far, this seems to affect only the beers and @inducer's laptop, which is why I didn't see this issue earlier (I had tested on my Macs, on Lassen, and in GitHub CI, all of which are two orders of magnitude faster than the beers on these tests).

>> it could be that thousands of processes are hitting the same dict at the same time. This change restores the timeout from the previous implementation:

> Hmm, I don't think that sqlite is meant to be that concurrent in writes. Will that work reasonably well?

It can be a matter of the right configuration, but people have reported thousands of reads/writes per second with sqlite (see e.g. https://www.reddit.com/r/golang/comments/16xswxd/comment/k34ppfo/)

Note that:

  • WAL mode doesn't seem to make a huge difference for the slow tests
  • It does not seem to be a concurrency issue: the tests are slow even when running a single test (with single writes/reads)
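Since the slowness also shows up in a single, non-concurrent test, one way to narrow it down is to time the same kind of autocommit writes under different journal modes on an affected machine and on a fast one. The sketch below is an assumed, stand-alone micro-benchmark using only `sqlite3`; the table layout, payload, and function name are illustrative, not the PersistentDict schema.

```python
import os
import sqlite3
import tempfile
import time
from typing import Tuple


def time_autocommit_inserts(n: int, journal_mode: str) -> Tuple[float, int]:
    """Time n single-row INSERTs in autocommit mode under a given journal mode.

    Illustrative micro-benchmark; the table layout is an assumption.
    """
    path = os.path.join(tempfile.mkdtemp(), "bench.sqlite")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit, as in the PR

    # journal_mode is interpolated directly, so only pass trusted literals here.
    mode = conn.execute(f"PRAGMA journal_mode={journal_mode}").fetchone()[0]
    assert mode.lower() == journal_mode.lower(), mode

    conn.execute("CREATE TABLE dict (keyhash TEXT NOT NULL PRIMARY KEY, "
                 "key_value BLOB NOT NULL)")

    start = time.perf_counter()
    for i in range(n):
        # No explicit BEGIN: each INSERT is committed as its own transaction.
        conn.execute("INSERT INTO dict VALUES (?, ?)", (str(i), b"payload"))
    elapsed = time.perf_counter() - start

    (count,) = conn.execute("SELECT COUNT(*) FROM dict").fetchone()
    conn.close()
    return elapsed, count


if __name__ == "__main__":
    for mode in ("delete", "wal"):
        elapsed, count = time_autocommit_inserts(100, mode)
        print(f"{mode:>6}: {count} inserts in {elapsed:.3f}s")
```

Because every statement commits individually, this mostly measures per-commit cost, which depends on how quickly the underlying storage can sync; the comparison across machines matters more than the absolute numbers.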


         self.conn.execute(
             "CREATE TABLE IF NOT EXISTS dict "
@@ -627,12 +628,22 @@ def store(self, key: K, value: V, _skip_if_present: bool = False) -> None:
         keyhash = self.key_builder(key)
         v = pickle.dumps((key, value))

-        try:
-            self.conn.execute("INSERT INTO dict VALUES (?, ?)", (keyhash, v))
-        except sqlite3.IntegrityError:
-            if not _skip_if_present:
-                raise ReadOnlyEntryError("WriteOncePersistentDict, "
-                                         "tried overwriting key")
+        if _skip_if_present:
+            self.conn.execute("INSERT OR IGNORE INTO dict VALUES (?, ?)",
+                              (keyhash, v))
+        else:
+            try:
+                self.conn.execute("INSERT INTO dict VALUES (?, ?)", (keyhash, v))
+            except sqlite3.IntegrityError as e:
+                if hasattr(e, "sqlite_errorcode"):
+                    if e.sqlite_errorcode == sqlite3.SQLITE_CONSTRAINT_PRIMARYKEY:
+                        raise ReadOnlyEntryError("WriteOncePersistentDict, "
+                                                 "tried overwriting key")
+                    else:
+                        raise
+                else:
+                    raise ReadOnlyEntryError("WriteOncePersistentDict, "
+                                             "tried overwriting key")

     def _fetch(self, keyhash: str) -> Tuple[K, V]: # pylint:disable=method-hidden
         # This method is separate from fetch() to allow for LRU caching
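The new `store()` logic can be exercised in isolation against an in-memory database. In this sketch `ReadOnlyEntryError` is a local stand-in for the pytools exception, `store` is a free function rather than the method, and the two-column `dict` table is an assumption modeled on the `INSERT INTO dict VALUES (?, ?)` statements in the diff. `sqlite_errorcode` and `sqlite3.SQLITE_CONSTRAINT_PRIMARYKEY` exist only on Python 3.11 and later, which is why the PR checks for the attribute first.

```python
import sqlite3


class ReadOnlyEntryError(Exception):
    """Local stand-in for pytools' ReadOnlyEntryError."""


def store(conn: sqlite3.Connection, keyhash: str, value: bytes,
          skip_if_present: bool = False) -> None:
    if skip_if_present:
        # Existing rows are left untouched; no exception is raised.
        conn.execute("INSERT OR IGNORE INTO dict VALUES (?, ?)", (keyhash, value))
        return

    try:
        conn.execute("INSERT INTO dict VALUES (?, ?)", (keyhash, value))
    except sqlite3.IntegrityError as e:
        code = getattr(e, "sqlite_errorcode", None)  # None before Python 3.11
        if code is None or code == sqlite3.SQLITE_CONSTRAINT_PRIMARYKEY:
            # Duplicate key (or, on older Pythons, any constraint violation).
            raise ReadOnlyEntryError("tried overwriting key") from e
        # Any other constraint violation (e.g. NOT NULL) propagates unchanged.
        raise


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE dict (keyhash TEXT NOT NULL PRIMARY KEY, "
                 "key_value BLOB NOT NULL)")
    store(conn, "k", b"v1")
    store(conn, "k", b"v2", skip_if_present=True)  # silently ignored
    try:
        store(conn, "k", b"v3")
    except ReadOnlyEntryError as err:
        print("refused overwrite:", err)
```

Distinguishing the primary-key error code means an unrelated constraint failure is no longer misreported as an attempted overwrite on Python 3.11+, while older Pythons keep the previous behavior.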
22 changes: 11 additions & 11 deletions pytools/test/test_persistent_dict.py
@@ -729,14 +729,14 @@ def test_speed():
         pdict = WriteOncePersistentDict("pytools-test", container_dir=tmpdir)

         start = time.time()
-        for i in range(10000):
+        for i in range(100):
             pdict[i] = i
         end = time.time()
         print("persistent dict write time: ", end-start)

         start = time.time()
         for _ in range(5):
-            for i in range(10000):
+            for i in range(100):
                 pdict[i]
         end = time.time()
         print("persistent dict read time: ", end-start)
@@ -749,12 +749,12 @@ def test_size():
         tmpdir = tempfile.mkdtemp()
         pdict = PersistentDict("pytools-test", container_dir=tmpdir)

-        for i in range(10000):
+        for i in range(100):
             pdict[f"foobarbazfoobbb{i}"] = i

         size = pdict.nbytes()
         print("sqlite size: ", size/1024/1024, " MByte")
-        assert 1*1024*1024 < size < 2*1024*1024
+        assert 1*1024*1024/100 < size < 4*1024*1024/100
     finally:
         shutil.rmtree(tmpdir)
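For reference, the updated `test_size` bounds divide the old window by the same factor of 100 as the entry count while widening its upper end (4 instead of 2 MiB before scaling), which works out to roughly 10 KiB to 41 KiB. The sketch below computes those bounds and shows one way to obtain a SQLite database's size from `PRAGMA page_count` and `PRAGMA page_size`; whether `PersistentDict.nbytes()` computes its result this way is an assumption, not something the diff shows, and `sqlite_file_nbytes` is a hypothetical helper.

```python
import os
import sqlite3
import tempfile

MIB = 1024 * 1024

# Bounds from the updated test_size assertion (100 entries instead of 10000).
lower = 1 * MIB / 100   # 10485.76 bytes, about 10.2 KiB
upper = 4 * MIB / 100   # 41943.04 bytes, about 41 KiB


def sqlite_file_nbytes(conn: sqlite3.Connection) -> int:
    """Database size as page_count * page_size (one possible definition)."""
    (page_count,) = conn.execute("PRAGMA page_count").fetchone()
    (page_size,) = conn.execute("PRAGMA page_size").fetchone()
    return page_count * page_size


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "size.sqlite")
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE dict (keyhash TEXT NOT NULL PRIMARY KEY, "
                 "key_value BLOB NOT NULL)")
    for i in range(100):
        conn.execute("INSERT INTO dict VALUES (?, ?)",
                     (f"foobarbazfoobbb{i}", str(i).encode()))
    size = sqlite_file_nbytes(conn)
    print(f"{size} bytes; within test_size bounds: {lower < size < upper}")
    conn.close()
```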

@@ -766,10 +766,10 @@ def test_len():

         assert len(pdict) == 0

-        for i in range(10000):
+        for i in range(100):
             pdict[i] = i

-        assert len(pdict) == 10000
+        assert len(pdict) == 100

         pdict.clear()

@@ -793,14 +793,14 @@ def test_keys_values_items():
         tmpdir = tempfile.mkdtemp()
         pdict = PersistentDict("pytools-test", container_dir=tmpdir)

-        for i in range(10000):
+        for i in range(100):
             pdict[i] = i

         # This also tests deterministic iteration order
-        assert len(list(pdict.keys())) == 10000 == len(set(pdict.keys()))
-        assert list(pdict.keys()) == list(range(10000))
-        assert list(pdict.values()) == list(range(10000))
-        assert list(pdict.items()) == list(zip(list(pdict.keys()), range(10000)))
+        assert len(list(pdict.keys())) == 100 == len(set(pdict.keys()))
+        assert list(pdict.keys()) == list(range(100))
+        assert list(pdict.values()) == list(range(100))
+        assert list(pdict.items()) == list(zip(list(pdict.keys()), range(100)))

         assert ([k for k in pdict.keys()] # noqa: C416
                 == list(pdict.keys())
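The comment in `test_keys_values_items` notes that the test also checks deterministic iteration order. In SQLite, a `SELECT` without `ORDER BY` does not guarantee any particular row order, so a listing that must come back in insertion order needs an explicit ordering. The sketch below orders by the implicit `rowid`, which SQLite normally assigns as one larger than the current maximum, so it tracks insertion order for an ordinary (not `WITHOUT ROWID`) table. Whether `PersistentDict.keys()` does exactly this is an assumption, and `ordered_keys` and the hex keyhash are hypothetical.

```python
import pickle
import sqlite3
from typing import Any, List


def ordered_keys(conn: sqlite3.Connection) -> List[Any]:
    """Return unpickled keys in insertion order by sorting on rowid."""
    rows = conn.execute("SELECT key_value FROM dict ORDER BY rowid")
    # Each stored blob is a pickled (key, value) pair, like
    # v = pickle.dumps((key, value)) in the diff.
    return [pickle.loads(blob)[0] for (blob,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dict (keyhash TEXT NOT NULL PRIMARY KEY, "
                 "key_value BLOB NOT NULL)")
    for i in range(100):
        # Hypothetical keyhash whose text order differs from insertion order;
        # pytools derives its keyhash with key_builder instead.
        keyhash = f"{i:x}"
        conn.execute("INSERT INTO dict VALUES (?, ?)",
                     (keyhash, pickle.dumps((i, i))))
    print(ordered_keys(conn)[:5])
```

Sorting on `rowid` rather than on `keyhash` matters here: the hex strings sort as text (`"10"` before `"a"`), which would not match `list(range(100))`.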