Occasional failures in get_records_with_cache #844

Open
ntBre opened this issue Jun 25, 2024 · 1 comment

ntBre commented Jun 25, 2024

Describe the bug
Following up on our discussion from today's meeting about using qcportal.cache.get_records_with_cache to cache record downloads outside of datasets, I noticed occasional failures with a very simple script like:

import shutil
from pathlib import Path

from qcportal import PortalClient
from qcportal.cache import RecordCache, get_records_with_cache
from qcportal.optimization import OptimizationRecord

addr = "https://api.qcarchive.molssi.org:443/"
cache_dir = Path("api.qcarchive.molssi.org_443")

# start from a clean cache directory
if cache_dir.exists():
    shutil.rmtree(cache_dir)

client = PortalClient(addr, cache_dir=".")
# open the record cache writable (the second argument is the read_only flag)
record_cache = RecordCache(f"{client.cache.cache_dir}/records.sqlite", False)
# first call: fetch the record from the server and populate the cache
r1 = get_records_with_cache(
    client, record_cache, OptimizationRecord, [137149103]
)
# second call: no client, so the record must come entirely from the cache
r2 = get_records_with_cache(
    None, record_cache, OptimizationRecord, [137149103]
)

This often leads to the error below:

Traceback (most recent call last):
  File "/home/brent/omsf/scratch/qcportal-cache/simple.py", line 19, in <module>
    get_records_with_cache(
  File "/home/brent/mambaforge/envs/qcsubmit-test-basic/lib/python3.11/site-packages/qcportal/cache.py", line 653, in get_records_with_cache
    raise RuntimeError("Need to fetch some records, but not connected to a client")
RuntimeError: Need to fetch some records, but not connected to a client

However, if I perturb the script slightly (for example, by deleting the r1 and r2 assignments but keeping the function calls), it runs successfully. I think this points to some kind of timing/concurrency issue in how the records are committed to the database, but adding a time.sleep call between the two calls didn't help, so that might be off base.
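
For what it's worth, if the problem really is that the cache writeback only happens when the record object is garbage collected, then explicitly dropping the first result and forcing a collection ought to make the second call succeed. This is a hypothetical probe based on that assumption, not something I've verified:

import gc

# assumption: the cache writeback happens in the record's __del__, so
# dropping the only reference and forcing a GC cycle should flush the
# record to the cache before the cache-only call below
r1 = get_records_with_cache(
    client, record_cache, OptimizationRecord, [137149103]
)
del r1
gc.collect()

# with no client, this must be served entirely from the cache
r2 = get_records_with_cache(
    None, record_cache, OptimizationRecord, [137149103]
)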

I also tried running this code in a loop to try to see how often it failed:

import shutil
from pathlib import Path

from qcportal import PortalClient
from qcportal.cache import RecordCache, get_records_with_cache
from qcportal.optimization import OptimizationRecord

addr = "https://api.qcarchive.molssi.org:443/"
cache_dir = Path("api.qcarchive.molssi.org_443")

if cache_dir.exists():
    shutil.rmtree(cache_dir)

failed = 0
for i in range(100):
    client = PortalClient(addr, cache_dir=".")
    record_cache = RecordCache(
        f"{client.cache.cache_dir}/records.sqlite", False
    )
    # populate the cache from the server...
    r1 = get_records_with_cache(
        client, record_cache, OptimizationRecord, [137149103]
    )
    try:
        # ...then try to read the same record back without a client
        r2 = get_records_with_cache(
            None, record_cache, OptimizationRecord, [137149103]
        )
    except RuntimeError as e:
        assert "Need to fetch some records" in str(e)
        print(f"failed on iter {i}")
        failed += 1
        continue
    assert r1 == r2, f"mismatch on iter {i}"

print(failed)

As written, it always fails on the first iteration and then successfully reads from the existing cache on the other 99 iterations. However, if I move the rmtree call inside the loop, I get 100 instances of this error:

Traceback (most recent call last):
  File "/home/brent/mambaforge/envs/qcsubmit-test-basic/lib/python3.11/site-packages/qcportal/record_models.py", line 437, in __del__
    self.sync_to_cache(True)  # Don't really *have* to detach, but why not
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brent/mambaforge/envs/qcsubmit-test-basic/lib/python3.11/site-packages/qcportal/record_models.py", line 523, in sync_to_cache
    self._record_cache.writeback_record(self)
  File "/home/brent/mambaforge/envs/qcsubmit-test-basic/lib/python3.11/site-packages/qcportal/cache.py", line 169, in writeback_record
    self._conn.execute(stmt, row_data)
  File "src/cursor.c", line 169, in resetcursor
apsw.ReadOnlyError: ReadOnlyError: attempt to write a readonly database
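
For reference, the failing variant just moves the cleanup inside the loop; everything else is unchanged from the snippet above:

failed = 0
for i in range(100):
    # wiping the cache directory on every iteration is what triggers the
    # ReadOnlyError above
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    client = PortalClient(addr, cache_dir=".")
    record_cache = RecordCache(
        f"{client.cache.cache_dir}/records.sqlite", False
    )
    # ... rest of the loop body as above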

To Reproduce
See either snippet above. I sent the first one to Jeff (@j-wags), and he could reproduce it too, so at least it's not just my machine.

Expected behavior
I expect both calls to get_records_with_cache to return the same record without throwing an exception. The first call should populate the cache and the second should read from it.

Additional context
This is on the most recent version of qcportal, 0.55. I can upload my full conda environment if needed.


bennybp commented Jul 8, 2024

I think this is fixed on the current main branch. Originally, records were written back to the cache when they were destructed/garbage collected. It was a cute idea, but it caused a whole host of problems, so that behavior was removed in PR #843, resulting in more predictable behavior.
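
For context, the removed pattern was roughly this shape (a minimal sketch for illustration, not the actual qcportal code):

class FakeCache:
    def writeback_record(self, record):
        print(f"writing back record {record.record_id}")

class Record:
    """Sketch of the old destructor-based writeback."""

    def __init__(self, record_id, cache):
        self.record_id = record_id
        self._cache = cache

    def __del__(self):
        # runs whenever the object happens to be garbage collected --
        # possibly never, or after the cache file has been deleted or
        # reopened read-only, which is exactly the failure mode above
        self._cache.writeback_record(self)

Writing records back at a well-defined point instead makes the cache contents deterministic, which is the more predictable behavior mentioned above.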
