Add cache management features #799
We add a cache management layer on top of Pystow. This takes the form of two classes (both in `oaklib.utilities.caching`):
* one representing the cache management policy, i.e. the logic dictating whether a cached file (if present) should be refreshed or not;
* one representing the file cache itself.

The policy is set once by the main entry point method, using either a default policy of refreshing cached data after 7 days, or another policy explicitly selected by the user with the new `--caching` option. The class that represents the file cache is the one that the rest of OAK should interact with whenever access to cached data is needed. Ultimately, all calls to the Pystow module should be replaced with calls to FileCache, the use of Pystow becoming an implementation detail entirely encapsulated in FileCache.
Add new methods to the FileCache class to (1) get the list of files present in the cache and (2) delete files in the cache. Replace the implementations of the cache-ls and cache-clear commands to use the new methods, so that the details of cache listing and clearing remain encapsulated in FileCache. As a side-effect, this automatically fixes the issue that cache listing was only working on Unix-like systems, since the FileCache implementation is pure Python and does not rely on the ls(1) Unix command.
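As an illustration of why a pure-Python approach is portable, here is a minimal sketch of listing and deleting cached files with `pathlib`. The function names `list_cache` and `clear_cache`, their signatures, and the optional glob pattern are hypothetical; they are not the actual FileCache methods.

```python
from fnmatch import fnmatch
from pathlib import Path


def list_cache(cache_dir: Path) -> list[tuple[Path, int, float]]:
    """Return (path, size in bytes, mtime) for each regular file in cache_dir."""
    entries = []
    for path in sorted(cache_dir.iterdir()):
        if path.is_file():
            stat = path.stat()
            entries.append((path, stat.st_size, stat.st_mtime))
    return entries


def clear_cache(cache_dir: Path, pattern: str = "*") -> list[Path]:
    """Delete the files whose name matches pattern and return the deleted paths.

    The pattern argument is a hypothetical convenience, not a documented feature.
    """
    deleted = []
    for path, _size, _mtime in list_cache(cache_dir):
        if fnmatch(path.name, pattern):
            path.unlink()
            deleted.append(path)
    return deleted
```

Nothing here shells out to an external command, so the same code should behave identically on Unix-like systems and on Windows.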
The intended difference between the REFRESH and RESET caching policies is that, when a cache lookup is attempted, REFRESH should cause the file that was looked up -- and only that file -- to be refreshed, leaving any other file that may be present in the cache untouched. RESET, on the other hand, should entirely clear the cache, so that not only the file that was looked up should be refreshed, but any other file that may be looked up in a subsequent call should be refreshed as well. This commit implements the intended behaviour for the RESET policy.
In principle, we should never have to compare a timestamp representing a future date when we check whether a cached file should be refreshed. However, files with bogus mtime values and/or computers configured with a bogus system time are certainly not uncommon, so encountering a timestamp higher than the current time can (and will) definitely happen. Under an "always refresh" policy, a refresh must be triggered even if the cached file appears to be "newer than now", so we explicitly implement that behaviour here. We also add a complete test fixture for the CachePolicy class.
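The refresh decision described above can be sketched as follows. This is a simplified illustration, not the actual CachePolicy class: the `Mode` enum and the `needs_refresh` function are hypothetical names, the "never refresh" behaviour is inferred from the option name, and leaving a future-dated file alone under an age-based policy is an assumption.

```python
import time
from enum import Enum
from typing import Optional

SEVEN_DAYS = 7 * 24 * 3600  # default maximum age in seconds, per the PR description


class Mode(Enum):
    ALWAYS = "refresh"      # always refresh the looked-up file
    NEVER = "no-refresh"    # assumed meaning: never refresh a cached file
    MAX_AGE = "max-age"     # refresh only when the file is older than a delay


def needs_refresh(mode: Mode, mtime: float,
                  max_age: float = SEVEN_DAYS,
                  now: Optional[float] = None) -> bool:
    """Decide whether a cached file with the given mtime should be refreshed."""
    now = time.time() if now is None else now
    if mode is Mode.ALWAYS:
        # Must hold even when mtime > now (bogus mtime or bogus system clock).
        return True
    if mode is Mode.NEVER:
        return False
    # Age-based policy: a file dated in the future has a negative age and is
    # therefore not refreshed (an assumption of this sketch).
    return (now - mtime) > max_age
```

The point of handling `ALWAYS` before any age arithmetic is that the comparison `now - mtime > max_age` alone would wrongly skip the refresh for a file that looks newer than the current time.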
In the SQLite tutorial, in the section that briefly mentions that automatically downloaded SQLite files are cached in ``.data/oaklib``, we describe in more detail how the cache works and how it can be controlled using the `--caching` option.
Codecov ReportAttention: Patch coverage is
@@ Coverage Diff @@
## main #799 +/- ##
==========================================
+ Coverage 74.01% 74.16% +0.14%
==========================================
Files 282 284 +2
Lines 33533 33742 +209
==========================================
+ Hits 24821 25026 +205
- Misses 8712 8716 +4
This is so awesome, thank you!
Add a new section in the CLI reference documentation to explain how the cache works and how it can be controlled using the `--caching` option. Replace the previous, shorter documentation in the SQLite tutorial with a simple mention of the cache and a link to the newly added reference section.
Thanks @gouttegd, this is going to help to avoid a lot of confusion in the future.
This commit adds the possibility to configure the file cache to apply pattern-specific caching policies. This is controlled by a configuration file ($XDG_CONFIG_HOME/ontology-access-kit/cache.conf, under GNU/Linux) containing "pattern=policy" pairs, where pattern is a shell-type globbing pattern and policy is a string of the same type as expected by the newly introduced --caching option.
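To make the "pattern=policy" idea concrete, here is a minimal sketch of how such a file could be parsed and matched against a file name. The function names, the `#` comment syntax, the "first matching pattern wins" rule, and the `"7d"` fallback string are all assumptions made for illustration; the actual parser may behave differently.

```python
from fnmatch import fnmatchcase


def parse_cache_conf(text: str) -> list[tuple[str, str]]:
    """Parse "pattern=policy" lines into an ordered list of (pattern, policy)."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        pattern, sep, policy = line.partition("=")
        if not sep:
            continue  # ignore lines without "=" (assumed behaviour)
        rules.append((pattern.strip(), policy.strip()))
    return rules


def policy_for(filename: str, rules: list[tuple[str, str]],
               default: str = "7d") -> str:
    """Return the policy of the first pattern matching filename, else default."""
    for pattern, policy in rules:
        if fnmatchcase(filename, pattern):
            return policy
    return default


example = """
# hypothetical cache.conf content
uberon.db = no-refresh
*.db = 2w
"""
rules = parse_cache_conf(example)
print(policy_for("uberon.db", rules))  # matched by the first rule
print(policy_for("cl.db", rules))      # matched by the "*.db" rule
```

With an ordered rule list, more specific patterns can be placed before broader ones, which is one simple way to resolve overlapping patterns.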
PR updated with the feature discussed in this comment. Briefly, in addition to the
The "user_config_dir" returned by the Appdirs package under macOS is not in "~/Library/Preferences" but under "~/Library/Application Support" (the Appdirs documentation is not up to date). Also, there is no need to mention the roaming directory under Windows, as Appdirs will never use that directory unless we explicitly ask it to do so (which we don't). There is also no need for a show_default=True parameter with the --caching option, since that option has _no_ default.
This PR implements roughly what was proposed in this comment.

From a user’s perspective:

By default, whenever an attempt is made to access a SQLite DB through the `sqlite:obo:...` descriptor, and the requested DB is already present in the Pystow cache, OAK will check whether the SQLite file is older than 7 days, and if it is, will forcefully re-download that file.

A new command-line global option `--caching` is available to alter that default behaviour. That option accepts the following values:
* `--caching=3w`: a cached DB will be refreshed upon access if it is older than 3 weeks (the general syntax is `ND`, where N is a number and `D` can be `s`, `d`, `w`, `m`, or `y` to indicate that N is a number of seconds, days, weeks, months, or years, respectively);
* `--caching=no-refresh`;
* `--caching=refresh`;
* `--caching=clear` or `--caching=reset`
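
The `ND` delay syntax described above can be parsed with a few lines of Python. This is only a sketch: `parse_delay` is a hypothetical name, N is assumed to be a non-negative integer, and a month and a year are approximated as 30 and 365 days, which may not match what OAK actually does.

```python
import re
from datetime import timedelta

# Unit letters from the PR description; month and year lengths are assumptions.
_UNITS = {
    "s": timedelta(seconds=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
    "m": timedelta(days=30),
    "y": timedelta(days=365),
}
_DELAY = re.compile(r"(\d+)([sdwmy])")


def parse_delay(spec: str) -> timedelta:
    """Convert a string such as "3w" into a timedelta."""
    match = _DELAY.fullmatch(spec.strip().lower())
    if match is None:
        raise ValueError(f"invalid delay specification: {spec!r}")
    count, unit = match.groups()
    return int(count) * _UNITS[unit]


print(parse_delay("3w").days)  # number of days in three weeks
```

Keyword values such as `no-refresh` do not match the pattern and raise `ValueError` here, so a caller would need to check for them before attempting to parse a delay.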

Implementation details:

Most of the new code is in the new module `oaklib.utilities.caching` and split into two classes:
* `CachePolicy` represents the logic to determine whether a given file is in need of refreshing.
* `FileCache` represents the file cache itself and is the main interface that the rest of OAK should use whenever it needs to access the cache (instead of using Pystow directly). It is merely a thin layer on top of Pystow that tries¹ to present the same interface (same methods) as a Pystow module, so that it can be used as a drop-in replacement.

(¹ I say “tries to” because it only implements the Pystow methods that were actually used somewhere in OAK code, and only with the parameters used in those cases.)

A new `FILE_CACHE` object (an instance of `FileCache`) is added to the globals in `oaklib.constants` (same place containing the `PYSTOW_MODULE`), where it can be used by any part of OAK that needs to interact with the cache. The `llm_implementation` and `sqldb_implementation` modules are amended to use that new global instead of `PYSTOW_MODULE`.

Finally, in the main entry point module:
* the `--caching` option is added;
* the `cache-ls` and `cache-clear` commands are re-written to use methods from the `FileCache` class.

closes #792