All the functions here are thread-safe and can also be invoked from multiple parallel processes using the same CACHE_DIR. This directory can even be shared between multiple users, as long as they all have read-write permissions to it. This should work even on NFS-mounted volumes shared between multiple servers.
+
The default location ($HOME/.cache/gmara) of the cache of downloaded Gmara data files.
+
+
You can override this by setting the METACELLS_GMARA_CACHE_DIR environment variable, or by passing an explicit cache_dir parameter to the functions.
+
+
The top level under this directory is the version indicator, where main is always the latest and greatest version. Under each version we store the files in the same path as in GitHub, with a .gz suffix for the compressed raw data, .jl_set.gz for serialized Julia set objects, and .lock for temporary lock files used to coordinate between parallel processes.
+
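The layout described above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the package's code: the helper names `resolve_cache_dir` and `cached_paths`, the example relative path, and the precedence order (explicit parameter first, then environment variable, then default) are all assumptions. Only the environment variable name, the default location, the `main` version and the three suffixes come from the text.

```python
import os
from pathlib import Path

ENV_VAR = "METACELLS_GMARA_CACHE_DIR"  # environment variable named in the text


def resolve_cache_dir(cache_dir=None):
    """Hypothetical helper: pick the cache directory.

    Assumed precedence: explicit argument, then environment variable,
    then the documented default of $HOME/.cache/gmara.
    """
    if cache_dir is not None:
        return Path(cache_dir)
    from_env = os.environ.get(ENV_VAR)
    if from_env:
        return Path(from_env)
    return Path.home() / ".cache" / "gmara"


def cached_paths(relative_path, version="main", cache_dir=None):
    """Hypothetical helper: the three cache files kept for one data file."""
    base = (resolve_cache_dir(cache_dir) / version / relative_path).as_posix()
    return {
        "raw": base + ".gz",          # compressed raw data
        "set": base + ".jl_set.gz",   # serialized Julia set object
        "lock": base + ".lock",       # temporary lock file
    }


# "some/file.tsv" is a made-up relative path, not a real Gmara file.
paths = cached_paths("some/file.tsv", cache_dir="/tmp/gmara")
print(paths["raw"])  # /tmp/gmara/main/some/file.tsv.gz
```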
The default timeout in seconds (10) for waiting for a lock file in the Gmara cache. If the timeout is not positive, the wait is unbounded. If a process crashes very badly, a lock file may be left behind and may need to be removed by hand to allow access to the data.
+
+
You can override this by setting the METACELLS_GMARA_TIMEOUT environment variable, or by passing an explicit timeout parameter to the functions.
+
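To make the timeout semantics concrete, here is a minimal sketch of waiting for a lock file, again in illustrative Python. It assumes the lock is taken by exclusive file creation and polled at a fixed interval; the text does not say how the package actually implements locking, so `acquire_lock`, `release_lock` and `poll_interval` are hypothetical. What it does mirror from the text is the default of 10 seconds, waiting forever when the timeout is not positive, and the fact that a leftover lock file blocks access until it is removed.

```python
import os
import time


def acquire_lock(lock_path, timeout=10.0, poll_interval=0.05):
    """Hypothetical helper: create lock_path exclusively, or time out.

    A non-positive timeout means wait forever, as described in the text.
    """
    deadline = None if timeout <= 0 else time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if deadline is not None and time.monotonic() >= deadline:
                raise TimeoutError(f"timed out waiting for lock file: {lock_path}")
            time.sleep(poll_interval)


def release_lock(lock_path):
    """Hypothetical helper: remove the lock file so other processes can proceed."""
    os.remove(lock_path)
```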
Normalize a gene name in some namespace. In most namespaces, this means removing the .[0-9] version suffix from the name and converting the name to upper case. To look up a name in a list or a namespace, you need to normalize the query gene name accordingly. The UCSC namespace is an exception in that it is all-lower-case and the .[0-9] suffix seems to be an inherent part of the identifier.
+
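The normalization rule can be sketched as follows. This is an illustrative Python rendering, not the package's implementation: the function name `normalize_name_sketch` is hypothetical, the version suffix is assumed to be a dot followed by one or more digits at the end of the name, and UCSC names are assumed to be lower-cased with the suffix kept. The example identifiers are only there to exercise the rule.

```python
import re

# Assumption: the version suffix is a dot plus one or more trailing digits.
_VERSION_SUFFIX = re.compile(r"\.[0-9]+$")


def normalize_name_sketch(name, namespace):
    """Hypothetical helper mirroring the normalization rule described in the text."""
    if namespace == "UCSC":
        # UCSC identifiers are all-lower-case and keep their numeric suffix.
        return name.lower()
    # Other namespaces: drop the version suffix, then convert to upper case.
    return _VERSION_SUFFIX.sub("", name).upper()


# A lookup normalizes the query before testing membership in a set of names.
names = {"ENSG00000141510", "TP53"}
print(normalize_name_sketch("ENSG00000141510.17", "Ensembl") in names)  # True
print(normalize_name_sketch("tp53", "Symbol") in names)                 # True
print(normalize_name_sketch("uc002gig.2", "UCSC"))                      # uc002gig.2
```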
Return the set of names in a namespace of genes of some species. As usual in Gmara, this includes everything that may be used as a name, e.g. for Ensembl it includes genes, transcripts and proteins; for Symbol it includes genes and clones; etc.
+
Return the set of names in a list in a namespace of genes of some species. This returns all the names that are (probably) in the list; if a name isn't in the result, it is almost certain that it does not belong in the list. As usual in Gmara, this includes everything that may be used as a name, e.g. for Ensembl it includes genes, transcripts and proteins; for Symbol it includes genes and clones; etc.
+
All requests are cached in memory. This makes repeated requests cheap, at the cost of a modest amount of memory. It also means that if the data on the server has been updated (which rarely happens), you will keep getting the old result. This function releases all the memory and forces all subsequent requests to query the server. (In the common case, the server tells us our disk cache data is up to date, so we don't re-download it.)
+