Missing ensembl versions #16

Open
emdann opened this issue Apr 26, 2023 · 4 comments
Labels
bug Something isn't working P1☕️ Medium priority

Comments

@emdann
Member

emdann commented Apr 26, 2023

Report

The default Ensembl version in ensembldb is v86, but no v86 SQLite database seems to exist on AnnotationHub:

```python
import pooch

PKG_CACHE_DIR = "genomic-annotations"
url_template = "https://bioconductorhubs.blob.core.windows.net/annotationhub/AHEnsDbs/v{version}/EnsDb.{species}.v{version}.sqlite"

local_path = pooch.retrieve(
    url=url_template.format(version="86", species="Hsapiens"),
    known_hash=None,
    path=pooch.os_cache(PKG_CACHE_DIR),
    progressbar=True,
)
```
Downloading data from 'https://bioconductorhubs.blob.core.windows.net/annotationhub/AHEnsDbs/v86/EnsDb.Hsapiens.v86.sqlite' to file '/home/jovyan/.cache/genomic-annotations/1c849fa5b54f90367dbde8fd3f560e9a-EnsDb.Hsapiens.v86.sqlite'.
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
Input In [120], in <cell line: 4>()
      1 import pooch
      2 url_template = "https://bioconductorhubs.blob.core.windows.net/annotationhub/AHEnsDbs/v{version}/EnsDb.{species}.v{version}.sqlite"
----> 4 local_path = pooch.retrieve(
      5         url = url_template.format(version="86", species="Hsapiens"),
      6         known_hash=None,
      7         path=pooch.os_cache(PKG_CACHE_DIR),
      8         progressbar=True,
      9     )

File ~/my-conda-envs/genomic-features/lib/python3.9/site-packages/pooch/core.py:239, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
    236 if downloader is None:
    237     downloader = choose_downloader(url, progressbar=progressbar)
--> 239 stream_download(url, full_path, known_hash, downloader, pooch=None)
    241 if known_hash is None:
    242     get_logger().info(
    243         "SHA256 hash of downloaded file: %s\n"
    244         "Use this value as the 'known_hash' argument of 'pooch.retrieve'"
   (...)
    247         file_hash(str(full_path)),
    248     )

File ~/my-conda-envs/genomic-features/lib/python3.9/site-packages/pooch/core.py:803, in stream_download(url, fname, known_hash, downloader, pooch, retry_if_failed)
    799 try:
    800     # Stream the file to a temporary so that we can safely check its
    801     # hash before overwriting the original.
    802     with temporary_file(path=str(fname.parent)) as tmp:
--> 803         downloader(url, tmp, pooch)
    804         hash_matches(tmp, known_hash, strict=True, source=str(fname.name))
    805         shutil.move(tmp, str(fname))

File ~/my-conda-envs/genomic-features/lib/python3.9/site-packages/pooch/downloaders.py:207, in HTTPDownloader.__call__(self, url, output_file, pooch, check_only)
    205 try:
    206     response = requests.get(url, **kwargs)
--> 207     response.raise_for_status()
    208     content = response.iter_content(chunk_size=self.chunk_size)
    209     total = int(response.headers.get("content-length", 0))

File ~/my-conda-envs/genomic-features/lib/python3.9/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1016     http_error_msg = (
   1017         f"{self.status_code} Server Error: {reason} for url: {self.url}"
   1018     )
   1020 if http_error_msg:
-> 1021     raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: The specified blob does not exist. for url: https://bioconductorhubs.blob.core.windows.net/annotationhub/AHEnsDbs/v86/EnsDb.Hsapiens.v86.sqlite

The EnsemblDB class should have a method to check which versions are available (and perhaps return them).
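A minimal sketch of such a check, assuming the URL layout from the snippet above holds for every release and that the blob store answers an HTTP `HEAD` request for a missing file with 404 (as it does for the `GET` in the traceback). `ensdb_url`, `is_available` and `available_versions` are hypothetical helper names, not part of the `EnsemblDB` class; the `opener` parameter only exists so the check can be exercised offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

# URL layout copied from the report; assumed (not verified) to hold for all releases.
URL_TEMPLATE = (
    "https://bioconductorhubs.blob.core.windows.net/annotationhub/"
    "AHEnsDbs/v{version}/EnsDb.{species}.v{version}.sqlite"
)


def ensdb_url(version: int, species: str = "Hsapiens") -> str:
    """Build the blob URL for one EnsDb release (hypothetical helper)."""
    return URL_TEMPLATE.format(version=version, species=species)


def is_available(
    version: int,
    species: str = "Hsapiens",
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Probe the URL with a HEAD request, so nothing is downloaded."""
    request = urllib.request.Request(ensdb_url(version, species), method="HEAD")
    try:
        with opener(request, timeout=30) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError as err:
        # A missing blob shows up as 404, like the traceback above.
        if err.code == 404:
            return False
        raise  # other HTTP errors are real failures, not "missing"


def available_versions(
    versions: Iterable[int],
    species: str = "Hsapiens",
    opener: Callable = urllib.request.urlopen,
) -> List[int]:
    """Return the subset of `versions` whose database exists on the server."""
    return [v for v in versions if is_available(v, species, opener)]
```

Calling `available_versions(range(86, 110))` sends one HEAD request per release, so in practice the result would be worth caching.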

Version information


bioframe 0.4.1
genomic_annotations 0.0.1
ibis 5.1.0
pandas 2.0.1
pooch v1.7.0
session_info 1.0.0

PIL 9.5.0
asttokens NA
backcall 0.2.0
bidict 0.22.1
certifi 2022.12.07
charset_normalizer 3.1.0
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
debugpy 1.6.7
decorator 5.1.1
executing 1.2.0
greenlet 2.0.2
idna 3.4
importlib_metadata NA
importlib_resources NA
ipykernel 6.14.0
jedi 0.18.2
kiwisolver 1.4.4
matplotlib 3.7.1
mpl_toolkits NA
multipledispatch 0.6.0
numpy 1.24.3
packaging 23.1
parso 0.8.3
parsy 2.1
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
platformdirs 3.3.0
prompt_toolkit 3.0.38
psutil 5.9.5
ptyprocess 0.7.0
public 3.1.1
pure_eval 0.2.2
pyarrow 11.0.0
pydev_ipython NA
pydevconsole NA
pydevd 2.9.5
pydevd_file_utils NA
pydevd_plugins NA
pydevd_tracing NA
pygments 2.15.1
pyparsing 3.0.9
pytz 2023.3
regex 2.5.125
requests 2.28.2
rich NA
six 1.16.0
sqlalchemy 2.0.10
sqlglot 11.5.7
stack_data 0.6.2
toolz 0.12.0
tornado 6.3
tqdm 4.65.0
traitlets 5.9.0
typing_extensions NA
urllib3 1.26.15
wcwidth 0.2.6
xxhash NA
zipp NA
zmq 25.0.2
zoneinfo NA

IPython 8.4.0
jupyter_client 8.2.0
jupyter_core 5.3.0

Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03) [GCC 11.3.0]
Linux-4.15.0-112-generic-x86_64-with-glibc2.31

Session information updated at 2023-04-26 16:00

emdann added the "bug" label on Apr 26, 2023
@ivirshup
Member

This specific version was not uploaded to AnnotationHub, and the same may be true of other versions. Some possible solutions:

  • Keep a list of the versions that are on AnnotationHub, and throw an informative error if the requested one isn't there
  • The SQLite database does exist in the Bioconductor package EnsDb.Hsapiens.v86; we could download that package and extract the database as a fallback
  • We could try to get the SQLite database uploaded to AnnotationHub
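The fallback in the second option could look roughly like this, assuming the Bioconductor annotation package is a gzipped source tarball containing exactly one `.sqlite` file (its path inside the archive is deliberately not assumed). `extract_sqlite` is a hypothetical helper; fetching the tarball itself, for example with `pooch.retrieve`, is left out because its URL is not given here.

```python
import shutil
import tarfile
from pathlib import Path


def extract_sqlite(tarball: str, dest_dir: str) -> Path:
    """Copy the single *.sqlite member of a package tarball into dest_dir."""
    with tarfile.open(tarball, mode="r:gz") as tar:
        members = [
            m for m in tar.getmembers() if m.isfile() and m.name.endswith(".sqlite")
        ]
        if len(members) != 1:
            raise ValueError(
                f"expected one .sqlite file in {tarball}, found {len(members)}"
            )
        member = members[0]
        # Write under the bare file name so archive paths can't escape dest_dir.
        target = Path(dest_dir) / Path(member.name).name
        target.parent.mkdir(parents=True, exist_ok=True)
        source = tar.extractfile(member)
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target
```

Searching by suffix instead of hard-coding a member path keeps the sketch working if the package layout differs between releases.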

@jorainer

jorainer commented May 8, 2023

AnnotationHub provides EnsDb SQLite databases from Ensembl release 87 on. It would also be possible to create and add older versions, but (to not increase storage demand of the AnnotationHub too much) I would only do that for selected versions, and only if there is a need.
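Given that AnnotationHub only carries EnsDb databases from release 87 on, the informative error from the first option could start as a simple guard. This is a sketch: `check_annotationhub_version` and its `extra_versions` hook are hypothetical names, and the cutoff of 87 comes from this comment rather than from a published list, so any selected older releases uploaded later would have to be listed explicitly.

```python
from typing import Collection

# Earliest Ensembl release with an EnsDb on AnnotationHub, per the comment above.
FIRST_ANNOTATIONHUB_RELEASE = 87


def check_annotationhub_version(
    version: int, extra_versions: Collection[int] = ()
) -> int:
    """Return `version` unchanged, or raise a ValueError explaining why it can't be fetched.

    `extra_versions` is a hypothetical hook for selected older releases
    that might be uploaded to AnnotationHub later.
    """
    if version >= FIRST_ANNOTATIONHUB_RELEASE or version in extra_versions:
        return version
    raise ValueError(
        f"Ensembl release {version} is not available on AnnotationHub; "
        f"EnsDb databases are provided from release {FIRST_ANNOTATIONHUB_RELEASE} on."
    )
```

Failing before any network request gives the user a clear message instead of the raw 404 from the blob store.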

@ivirshup
Member

ivirshup commented May 8, 2023

I guess I don't have a specific use case right now for accessing older versions (86 came up because it's used in the ensembldb vignette). Do you have an idea of how often the older versions are used?

It would be nice to have parity in access from python and R.

but (to not increase storage demand of the AnnotationHub too much)

This might intersect with a conversation I was just having with @lshep on the Bioconductor Slack about compressing these files. Is this something you've considered for the SQLite databases?

@jorainer

jorainer commented May 9, 2023

Honestly, I don't know which releases are predominantly used. Maybe there is a usage/download log for AnnotationHub (pinging @lshep).

As for compressing the SQLite files: AFAIK R cannot read from gzipped SQLite files, so compressed files would need to be unzipped locally first (maybe that is something AnnotationHub could also do on the fly?).
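On the Python side the same constraint applies: the standard `sqlite3` module cannot open a gzipped file directly, so a client would decompress once into its cache and open the result. A minimal standard-library sketch, assuming a hypothetical `.sqlite.gz` file has already been downloaded; `open_gzipped_sqlite` is an illustrative name, not an existing API.

```python
import gzip
import shutil
import sqlite3
from pathlib import Path


def open_gzipped_sqlite(gz_path: str) -> sqlite3.Connection:
    """Decompress `<name>.sqlite.gz` next to itself (once) and open it read-only."""
    gz = Path(gz_path)
    # Drop the ".gz" suffix to get the uncompressed file name.
    db_path = gz.with_suffix("")
    if not db_path.exists():
        tmp_path = db_path.with_name(db_path.name + ".part")
        with gzip.open(gz, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Rename only after a complete write, so a crash can't leave a truncated db.
        tmp_path.replace(db_path)
    # Open read-only via a URI filename; the cached copy should never be modified.
    return sqlite3.connect(db_path.resolve().as_uri() + "?mode=ro", uri=True)
```

The decompressed copy costs the full uncompressed size on disk, so compression would only save transfer and server-side storage, not local space. pooch also ships a `pooch.Decompress` processor that could perform this step as part of `pooch.retrieve`.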
