Bug report: PODP mode - ValueError in _resolve_genbank_accession() When No RefSeq Accession Exists #307

Open
liannette opened this issue Jan 30, 2025 · 0 comments

Bug Report

When the _resolve_genbank_accession() function in podp_antismash_downloader.py resolves the GenBank accession of a genome assembly to its RefSeq accession, it raises a ValueError if no RefSeq accession exists for that assembly. The error crashes processing of these assemblies.

Steps to Reproduce

  1. Use the NCBI Datasets API to fetch the RefSeq assembly ID for a given GenBank assembly ID.
  2. If a RefSeq accession is available, the function operates as expected. Example:
    {
      "assembly_revisions": [
        {
          "genbank_accession": "GCA_000175835.1",
          "refseq_accession": "GCF_000175835.1",
          "assembly_name": "ASM17583v1",
          "assembly_level": "contig",
          "release_date": "2009-12-15"
        }
      ],
      "total_count": 1
    }
  3. However, when the API response does not include a refseq_accession, the function fails. Example:
    {
      "assembly_revisions": [
        {
          "genbank_accession": "GCA_003326215.1",
          "assembly_name": "ASM332621v1",
          "assembly_level": "contig",
          "release_date": "2018-07-18",
          "sequencing_technology": "Illumina MiSeq"
        }
      ],
      "total_count": 1
    }
  4. This results in the following error:
    File ~/coding/NPLinker_workshop_2025/nplinker/src/nplinker/genomics/antismash/podp_antismash_downloader.py:284, in _resolve_genbank_accession(genbank_id)
        282     if resp.status_code == httpx.codes.OK:
        283         data = resp.json()
    --> 284         latest_entry = max(
        285             (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        286             key=lambda x: x["release_date"],
        287         )
        288         refseq_id = latest_entry["refseq_accession"]
        289 except httpx.ReadTimeout:
    
    ValueError: max() arg is an empty sequence
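
The failure can be reproduced without a network call. The following is a minimal sketch: the dictionary is the response from step 3, and `latest_refseq_entry` is a hypothetical wrapper around the selection expression shown in the traceback, not a function from NPLinker.

```python
# Response shape copied from step 3: no entry has a "refseq_accession" key.
data = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_003326215.1",
            "assembly_name": "ASM332621v1",
            "assembly_level": "contig",
            "release_date": "2018-07-18",
            "sequencing_technology": "Illumina MiSeq",
        }
    ],
    "total_count": 1,
}


def latest_refseq_entry(data):
    """Hypothetical wrapper around the expression shown in the traceback."""
    return max(
        (entry for entry in data["assembly_revisions"] if "refseq_accession" in entry),
        key=lambda x: x["release_date"],
    )


try:
    latest_refseq_entry(data)
    error_message = None
except ValueError as exc:
    # The filter removed every entry, so max() received an empty iterable.
    error_message = str(exc)

print(error_message)
```

The exact wording of the message can differ between Python versions; the relevant point is that `max()` raises `ValueError` whenever the filtered generator yields nothing.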
    

Suggested Fix
Return an empty string when no RefSeq accession is found, instead of calling max() on an empty sequence.
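
One possible shape for this fix is sketched below. It assumes the selection logic is pulled into a helper; `latest_refseq_accession` is a hypothetical name, and the actual change in `_resolve_genbank_accession()` may look different. The sketch relies on the `default` argument of the built-in `max()`, which is returned when the iterable is empty.

```python
def latest_refseq_accession(data: dict) -> str:
    """Return the newest RefSeq accession in `data`, or "" if there is none.

    `data` is assumed to have the shape of the API responses shown above.
    """
    candidates = (
        entry
        for entry in data.get("assembly_revisions", [])
        if "refseq_accession" in entry
    )
    # default=None avoids the ValueError when no entry has a RefSeq accession.
    latest = max(candidates, key=lambda entry: entry["release_date"], default=None)
    return latest["refseq_accession"] if latest is not None else ""


with_refseq = {
    "assembly_revisions": [
        {
            "genbank_accession": "GCA_000175835.1",
            "refseq_accession": "GCF_000175835.1",
            "release_date": "2009-12-15",
        }
    ],
    "total_count": 1,
}
without_refseq = {
    "assembly_revisions": [
        {"genbank_accession": "GCA_003326215.1", "release_date": "2018-07-18"}
    ],
    "total_count": 1,
}

print(repr(latest_refseq_accession(with_refseq)))
print(repr(latest_refseq_accession(without_refseq)))
```

Callers would then need to treat the empty string as "no RefSeq accession available" rather than as a valid ID.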
