Can't fetch ena data with biome list #79

Open
amardeepranu opened this issue Dec 19, 2023 · 1 comment

Comments

@amardeepranu

https://github.com/EBI-Metagenomics/genomes-pipeline/blob/853487f6dda1420fd8b6b41dd4aff5c8540c7e37/bin/fetch_ena.py#L66

The function above returns nothing. I believe this is because the API returns an empty metagenome_source for these records:

https://www.ebi.ac.uk/ena/portal/api/search?result=wgs_set&query=assembly_type%3D%22metagenome-assembled%20genome%20%28mag%29%22&fields=study_accession%2Cmetagenome_source&limit=10&format=json&download=false

[
  {
    "study_accession": "PRJEB35770",
    "metagenome_source": "",
    "accession": "CAEMXZ010000000"
  },
  ....
  {
    "study_accession": "PRJEB35770",
    "metagenome_source": "",
    "accession": "CAESAJ010000000"
  }
]
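For reference, here is a minimal sketch of the same request using only Python's standard library. It rebuilds the search URL above from its parameters and adds a helper that keeps only records with a non-empty metagenome_source. The function names are hypothetical and are not part of genomes-pipeline; fetching requires network access.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# ENA Portal API search endpoint, as used in the URL above.
SEARCH_URL = "https://www.ebi.ac.uk/ena/portal/api/search"


def build_mag_search_url(limit: int) -> str:
    """Rebuild the wgs_set search URL for metagenome-assembled genomes."""
    params = {
        "result": "wgs_set",
        "query": 'assembly_type="metagenome-assembled genome (mag)"',
        "fields": "study_accession,metagenome_source",
        "limit": limit,
        "format": "json",
        "download": "false",
    }
    # quote (rather than urlencode's default quote_plus) encodes spaces
    # as %20, which matches the URL in this issue.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


def with_metagenome_source(records: list[dict]) -> list[dict]:
    """Keep only records whose metagenome_source is non-empty."""
    return [r for r in records if r.get("metagenome_source", "").strip()]


def fetch_mag_records(limit: int) -> list[dict]:
    """Query the API and parse the JSON array it returns (needs network)."""
    with urlopen(build_mag_search_url(limit)) as resp:
        return json.load(resp)
```

With `limit=10` the builder reproduces the URL above exactly, so any difference in results comes from the API rather than from how the query is encoded.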

I'm not sure which field would work here to get the biome. When I query the API for the wgs_set search fields, I get a 500 error from this URL:
https://www.ebi.ac.uk/ena/portal/api/searchFields?dataPortal=metagenome&result=wgs_set&format=json
Is there an alternate field that contains the biome? Is there a workaround? Thanks!

@tgurbich
Contributor

Hi @amardeepranu,

Thanks for spotting and reporting this. The function returns an empty result because the ENA API recently changed the order of its columns, and this script hasn't yet been adjusted to handle that. We will fix it in the new year.
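The actual fix isn't shown in this thread. One general way to make parsing robust to column reordering is to look columns up by header name rather than by position. The sketch below assumes the script reads the API's tab-separated output (format=tsv); that assumption, and the function name, are hypothetical and not taken from fetch_ena.py.

```python
import csv
import io


def parse_tsv_by_name(tsv_text: str, wanted: list[str]) -> list[dict]:
    """Parse ENA-style TSV, selecting columns by header name.

    csv.DictReader keys each row by the header line, so the result is
    the same no matter which order the API returns the columns in.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in wanted if name not in header]
    if missing:
        # Fail loudly instead of silently returning an empty result.
        raise ValueError(f"response is missing expected columns: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]
```

Raising on a missing column also turns a silent empty result into an explicit error if the response schema changes again.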

The metagenome_source field can be empty, but it isn't always; you can see this by running:
https://www.ebi.ac.uk/ena/portal/api/search?result=wgs_set&query=assembly_type%3D%22metagenome-assembled%20genome%20%28mag%29%22&fields=study_accession%2Cmetagenome_source&limit=10000&format=json&download=false
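To check this over the larger result set, you could tally the metagenome_source values after loading the format=json response (for example with `json.load`). Here is a minimal sketch; the helper names are hypothetical:

```python
from collections import Counter


def summarise_sources(records: list[dict]) -> Counter:
    """Count records per metagenome_source value; missing or blank values count as ''."""
    return Counter(r.get("metagenome_source", "").strip() for r in records)


def empty_fraction(records: list[dict]) -> float:
    """Fraction of records with no metagenome_source (0.0 for no records)."""
    counts = summarise_sources(records)
    total = sum(counts.values())
    return counts[""] / total if total else 0.0
```

`summarise_sources(records).most_common()` then lists the populated sources alongside the count of empty ones.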

May I ask whether you are working on generating a biome-specific catalogue yourself using public data? A workaround, and perhaps an easier and more appropriate way of collecting genomes from a biome of interest, would be to search ENA/NCBI first and supply the scripts with genome accessions rather than a list of biomes. Knowing what you are trying to do would help us advise you better.
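For the accession-based approach, the search results can be reduced to an accession list for a chosen source. This is a sketch under assumptions: the input format the pipeline scripts expect is not described in this thread, so a plain text file with one accession per line is assumed. `source` is whatever metagenome_source value you picked after searching, and the helper names are hypothetical.

```python
from pathlib import Path


def accessions_for_biome(records: list[dict], source: str) -> list[str]:
    """Return unique wgs_set accessions whose metagenome_source equals `source`, sorted."""
    return sorted({r["accession"] for r in records if r.get("metagenome_source") == source})


def write_accession_list(accessions: list[str], path: Path) -> None:
    """Write one accession per line (assumed input format)."""
    path.write_text("".join(a + "\n" for a in accessions))
```

Matching is exact, so check the spelling of the source value against the API output before filtering.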

Kind regards,
Tanya
