[Review recommendation] Integrate universal sample IDs provided by the BioSamples database #78

Open
nevrome opened this issue Jun 21, 2024 · 2 comments


@nevrome
Member

nevrome commented Jun 21, 2024

This recommendation was raised in the review of the Poseidon paper.

The BioSamples database (https://www.ebi.ac.uk/biosamples/) is an attempt to provide universal sample IDs across the life sciences and is used by the archives for sequence reads (ENA/SRA/DDBJ). Essentially every published ancient sample already has a BioSample accession, because this is required for the submission of sequence reads to ENA/SRA/DDBJ. It would thus have seemed natural to make BioSamples IDs a central component of Poseidon metadata, so as to anchor Poseidon to the mainstream infrastructure, but this is not really done. There are some links being made to ENA in the .ssf "sequence source" files used by the Poseidon package, including sample accessions, but this seems more ad-hoc.

@nevrome
Member Author

nevrome commented Jun 21, 2024

The BioSamples FAQ states:

What pattern do BioSamples accessions follow?

BioSample accessions always begin with SAM. The next letter is either E or N or D depending if the sample information was originally submitted to EMBL-EBI or NCBI or DDBJ respectively. After that, there may be an A or a G to denote an Assay sample or a Group of samples. Finally there is a numeric component that may or may not be zero-padded.

This seems to match the sample_accession field in the .ssf file, which identifies sequencing entities, not "samples" in the Poseidon sense. Is this correct? If we already have this covered in the .ssf file, then maybe we should not add it to the .janno file as well.
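
For reference, the accession pattern described in the FAQ could be checked with something like the following. This is only a minimal sketch derived from the FAQ wording above, not an official BioSamples validator; the function name `parse_biosample_accession` and the example accessions are hypothetical and chosen purely for illustration.

```python
import re

# Built from the FAQ text: "SAM", then E/N/D (submitting archive),
# then an optional A (assay sample) or G (group), then a numeric part
# that may or may not be zero-padded.
BIOSAMPLE_RE = re.compile(r"SAM(?P<origin>[END])(?P<kind>[AG])?(?P<number>[0-9]+)")

ORIGINS = {"E": "EMBL-EBI", "N": "NCBI", "D": "DDBJ"}


def parse_biosample_accession(accession):
    """Return the parts of a BioSample-like accession, or None if it does not match."""
    match = BIOSAMPLE_RE.fullmatch(accession.strip())
    if match is None:
        return None
    return {
        "origin": ORIGINS[match.group("origin")],
        "kind": match.group("kind"),  # "A", "G", or None
        "number": match.group("number"),
    }


# Illustrative strings only; they are not meant to refer to real samples.
print(parse_biosample_accession("SAMEA0000001"))
print(parse_biosample_accession("ERS0000001"))
```

A check like this could be run against the sample_accession column of an .ssf file to see whether its values are in fact BioSample accessions.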

@stschiff
Member

Yes, I think so too. Plus, we have Genetic_Source_Accession_IDs in the .janno file, which allows specifying the ENA sample ID as well.
