[Review recommendation] Integrate universal sample IDs provided by the BioSamples database #78

nevrome · 2024-06-21T12:16:52Z

This recommendation was raised in the review of the Poseidon paper.

The BioSamples database (https://www.ebi.ac.uk/biosamples/) is an attempt to provide universal sample IDs across the life sciences and is used by the archives for sequence reads (ENA/SRA/DDBJ). Essentially every published ancient sample already has a BioSample accession, because this is required for the submission of sequence reads to ENA/SRA/DDBJ. It would thus have seemed natural to make BioSamples IDs a central component of Poseidon metadata, so as to anchor Poseidon to the mainstream infrastructure, but this is not really done. There are some links being made to ENA in the .ssf "sequence source" files used by the Poseidon package, including sample accessions, but this seems more ad-hoc.

nevrome · 2024-06-21T12:46:53Z

The BioSamples FAQ states:

What pattern do BioSamples accessions follow?

BioSample accessions always begin with SAM. The next letter is either E or N or D depending if the sample information was originally submitted to EMBL-EBI or NCBI or DDBJ respectively. After that, there may be an A or a G to denote an Assay sample or a Group of samples. Finally there is a numeric component that may or may not be zero-padded.
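For illustration, the pattern quoted above can be written as a regular expression. This is only a minimal sketch based on the FAQ wording, not an official BioSamples validator. The names `parse_biosample_accession` and `BioSampleAccession` are hypothetical, and requiring at least one digit in the numeric component is an assumption.

```python
import re
from typing import NamedTuple, Optional

# "SAM", then E/N/D for the submitting archive, then an optional A or G,
# then a numeric component that may or may not be zero-padded.
_BIOSAMPLE_RE = re.compile(r"SAM([END])([AG])?([0-9]+)")

_ORIGIN = {"E": "EMBL-EBI", "N": "NCBI", "D": "DDBJ"}
_KIND = {"A": "assay sample", "G": "group of samples"}


class BioSampleAccession(NamedTuple):
    origin: str           # archive the sample was originally submitted to
    kind: Optional[str]   # "assay sample", "group of samples", or None
    number: str           # numeric component, kept as text to preserve padding


def parse_biosample_accession(text: str) -> Optional[BioSampleAccession]:
    """Return the parsed parts, or None if the text does not fit the pattern."""
    match = _BIOSAMPLE_RE.fullmatch(text.strip())
    if match is None:
        return None
    origin_letter, kind_letter, number = match.groups()
    return BioSampleAccession(_ORIGIN[origin_letter], _KIND.get(kind_letter), number)
```

With made-up strings (not real records), `parse_biosample_accession("SAMEA0000001")` yields an EMBL-EBI assay sample, while an ENA-style secondary accession such as `"ERS0000001"` does not match and returns `None`.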

This seems to match the sample_accession field in the .ssf file, which identifies sequencing entities, not "samples" in the Poseidon sense. Is this correct? If this is already covered in the .ssf file, then maybe we should not add it to the .janno file as well.

stschiff · 2024-06-21T16:50:55Z

Yes, I think so too. Plus, we have Genetic_Source_Accession_IDs in the janno, which also allows specifying the ENA sample ID.
