Monitoring SRA Samples on EBI search

This code monitors submissions of Norwegian sequencing data to domain repositories. It does so by querying the "sra-samples" endpoint of EBI search, which holds metadata for the samples deposited in the Sequence Read Archive (SRA). The SRA is synchronised with the European Nucleotide Archive (ENA) as part of the International Nucleotide Sequence Database Collaboration (INSDC).
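As an illustration of the kind of request described above, here is a minimal sketch that builds an EBI search REST URL and reads values out of a JSON response. It is not the code in this repository. The domain identifier is copied from the description above and may differ from what the service accepts, and the query string `country:Norway`, the field name `center_name`, and the `entries`/`fields` response layout are assumptions to be checked against the EBI search documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the EBI search REST service.
EBI_SEARCH_BASE = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def build_search_url(domain, query, fields, size=100, start=0):
    """Build a paged search URL for one EBI search domain."""
    params = {
        "query": query,
        "fields": ",".join(fields),
        "size": size,
        "start": start,
        "format": "json",
    }
    return f"{EBI_SEARCH_BASE}/{domain}?{urlencode(params)}"


def fetch_json(url):
    """Download and decode one page of results (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def extract_field(payload, field):
    """Collect every value of `field` across the entries of a response.

    Assumes each entry has a "fields" mapping from field name to a list
    of values; entries without the field are skipped.
    """
    values = []
    for entry in payload.get("entries", []):
        values.extend(entry.get("fields", {}).get(field, []))
    return values


# Hypothetical usage: domain, query, and field name are assumptions.
url = build_search_url("sra-samples", "country:Norway", ["center_name"], size=10)
```

Paging through all hits would repeat the request with an increasing `start` value until no more entries are returned.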

The code queries by the country name Norway. Note that this identifies all samples collected in Norway, not necessarily those submitted by Norwegian institutions or organisations. Because centre names are not standardised, extensive filtering is needed to isolate the relevant data. The results are then plotted in two graphs: one for the BOTT universities (Bergen, Oslo, Trondheim, and Tromsø) and one for the Norwegian Institute of Public Health (NIPH; Norwegian: Folkehelseinstituttet, FHI). A reference copy of these plots, which is not kept up to date, is available in the /plots4reference folder.
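The filtering step can be pictured with the sketch below, which normalises free-text centre names and maps them to an institution label before counting. It only illustrates the idea and is not the filtering used in this repository: the patterns and labels are hypothetical, and the real centre names in the data are likely to need many more variants.

```python
import re
from collections import Counter

# Hypothetical patterns; actual centre names in the metadata may differ.
INSTITUTION_PATTERNS = {
    "UiB": re.compile(r"\b(uib|university of bergen)\b"),
    "UiO": re.compile(r"\b(uio|university of oslo)\b"),
    "NTNU": re.compile(r"\b(ntnu|norwegian university of science and technology)\b"),
    "UiT": re.compile(r"\b(uit|university of troms[oø])\b"),
    "NIPH": re.compile(
        r"\b(niph|fhi|norwegian institute of public health|folkehelseinstituttet)\b"
    ),
}


def normalise(name):
    """Lower-case a centre name and collapse spaces, underscores, hyphens."""
    return re.sub(r"[\s_\-]+", " ", name).strip().lower()


def classify_centre(name):
    """Return the institution label for a centre name, or None if unmatched."""
    cleaned = normalise(name)
    for label, pattern in INSTITUTION_PATTERNS.items():
        if pattern.search(cleaned):
            return label
    return None


def count_by_institution(centre_names):
    """Count samples per institution, ignoring unmatched centre names."""
    counts = Counter()
    for name in centre_names:
        label = classify_centre(name)
        if label is not None:
            counts[label] += 1
    return counts
```

The resulting counts could then be split into the four BOTT labels for one plot and the NIPH label for the other.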

The code has been tested with Python 3.10, 3.11, and 3.12. A YAML file for creating a conda environment (Python 3.12) is provided. To create the environment, run `conda env create --file EBIsearch_environment.yml` in a terminal.

This code was developed as part of a deliverable of the BioMedData project. We acknowledge funding from the Research Council of Norway under code 295932.
