Monitoring SRA Samples on EBI search

This code monitors submissions of Norwegian sequencing data to domain repositories. It does so by querying the "sra-samples" endpoint of EBI Search, which contains metadata for the samples deposited in the Sequence Read Archive (SRA). The SRA is synchronised with the European Nucleotide Archive (ENA) as part of the International Nucleotide Sequence Database Collaboration (INSDC).
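As a rough illustration of such a query, the sketch below only builds the request URL for the EBI Search REST API and does not contact the service. The helper `build_search_url` is hypothetical and not taken from this repository. The domain identifier is copied from the text above and should be checked against the live service, and the `country` and `center_name` field names are assumptions.

```python
from urllib.parse import urlencode

# Base URL of the EBI Search REST API.
EBI_SEARCH_BASE = "https://www.ebi.ac.uk/ebisearch/ws/rest"


def build_search_url(domain, query, fields=(), size=100, start=0):
    """Build the URL for one page of JSON results (hypothetical helper)."""
    params = {
        "query": query,
        "format": "json",
        "size": size,    # number of entries per page
        "start": start,  # offset of the first entry, used for paging
    }
    if fields:
        # Fields to return for each entry, as a comma-separated list.
        params["fields"] = ",".join(fields)
    return f"{EBI_SEARCH_BASE}/{domain}?{urlencode(params)}"


# Assumed domain id and field names; verify them before relying on this URL.
url = build_search_url("sra-samples", "country:Norway", fields=("center_name",))
print(url)
```

Paging through a full result set would then amount to repeating the request with increasing `start` values until no entries are returned.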

The code queries by the country name Norway. Note that this returns all samples collected in Norway, whether or not they were collected by Norwegian institutions or organisations. Because centre names are not standardised, extensive filtering is used to isolate the relevant data. The results are then plotted in two graphs: one for the BOTT universities (Bergen, Oslo, Trondheim, and Tromsø) and one for the Norwegian Institute of Public Health (NIPH; Norwegian: Folkehelseinstituttet, FHI). A reference copy of these plots, which is not kept up to date, is available in the /plots4reference folder.
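The kind of filtering described above could be sketched as follows. This is not the repository's actual filter: the patterns, the canonical labels, and the `normalise_centre` helper are illustrative assumptions, and real centre names in the archive are likely to need many more variants.

```python
import re
from collections import Counter

# Hypothetical patterns mapping free-text centre names to canonical labels.
# These are illustrative guesses, not the filters used in this repository.
INSTITUTION_PATTERNS = {
    "University of Bergen": re.compile(r"\b(uib|university of bergen)\b", re.I),
    "University of Oslo": re.compile(r"\b(uio|university of oslo)\b", re.I),
    "NTNU": re.compile(
        r"\b(ntnu|norwegian university of science and technology)\b", re.I
    ),
    "UiT": re.compile(r"\b(uit|university of troms[øo])\b", re.I),
    "NIPH": re.compile(
        r"\b(niph|fhi|norwegian institute of public health|folkehelseinstituttet)\b",
        re.I,
    ),
}


def normalise_centre(raw_name):
    """Return the canonical label for a centre name, or None if unmatched."""
    # Treat underscores and repeated whitespace as single spaces.
    cleaned = re.sub(r"[\s_]+", " ", raw_name).strip()
    for canonical, pattern in INSTITUTION_PATTERNS.items():
        if pattern.search(cleaned):
            return canonical
    return None


# Made-up centre names, for illustration only.
names = ["UNIVERSITY_OF_OSLO", "UiO", "Folkehelseinstituttet", "Some Other Lab"]
counts = Counter(c for c in map(normalise_centre, names) if c is not None)
print(counts)
```

Counts grouped this way could then feed the two plots, with the four university labels in one graph and NIPH in the other.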

The code has been tested with Python 3.10, 3.11, and 3.12. A YAML file for creating a conda environment (Python 3.12) is provided. To create the environment, run `conda env create --file EBIsearch_environment.yml` in a terminal.

This code was developed as part of a deliverable of the BioMedData project. We acknowledge funding from the Research Council of Norway under code 295932.