Island Sampling Day (ISD)

The Island Sampling Day draws inspiration from the Ocean Sampling Day that is a consortium that facilitates yearly samplings in stations from all around the world. This has great applications in monitoring and comprehending the complexity of the ocean microbiome and it's drivers.

The Island Sampling Day has been organised in two islands, Moorea in the French Polynisia and Crete in Greece. The Moorea sampling day event took place in March 2022 (in the context of GSC22. In Crete, two sampling events have taken place: the first one in June 2016 (in the context of GSC18), and the second one in July 2022. For both sampling events more information can be cound here.

Most of the content that follows relates to the Crete 2016 Island Sampling Day.

Soil microbiome sampling

The metagenomic revolution enabled environmental microbial biodiversity studies which for the past 20 years, have sampled multiple local and large scale regions (e.g. France microbiome, Earth Microbiome). These studies indicated that microbes are omnipresent and impact global ecosystem processes through their abundance, versatility and biotic interactions (i.e. interactomes). Additionally, microbes influence the major nutrient cycles and ecosystem functions, such as plant and animal productivity. Changes in the structure of microbial communities can be indicative of ecosystem tipping points e.g aridification.

Chapter one: Crete 2016

ISD is a soil microbiome study to explore environmental drivers of community composition across ecosystems in the island of Crete, Greece.

A snapshot of the prokaryotic diversity across the island of Crete. Aim to:

Put metadata into action through a citizen science project to demonstrate their value in future samplings
Describe the first full island microbial community assessment
Shed light on phylotypes found not to be all the same across Crete’s distinct terrestrial habitats (intra comparison)
Present a “local” (island) vs “global” (world, e.g. EMP) soil phylotype comparative analysis (inter comparison)

This repository holds ISD molecular ecology analysis pertinent scripts. ISD sequence data are available as a European Nucleotide Archive repository project.

Methods

Sampling
MTA and soil permits
DNA extraction and amplification
Sequencing
Soil physical chemistry
Bioinformatics and Ecological analysis

Sampling

During the 18th of Genome Standards Consortium workshop the participants along with members of Hellenic Centre for Marine Research organised the Island Sampling Day Crete 2016.

One Day, One Island - 15th June 2016, Crete

coordinated island wide Citizen Science activity
26 participants, 10 teams/routes
435 soil samples, 140 plant specimens
72 predefined locations across the island of Crete
different types of environments, elevations, vegetation and soil
a range of environmental and temporal variables recorded on site

See here for the video of the ISD 2016 project

Scripts

The scripts of the analysis are in the Scripts folder and cover the following tasks:

Search ENA for samples in the island of Crete
Get ISD metadata and sequences
HPC jobs and parameter files
Filtering, clustering/denoising and taxonomic assigninments
Biodiversity analysis
Figures

The bioinformatic workflow can be summarized to : GET, INFER and ANALYZE as shown below:

Data retrieval

Metadata

Retrieve the filereport of ENA for the ISD Crete project id PRJEB21776

curl -o Data/Metadata/filereport_read_run_PRJEB21776_tsv.txt \
    'https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJEB21776&result=read_run&fields=study_accession,sample_accession,experiment_accession,run_accession,tax_id,scientific_name,fastq_ftp,submitted_ftp,sra_ftp,bam_ftp&format=tsv'

Then for each sample retrieve all attributes with the get_isd_crete_2016_attributes.py script. These are all in xml format. Using the ena_xml_to_csv.py these files are aggregated to an long format tsv file.

Structure

Each xml file, i.e. each sample, has a total of 43 attributes which are

source material identifiers
organism
total nitrogen method
ENA-SUBMISSION
geographic location (region and locality)
environment (biome)
total organic C method
ENA-SUBMITTED-FILES
ENA-EXPERIMENT
ENA-FASTQ-FILES
target subfragment
common name
total organic carbon
DNA concentration
vegetation zone
total nitrogen
soil environmental package
environment (material)
amount or size of sample collected
geographic location (depth)
environment (feature)
ENA-LAST-UPDATE
sample collection device or method
place name
ENA-FIRST-PUBLIC
project name
collection date
target gene
ENA-CHECKLIST
geographic location (country and/or sea)
ENA-STUDY
pcr primers
sequencing method
water content method
storage conditions (fresh/frozen/other)
water content
investigation type
geographic location (elevation)
sample volume or weight for DNA extraction
geographic location (latitude)
geographic location (longitude)
current land use (emp 500 soil)
ENA-RUN

Sequences

To retrieve the sequences use the get_isd_crete_2016_fastq.sh script after downloading the metadata.

Total reads (forward and reverse) = 121232490 in 140 samples

Primers used are FWD: 5'-ACTCCTACGGGAGGCAGCAG-3' REV: 5'-GGACTACHVGGGTWTCTAAT-3'

Numbers of Ns in reads : there are many reads with Ns and the dada function doesn't accept so after the filtering they are all removed.

Data integration

Crete has been sampled multiple times for its' environmental microbiome. The ISD project is the first one on this scale.

To investigate which samples are in Crete and are related to microbiome we did an analysis.

get all samples metadata from ENA
filter locations to be in Crete, island
filter the metagenomic samples that are also terrestrial and visualise

Inference and Taxonomy

We used PEMA and DADA2 for the clustering of OTUs and ASVs, respectively.

PEMA

PEMA 2.1.5 incorporates state of the art tools for each step of the analysis.

Filtering and quality - fastp OTU clastering - USEARCH Taxonomy assignement - Silva v132

PEMA run in 21 hours using BATCH node, Intel(R) Xeon(R) CPU 2.60GHz, 20 threads, 126 gb RAM memory.

OS = Debian 4.19.146-1

DADA2

DADA2 for our dataset required a total of 1121 minutes (18 hours, 41 minutes) to run on a FAT node of the ZORBAS HPC - IMBBC - HCMR.

FAT node Specs : Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz, 40 CPUs, 503gb RAM OS = Debian 4.19.146-1

Biodiversity

Biodiversity analysis includes the following topics:

Maps of Crete with samples and ENA all samples in Crete
OTUs with Silva 132 and ASVs with Silva 138 comparison of results
Taxonomic depth
Representative phyla ratios differ in samples
Taxonomic profiling
Community matrices
Metadata as drivers of diversity indices
Ordination
Same site locations comparison – two sampling sites, at the same sampling location - Bray Curtis dissimilarity – supports treating the samples separately

Hardware

Most computations were performed in the Zorbas HPC facility of IMBBC-HCMR, see here for more info.

Software

IMBBC HPC

IMBBC HPC facility is based on SLURM, an example file of job submission is included in Scripts folder.

Conda environment

In the IMBBC HPC facility we created conda environments for reproducibility and flexibility of the tools installations.

Unix tools

GAWK 5.3.0
PEMA 2.1.5
Conda/Bioconda 23.5.2

R

We used R version 4.3.2 and the following packages:

DADA2 1.28.0
vegan 2.6-4
ape 5.7-1
tidyverse 2.0.0
sf 1.0-14

And for visualisation the extra packages:

UCIE 1.0.2
pheatmap 1.0.12
scales 1.3.0
ggpubr 0.6.0

Python

We used Python 3.11 and the following modules/libraries:

xml.etree.ElementTree 1.3.0
csv 1.0
random
requests 2.31.0
pandas 2.1.1
numpy 1.24.4
umap-learn 0.5.4

Citation

ISD Crete consortium

Licence

GNU GPLv3 license (for 3rd party scripts separate licenses apply).

Name		Name	Last commit message	Last commit date
Latest commit History 81 Commits
Data		Data
Methods		Methods
Scripts		Scripts
.gitignore		.gitignore
README.md		README.md
isd_crete_flowchart.png		isd_crete_flowchart.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Island Sampling Day (ISD)

Soil microbiome sampling

Chapter one: Crete 2016

Contents

Methods

Sampling

Scripts

Data retrieval

Metadata

Structure

Sequences

Data integration

Inference and Taxonomy

PEMA

DADA2

Biodiversity

Hardware

Software

IMBBC HPC

Conda environment

Unix tools

R

Python

Citation

Licence

About

Releases

Packages

Contributors 4

Languages

GenomicsStandardsConsortium/ISD

Folders and files

Latest commit

History

Repository files navigation

Island Sampling Day (ISD)

Soil microbiome sampling

Chapter one: Crete 2016

Contents

Methods

Sampling

Scripts

Data retrieval

Metadata

Structure

Sequences

Data integration

Inference and Taxonomy

PEMA

DADA2

Biodiversity

Hardware

Software

IMBBC HPC

Conda environment

Unix tools

R

Python

Citation

Licence

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages