Preparing the input files

Charlotte Soneson edited this page Oct 14, 2018 · 10 revisions

The RNA-seq workflow requires two types of input files: the (compressed) FASTQ files containing the sequencing reads, and a metadata text file describing the samples.

The Snakefile assumes that the FASTQ files are named according to the pattern <sample-name>.fastq.gz for single-end data, or <sample-name>_R1.fastq.gz and <sample-name>_R2.fastq.gz for paired-end data. If your files are named differently, either rename them or modify the Snakefile accordingly.
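
To make the naming pattern concrete, the following is a minimal Python sketch that lists the file names the pattern implies for one sample. The function name expected_fastq_files and the default directory name FASTQ are hypothetical and used only for illustration; they are not taken from the Snakefile.

```python
from pathlib import Path


def expected_fastq_files(sample_name, paired, fastq_dir="FASTQ"):
    """Return the FASTQ paths implied by the naming pattern for one sample.

    `fastq_dir` is an assumed location; adjust it to wherever the files live.
    """
    base = Path(fastq_dir)
    if paired:
        # Paired-end data: one file per read, suffixed _R1 and _R2.
        return [
            base / f"{sample_name}_R1.fastq.gz",
            base / f"{sample_name}_R2.fastq.gz",
        ]
    # Single-end data: a single file named after the sample.
    return [base / f"{sample_name}.fastq.gz"]


if __name__ == "__main__":
    for path in expected_fastq_files("sampleA", paired=True):
        status = "found" if path.exists() else "missing"
        print(f"{path}: {status}")
```

Running a check like this before starting the workflow can reveal files that need to be renamed.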

The metadata file should be a tab-separated text file with at least two columns: one named names, which contains all the values of <sample-name> from the FASTQ files, and one named type, whose value is either SE or PE depending on whether the sample was obtained with a single-end or paired-end protocol. Any number of additional columns can be included and used later in the analysis. In particular, all variables required for the differential expression analysis should be included as columns in the metadata text file. An example of a metadata text file can be seen here.
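
The requirements on the metadata file can be checked with a short script before running the workflow. The sketch below uses only the Python standard library. The function name check_metadata and the example column condition are hypothetical; only the names and type columns and the SE/PE values come from the description above.

```python
import csv
import io


def check_metadata(handle):
    """Return a list of problems found in a tab-separated metadata file."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = reader.fieldnames or []

    # Both required columns must be present in the header line.
    problems = [
        f"missing column: {required}"
        for required in ("names", "type")
        if required not in columns
    ]
    if problems:
        return problems

    # Every sample must be labeled as single-end (SE) or paired-end (PE).
    for row in reader:
        if row["type"] not in ("SE", "PE"):
            problems.append(
                f"sample {row['names']}: type must be SE or PE, got {row['type']!r}"
            )
    return problems


if __name__ == "__main__":
    # Illustrative content only; 'condition' stands in for any extra variable.
    example = (
        "names\ttype\tcondition\n"
        "sampleA\tPE\tcontrol\n"
        "sampleB\tPE\ttreated\n"
    )
    print(check_metadata(io.StringIO(example)))
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `with open(path, newline="") as handle: check_metadata(handle)`, where path points to your metadata file.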