Skip to content

molgenis/NGS_Demultiplexing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Demultiplexing pipeline

##1. Goal Next Generation Sequencing data processing using the inhouse pipeline for Bcl To FastQ conversion, demultiplexing and standardized filename convertion.

##2. Scope of application Demultiplexing pipeline for Illumina BaseCalls convertion to fastq files. The members of the GCC-NGS team are responsible for the analyses. This pipeline is used in combination with NGS_automated. The general workflow consist of the following steps:

####Data flow:

   ⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞
   ⎜                    Illumina sequencers writes Bcl data to GATTACA {01,02}machines     ⎜
   ⎜                                                                                       ⎜
   ⎝______________________________________________________________________________________⎠
                                         v
                                         v  > > > > > > NGS_Automated Demultiplexing [automatically start Demultplexing Pipeline when new bcl files and samplesheet are available ]
                                         v
   ⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞
   ⎜                    NGS_Demultiplexing conversion of Bcls files to Fastq files,        ⎜
   ⎜                    and takes place on GATTACA {01,02}machines.                        ⎜
   ⎝______________________________________________________________________________________⎠
                                         v
                                         v  > > > > > > NGS_Automated CopyRawDataToPRM [stores .fq.gz and .fq.gz.md5 files on permanent storage system]
                                         v                                           
   ⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞
   ⎜                  Fastqs are available for futher pressing by NGS_DNA or NGS_RNA       ⎜
   ⎜                  pipelines.                                                           ⎜
   ⎝______________________________________________________________________________________⎠

3. Description of the different pipeline steps.

Step 1: ProcessInterop

Stores and formats the clusterDensity, clustersPassingFilter, InterOp dir and Q30 QC values.

Scriptname: ProcessInterop
Input: InterOp dir
Output: Info/SequenceRun.csv file with clusterDensity, clustersPassingFilter and Q30.

Step 2: BclToFastQ

The Bcl files produced by the Illumina sequencers (MiSeq,NextSeq etc), needs to be converted to a readable format in the form of a FastQ file.

Scriptname: BclToFastQ
Input: sequencer output (bcl files)
Output: Illumina FastQ files (lane${lane}_${barcode_combined[sampleNumber]}_S[0-9]*_L00${lane}_R1_001.fastq.gz)

Step 3: Illumina2GafFastQ

The Illumina FastQ files have to be renamed to a format that can be used by the downstream pipeline

Scriptname: Illumina2GafFastQ
Input: Illumina FastQ files (lane${lane}_${barcode_combined[sampleNumber]}_S[0-9]L00${lane}R1_001.fastq.gz)
Output: (${filePrefix}
${lane}
${barcode}.fastq.gz)

Step 4: Demultiplex

In this step the reads with the known barcodes will be counted and will be written to a log file per lane.

Scriptname: Demultiplex
Input: (${filePrefix}${lane}${barcode}.fastq.gz)*
Output: ${filePrefix}_${lane}.log

Step 5: UploadSampleSheet

Samplesheet will be copied to the track and trace server (molgenis server).

**Scriptname:**UploadSampleSheet

4. Preparing and running a !manually started NGS_Demultiplexing run.

To run a demultiplexing pipeline you need to have a samplesheet with the same name as the sequence run(e.g. STARTDATE_SEQ_RUNNR_FLOWCELLXX.csv)

SCR_ROOT_DIR=${root}/groups/${groupname}/${tmpDir}/
mkdir ${SCR_ROOT_DIRpDir}/generatedscripts/STARTDATE_SEQ_RUNNR_FLOWCELLXX
SCR_ROOT_DIR=${root}/groups/${groupname}/${tmpDir}/

scp –r STARTDATE_SEQ_RUNNR_FLOWCELLXX username@yourcluster:/groups/${groupname}/${tmpDir}/generatedscripts/

module load NGS_Demultiplexing

cd ${root}/groups/${groupname}/${tmpDir}/generatedscripts/STARTDATE_SEQ_RUNN_FLOWCELLXX
cp ${EBROOTNGS_Demultiplexing}/generate_template.sh .
bash generate_template.sh "${project}" "${SCR_ROOT_DIR}" "${group}"

Navigate to jobs folder (this will be outputted at the step before this one). And than submit the jobs.

bash submit.sh