Home
Introductory slides from Carlo Pecoraro
Visit http://34.223.228.45/ws.html to access our AWS computational resources.
| Day | Time | Activities |
|---|---|---|
| Monday, June 12 | morning | Workshop Introduction (slides) |
| | | Exploring the Computational Infrastructure (slides) |
| | afternoon | Unix command-line review |
| | | Data overview and setup |
| | | Using FASTQC and Trimmomatic (slides) |
| Tuesday, June 13 | morning | Trinity de novo transcriptome assembly (slides) |
| | afternoon | Uploading own data or identifying and downloading SRA studies of interest (slides) |
| Wednesday, June 14 | morning | Expression quantification (slides) |
| | | Quality assessment for assembly (slides) |
| | afternoon | QC samples and replicates |
| Thursday, June 15 | morning | Statistical methods for differential expression analysis (slides) |
| | afternoon | Transcript clustering and expression profiling |
| | | Methods for functional annotation (slides) |
| | | Trinotate and TrinotateWeb |
| Friday, June 16 | morning | Functional enrichment analysis |
| | | Review and custom data analyses |
| | | Comments on software installations for later use on different resources |
The command below generates an 'interleaved' FASTQ file, in which record 1 of each read pair is followed immediately by record 2, and keeps the top 1M read pairs (the first 8M lines: 2 reads per pair, 4 lines per read).
Below, we retrieve SRA accession SRR390728 and use the -X 5 parameter simply to limit the number of reads in this small example. When you run it with your own sample of interest, use your SRR accession and omit the -X 5 parameter.
% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z SRR390728 | \
head -n8000000 | gzip > SRR390728.interleaved.fastq.gz
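The value passed to head follows from the interleaved layout: each read pair contributes 2 records of 4 lines each. As a small sketch (the variable names `pairs` and `lines` are only illustrative), the line count for any number of pairs can be computed like this:

```shell
# Number of read pairs to keep (1M in the example above).
pairs=1000000

# Each pair has 2 mates, and each FASTQ record spans 4 lines.
lines=$(( pairs * 2 * 4 ))

echo "$lines"   # prints 8000000
```

The result can then be used as `head -n "$lines"` in place of the hard-coded 8000000.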
Now, to de-interleave and generate the two separate fastq files for the 'left' and 'right' read mates, we can do the following:
% gunzip -c SRR390728.interleaved.fastq.gz | \
paste - - - - - - - - | \
tee >(cut -f 1-4 | tr '\t' '\n' | gzip > SRR390728_1.fastq.gz) | \
cut -f 5-8 | tr '\t' '\n' | gzip -c > SRR390728_2.fastq.gz
The above is adapted from: https://biowize.wordpress.com/2015/03/26/the-fastest-darn-fastq-decoupling-procedure-i-ever-done-seen/
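After de-interleaving, it can be worth confirming that the 'left' and 'right' files hold the same number of records. The following is a minimal sketch, not part of the original procedure: `count_records` is a hypothetical helper, and the `demo_1`/`demo_2` files are tiny synthetic stand-ins so the check can be tried without SRA access.

```shell
# Count FASTQ records in a gzipped file (4 lines per record).
count_records() {
    echo $(( $(gunzip -c "$1" | wc -l) / 4 ))
}

# Synthetic two-record mate files (hypothetical names, for illustration only).
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' | gzip > demo_1.fastq.gz
printf '@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n' | gzip > demo_2.fastq.gz

left=$(count_records demo_1.fastq.gz)
right=$(count_records demo_2.fastq.gz)

if [ "$left" -eq "$right" ]; then
    echo "OK: $left read pairs"
else
    echo "MISMATCH: left=$left right=$right" >&2
fi
```

For real data, the same function would be pointed at SRR390728_1.fastq.gz and SRR390728_2.fastq.gz.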
Just for fun, here's an example where the entire process is piped together (the most complicated command I've ever written in Linux):
% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z SRR390728 | \
paste - - - - - - - - | \
head -n1000000 | \
tee >(cut -f 1-4 | tr '\t' '\n' | gzip > reads-1.fastq.gz) | \
cut -f 5-8 | tr '\t' '\n' | gzip -c > reads-2.fastq.gz
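Because the defline format above ends each read name with /1 or /2, one way to check that the two output files are still in matching order is to compare the read names with that suffix removed. This is only a sketch under that assumption: `read_names` is a hypothetical helper, and the `demo-1`/`demo-2` files are synthetic stand-ins for reads-1.fastq.gz and reads-2.fastq.gz.

```shell
# Print read names from a gzipped FASTQ, with the trailing /1 or /2 removed.
read_names() {
    gunzip -c "$1" | awk 'NR % 4 == 1' | sed 's#/[12]$##'
}

# Synthetic two-record mate files (hypothetical names, for illustration only).
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nTTGA\n+\nIIII\n' | gzip > demo-1.fastq.gz
printf '@r1/2\nTGCA\n+\nIIII\n@r2/2\nCCAT\n+\nIIII\n' | gzip > demo-2.fastq.gz

read_names demo-1.fastq.gz > demo-1.names
read_names demo-2.fastq.gz > demo-2.names

# cmp -s is silent and succeeds only if the two name lists are identical.
if cmp -s demo-1.names demo-2.names; then
    echo "mates are in the same order"
else
    echo "mate names differ" >&2
fi
```

Temporary name files are used here instead of process substitution so the check also runs in a plain POSIX shell.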
"Get that linux feeling ... on windows" cygwin: https://www.cygwin.com/