updated README for 0.4.2 release [ci skip]

kapsakcj · Apr 21, 2020 · feb1b4a
1 parent 9b99a8e
commit feb1b4a
Showing 1 changed file with 53 additions and 41 deletions.
diff --git a/README.md b/README.md
@@ -63,45 +63,47 @@ For `workflow.sh` the first positional parameter must be the project folder.  Bo
     3. `-b y || yes || n || no`      - were barcodes used?
     4. `-f r941 || r10`              - flowcell type used?
     5. `-k rapid || ligation`         - sequencing kit used?
-  * copies fast5s from input directory to `/scratch/$USER/guppy.gpu.XXXXXX`
-  * runs `guppy_basecaller` in `hac` mode on GPUs
+  * copies fast5s from input directory to `/tmp/$USER/guppy.gpu.XXXXXX`
+  * runs `guppy_basecaller` in `hac` (high-accuracy) mode on GPUs
   * Demultiplexes using `guppy_basecaller` and additionally trims adapter and barcode sequences (using `--trim_barcodes ; --barcode_kits "EXP-NBD103" or "SQK-RBK004"` options)
   * Compresses (gzip) the demultiplexed reads (`--compress_fastq` option)
   * Copies demultiplexed, trimmed, compressed reads into subdirectories in `$OUTDIR/demux/barcodeXX`
 
  #### USAGE:
 ```bash
-# download the scripts, ignoring the test data stored via git-lfs
-$ GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/lskatz/nanoporeWorkflow.git 
+$ wget https://github.com/lskatz/nanoporeWorkflow/archive/v0.4.2.tar.gz 
+$ tar -xzf v0.4.2.tar.gz
 
-# TODO - download the repo from a release, once one is made
-
-# optionally add the workflows to your $PATH (edit the PATH below to wherever you cloned the repo)
-$ echo 'export PATH=$PATH:/path/to/nanoporeWorkflow/workflows' >> ~/.bashrc
+# optionally add the workflows to your $PATH (edit the PATH below to wherever you downloaded the repo)
+$ echo 'export PATH=$PATH:/path/to/nanoporeWorkflow-0.4.2/workflows' >> ~/.bashrc
 # refresh your environment
 $ source ~/.bashrc
 
-Usage: ../nanoporeWorkflow/workflows/run_01_basecall-w-gpu.sh
+Usage: /path/to/nanoporeWorkflow-0.4.2/workflows/run_01_basecall-w-gpu.sh
                  -i path/to/fast5files/        
                  -o path/to/outputDirectory/   
                  -b y || yes || n || no        barcodes used?
                  -f r941 || r10                flowcell type used?
                  -k rapid || ligation          sequencing kit used?
 
-example: ../nanoporeWorkflow/workflows/run_01_basecall-w-gpu.sh -i fast5s/ -o output/ -b y -f r941 -k rapid
+example: /path/to/nanoporeWorkflow-0.4.2/workflows/run_01_basecall-w-gpu.sh -i fast5s/ -o output/ -b y -f r941 -k rapid
 
-# OUTPUT
+# EXAMPLE OUTPUT
 $OUTDIR
 ├── demux
-│   ├── barcode06
-│   │   └── barcode06.fastq.gz (there will be many .fastq.gz files per barcode)
-│   ├── barcode10
-│   │   └── barcode10.fastq.gz
-│   ├── barcode12
-│   │   └── barcode12.fastq.gz
-│   ├── none.fastq.gz
-|   └── sequencing_summary.txt
-
+│   ├── barcode01
+│   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz (there will be many .fastq.gz files per barcode)
+│   ├── barcode02
+│   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+│   ├── barcode03
+│   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+│   ├── guppy_basecaller_log-2020-04-17_09-45-00.log
+│   ├── sequencing_summary.txt
+│   ├── sequencing_telemetry.js
+│   └── unclassified
+│       └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+└── log
+    └── guppy-gpu.o7987624
 ```
 
 ### Assembly with Flye and polishing with Racon and Medaka
@@ -117,7 +119,7 @@ $OUTDIR
 
 #### Requirements
   * Must have previously run the above script that basecalls reads on a GPU node.
-  * Not necessary to be on node98. Any server with the ability to `qsub` will work.
+  * Must be logged into a server with the ability to `qsub` (Aspen, Monoliths).
   * `outdir` argument must be the same directory as the `OUTDIR` from the gpu-basecalling script
     * Recommend `cd`'ing to one directory above and use `.` as the `outdir` argument (see USAGE below)
 
@@ -127,22 +129,27 @@ Usage:
     # example - if you are one directory above the output directory from the gpu-basecalling script
     ~/nanoporeWorkflow/workflows/workflow-after-gpu-basecalling.sh outdir/
 
-# OUTPUT - only showing one barcode for brevity, not all files included
-$OUTDIR
-demux/
-├── barcode07
-│   ├── all.fastq.gz
-│   ├── flye
-│   ├── medaka
-│   ├── racon
-│   ├── readlengths.txt.gz
-│   └── reads.minlen1000.600Mb.fastq.gz
+# EXAMPLE OUTPUT - only showing one barcode for brevity
+$OUTDIR/
+├── demux
+│   ├── barcode01
+│   │   ├── all.fastq.gz
+│   │   ├── flye
+│   │   ├── log  # qsub logs
+│   │   │   ├── assemble-d64ffbc5-4012-44c5-8191-1a57d4a7d15c.log
+│   │   │   ├── polish-medaka-00e52c16-0bd3-460d-b955-3a532be958b1.log
+│   │   │   ├── polish-racon-d7ebc124-d100-43e0-b347-1e60bbc0bf18.log
+│   │   │   └── prepSample-7ecc6f51-4937-40d1-a6bd-d83e66078984.log
+│   │   ├── medaka
+│   │   ├── racon
+│   │   ├── readlengths.txt.gz
+│   │   └── reads.minlen1000.600Mb.fastq.gz
+│   ├── guppy_basecaller_log-2020-04-17_09-45-00.log
+│   ├── sequencing_summary.txt
+│   ├── sequencing_telemetry.js
+│   └── unclassified
 └── log
-log/
-├── assemble-13f6870a-e7ab-4475-8acc-6762e57e5d55.log # one of each of these logs for each barcode
-├── polish-medaka-3d22f12c-8a50-4dd7-9cc6-7c1bc5098b48.log
-├── polish-racon-6c22aa55-5a95-4d17-9a01-abeade24b431.log
-└── prepSample-157680be-7f14-4a32-8a74-4bfe5de0b624.log
+    └── guppy-gpu.o7984157
 ```
 #### Notes on assembly and polishing workflow
   * It will check for the following files, to determine if it should skip any of the steps. Helps if one part doesn't run correctly and you don't want to repeat a certain step, e.g. re-assembling.
@@ -155,11 +162,16 @@ log/
 If you are interested in contributing to nanoporeWorkflow, please take a look at the [contribution guidelines](CONTRIBUTING.md). We welcome issues or pull requests!
 
 ## Future plans
-  * add flags/options for other sequencing kits, barcoding kits, flowcells (direct RNAseq?)
-    * R9.4.1 + rapid or ligation sequencing kit without barcoding (RAD-004)
-    * R10 + ligation without barcoding 
-    * R10.3 + ligation with & without barcoding
-  * Add option for Medaka polishing with r941_min_fast model, if reads were basecalled with the fast Guppy model
+* Detect GPU availability, since Aspen GPU nodes have 2 V100 cards. `guppy_basecaller -x auto` is the same as saying `-x cuda:0` 👎
+* Add support for passing in a config file to `workflow-after-gpu-basecalling.sh` that contains:
+  * Sample ID
+  * barcode number (RBK or NBD)
+  * estimated genome size (to be used as input parameter in various places)
+* Allow users to specify a read length for removing reads w/ `filtlong` (1000 bp could be too stringent if the bulk of reads are around that length)
+* Test and add support for `rasusa` for randomly subsampling and filtering reads (`filtlong` is biased towards reads with highest q-scores)
+* add flags/options for other sequencing kits, barcoding kits, flowcells (direct RNAseq?)
+  * **R10.3** + ligation with native barcodes 1-24 (R10 flowcell discontinued)
+* Add option for Medaka polishing with r941_min_fast model, if reads were basecalled with the fast Guppy model
 
 ## Resources
   * https://github.com/fenderglass/Flye