Commit

updated README for 0.4.2 release [ci skip]
kapsakcj authored Apr 21, 2020
1 parent 9b99a8e commit feb1b4a
Showing 1 changed file with 53 additions and 41 deletions.
94 changes: 53 additions & 41 deletions README.md
@@ -63,45 +63,47 @@ For `workflow.sh` the first positional parameter must be the project folder. Bo
3. `-b y || yes || n || no` - were barcodes used?
4. `-f r941 || r10` - flowcell type used?
5. `-k rapid || ligation` - sequencing kit used?
- * copies fast5s from input directory to `/scratch/$USER/guppy.gpu.XXXXXX`
- * runs `guppy_basecaller` in `hac` mode on GPUs
+ * copies fast5s from input directory to `/tmp/$USER/guppy.gpu.XXXXXX`
+ * runs `guppy_basecaller` in `hac` (high-accuracy) mode on GPUs
* Demultiplexes using `guppy_basecaller` and additionally trims adapter and barcode sequences (using the `--trim_barcodes` and `--barcode_kits "EXP-NBD103"` or `--barcode_kits "SQK-RBK004"` options)
* Compresses (gzip) the demultiplexed reads (`--compress_fastq` option)
* Copies demultiplexed, trimmed, compressed reads into subdirectories in `$OUTDIR/demux/barcodeXX`
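The option handling and demultiplexing choices described above can be sketched in bash as follows. This is a minimal, hypothetical sketch, not the code in `run_01_basecall-w-gpu.sh`: the function names (`normalize_barcoded`, `check_flowcell`, `barcode_kit_for`, `demux_options`) are illustrative, and the kit mapping (rapid to `SQK-RBK004`, ligation to `EXP-NBD103`) is assumed from the two barcode kits named above. It only prints the barcoding-related `guppy_basecaller` options; it does not pick a basecalling model or run guppy.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the real run_01_basecall-w-gpu.sh.

# Normalize the -b answer (y/yes/n/no) to "yes" or "no".
normalize_barcoded() {
  case "$1" in
    y|yes) echo "yes" ;;
    n|no) echo "no" ;;
    *) echo "invalid -b value: $1" >&2; return 1 ;;
  esac
}

# Accept only the flowcell types listed in the usage message (-f).
check_flowcell() {
  case "$1" in
    r941|r10) echo "$1" ;;
    *) echo "invalid -f value: $1" >&2; return 1 ;;
  esac
}

# Assumed mapping from the -k sequencing kit to the --barcode_kits value.
barcode_kit_for() {
  case "$1" in
    rapid) echo "SQK-RBK004" ;;
    ligation) echo "EXP-NBD103" ;;
    *) echo "invalid -k value: $1" >&2; return 1 ;;
  esac
}

# Print the demultiplexing-related guppy_basecaller options (dry run only).
demux_options() {
  local barcoded kit
  barcoded=$(normalize_barcoded "$1") || return 1
  kit=$(barcode_kit_for "$2") || return 1
  if [ "$barcoded" = "yes" ]; then
    echo "--barcode_kits $kit --trim_barcodes --compress_fastq"
  else
    # Assumption: unbarcoded runs still get compressed FASTQ output.
    echo "--compress_fastq"
  fi
}
```

For example, `demux_options y rapid` prints `--barcode_kits SQK-RBK004 --trim_barcodes --compress_fastq`, and an unsupported value such as `-b maybe` fails with a message on stderr.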

#### USAGE:
```bash
# download the scripts, ignoring the test data stored via git-lfs
- $ GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/lskatz/nanoporeWorkflow.git
+ $ wget https://github.com/lskatz/nanoporeWorkflow/archive/v0.4.2.tar.gz
+ $ tar -xzf v0.4.2.tar.gz

- # TODO - download the repo from a release, once one is made
-
- # optionally add the workflows to your $PATH (edit the PATH below to wherever you cloned the repo)
- $ echo 'export PATH=$PATH:/path/to/nanoporeWorkflow/workflows' >> ~/.bashrc
+ # optionally add the workflows to your $PATH (edit the PATH below to wherever you downloaded the repo)
+ $ echo 'export PATH=$PATH:/path/to/nanoporeWorkflow-0.4.2/workflows' >> ~/.bashrc
# refresh your environment
$ source ~/.bashrc

- Usage: ../nanoporeWorkflow/workflows/run_01_basecall-w-gpu.sh
+ Usage: /path/to/nanoporeWorkflow-0.4.2/workflows/run_01_basecall-w-gpu.sh
-i path/to/fast5files/
-o path/to/outputDirectory/
-b y || yes || n || no barcodes used?
-f r941 || r10 flowcell type used?
-k rapid || ligation sequencing kit used?

- example: ../nanoporeWorkflow/workflows/run_01_basecall-w-gpu.sh -i fast5s/ -o output/ -b y -f r941 -k rapid
+ example: /path/to/nanoporeWorkflow-0.4.2/workflows/run_01_basecall-w-gpu.sh -i fast5s/ -o output/ -b y -f r941 -k rapid

- # OUTPUT
+ # EXAMPLE OUTPUT
$OUTDIR
├── demux
- │   ├── barcode06
- │   │   └── barcode06.fastq.gz (there will be many .fastq.gz files per barcode)
- │   ├── barcode10
- │   │   └── barcode10.fastq.gz
- │   ├── barcode12
- │   │   └── barcode12.fastq.gz
- │   ├── none.fastq.gz
- |   └── sequencing_summary.txt
-
+ │   ├── barcode01
+ │   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz (there will be many .fastq.gz files per barcode)
+ │   ├── barcode02
+ │   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+ │   ├── barcode03
+ │   │   └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+ │   ├── guppy_basecaller_log-2020-04-17_09-45-00.log
+ │   ├── sequencing_summary.txt
+ │   ├── sequencing_telemetry.js
+ │   └── unclassified
+ │       └── fastq_runid_fbc8eee46271cbe60ee8a49d0ca657f6e92e174e_0_0.fastq.gz
+ └── log
+     └── guppy-gpu.o7987624
```
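After basecalling, a quick sanity check is to count the reads that landed in each barcode directory. The sketch below assumes the `$OUTDIR/demux/barcodeXX/*.fastq.gz` layout shown in the example output and standard four-line FASTQ records; `count_reads_per_barcode` is a hypothetical helper, not part of nanoporeWorkflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count reads in each demux/barcodeXX (and unclassified)
# directory, assuming gzipped FASTQ with 4 lines per record.
count_reads_per_barcode() {
  local outdir="$1" dir lines
  for dir in "$outdir"/demux/barcode*/ "$outdir"/demux/unclassified/; do
    # Skip unmatched globs and missing directories.
    [ -d "$dir" ] || continue
    # gzip -dc handles concatenated gzip members; errors (e.g. no files) are ignored.
    lines=$(cat "$dir"*.fastq.gz 2>/dev/null | gzip -dc 2>/dev/null | wc -l)
    printf '%s\t%d\n' "$(basename "$dir")" "$(( lines / 4 ))"
  done
}
```

Running `count_reads_per_barcode "$OUTDIR"` prints one tab-separated line per directory: the directory name and its read count.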

### Assembly with Flye and polishing with Racon and Medaka
@@ -117,7 +119,7 @@ $OUTDIR

#### Requirements
* Must have previously run the above script that basecalls reads on a GPU node.
- * Not necessary to be on node98. Any server with the ability to `qsub` will work.
+ * Must be logged into a server with the ability to `qsub` (Aspen, Monoliths).
* `outdir` argument must be the same directory as the `OUTDIR` from the gpu-basecalling script
* Recommend `cd`'ing to one directory above and using `.` as the `outdir` argument (see USAGE below)
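Because `outdir` must be the same `OUTDIR` produced by the GPU basecalling script, a pre-flight check can catch a wrong path before any jobs are submitted. This is a minimal sketch that assumes only the `demux/barcodeXX` layout from the basecalling output; `check_basecall_outdir` and `have_qsub` are hypothetical helpers, not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight checks before running workflow-after-gpu-basecalling.sh.

# Succeed only if OUTDIR looks like output from the GPU basecalling script,
# i.e. it has a demux/ directory with at least one barcodeXX subdirectory.
check_basecall_outdir() {
  local outdir="$1" d
  if [ ! -d "$outdir/demux" ]; then
    echo "ERROR: $outdir/demux not found; is this the OUTDIR from the basecalling script?" >&2
    return 1
  fi
  for d in "$outdir"/demux/barcode*/; do
    [ -d "$d" ] && return 0
  done
  echo "ERROR: no barcodeXX directories under $outdir/demux" >&2
  return 1
}

# Succeed only if qsub is on PATH (the workflow submits its steps with qsub).
have_qsub() {
  if ! command -v qsub >/dev/null 2>&1; then
    echo "ERROR: qsub not found; log into a server that can submit jobs" >&2
    return 1
  fi
}
```

For example: `check_basecall_outdir outdir/ && have_qsub && ~/nanoporeWorkflow/workflows/workflow-after-gpu-basecalling.sh outdir/`.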

@@ -127,22 +129,27 @@ Usage:
# example - if you are one directory above the output directory from the gpu-basecalling script
~/nanoporeWorkflow/workflows/workflow-after-gpu-basecalling.sh outdir/

- # OUTPUT - only showing one barcode for brevity, not all files included
- $OUTDIR
- demux/
- ├── barcode07
- │   ├── all.fastq.gz
- │   ├── flye
- │   ├── medaka
- │   ├── racon
- │   ├── readlengths.txt.gz
- │   └── reads.minlen1000.600Mb.fastq.gz
+ # EXAMPLE OUTPUT - only showing one barcode for brevity
+ $OUTDIR/
+ ├── demux
+ │   ├── barcode01
+ │   │   ├── all.fastq.gz
+ │   │   ├── flye
+ │   │   ├── log # qsub logs
+ │   │   │   ├── assemble-d64ffbc5-4012-44c5-8191-1a57d4a7d15c.log
+ │   │   │   ├── polish-medaka-00e52c16-0bd3-460d-b955-3a532be958b1.log
+ │   │   │   ├── polish-racon-d7ebc124-d100-43e0-b347-1e60bbc0bf18.log
+ │   │   │   └── prepSample-7ecc6f51-4937-40d1-a6bd-d83e66078984.log
+ │   │   ├── medaka
+ │   │   ├── racon
+ │   │   ├── readlengths.txt.gz
+ │   │   └── reads.minlen1000.600Mb.fastq.gz
+ │   ├── guppy_basecaller_log-2020-04-17_09-45-00.log
+ │   ├── sequencing_summary.txt
+ │   ├── sequencing_telemetry.js
+ │   └── unclassified
└── log
- log/
- ├── assemble-13f6870a-e7ab-4475-8acc-6762e57e5d55.log # one of each of these logs for each barcode
- ├── polish-medaka-3d22f12c-8a50-4dd7-9cc6-7c1bc5098b48.log
- ├── polish-racon-6c22aa55-5a95-4d17-9a01-abeade24b431.log
- └── prepSample-157680be-7f14-4a32-8a74-4bfe5de0b624.log
+     └── guppy-gpu.o7984157
```
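To see how far each barcode got through assembly and polishing, one option is to check which of the `flye`, `racon` and `medaka` directories from the example output exist. This is a hedged sketch: `assembly_status` is a hypothetical helper, and a directory existing does not prove that the step finished successfully (the per-barcode `log` directory is the place to confirm that).

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each barcode, report which of the flye, racon and
# medaka directories exist. Existence alone does not mean a step succeeded.
assembly_status() {
  local outdir="$1" dir stage line
  for dir in "$outdir"/demux/barcode*/; do
    [ -d "$dir" ] || continue
    line=$(basename "$dir")
    for stage in flye racon medaka; do
      if [ -d "$dir$stage" ]; then
        line="$line $stage:yes"
      else
        line="$line $stage:no"
      fi
    done
    echo "$line"
  done
}
```

Running `assembly_status "$OUTDIR"` prints one line per barcode, e.g. `barcode01 flye:yes racon:yes medaka:no` for a sample whose Medaka step has not produced output yet.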
#### Notes on assembly and polishing workflow
* It checks for the following files to determine whether it should skip any of the steps. This helps if one part doesn't run correctly and you don't want to repeat a step that already completed, e.g. re-assembling.
@@ -155,11 +162,16 @@ log/
If you are interested in contributing to nanoporeWorkflow, please take a look at the [contribution guidelines](CONTRIBUTING.md). We welcome issues or pull requests!

## Future plans
- * add flags/options for other sequencing kits, barcoding kits, flowcells (direct RNAseq?)
-   * R9.4.1 + rapid or ligation sequencing kit without barcoding (RAD-004)
-   * R10 + ligation without barcoding
-   * R10.3 + ligation with & without barcoding
- * Add option for Medaka polishing with r941_min_fast model, if reads were basecalled with the fast Guppy model
+ * Detect GPU availability, since Aspen GPU nodes have 2 V100 cards. `guppy_basecaller -x auto` is the same as saying `-x cuda:0`, so only one card is used 👎
+ * Add support for passing in a config file to `workflow-after-gpu-basecalling.sh` that contains:
+   * Sample ID
+   * barcode number (RBK or NBD)
+   * estimated genome size (to be used as input parameter in various places)
+ * Allow users to specify a read length for removing reads w/ `filtlong` (1000 bp could be too stringent if the bulk of reads are around that length)
+ * Test and add support for `rasusa` for randomly subsampling and filtering reads (`filtlong` is biased towards reads with the highest q-scores)
+ * add flags/options for other sequencing kits, barcoding kits, flowcells (direct RNAseq?)
+   * **R10.3** + ligation with native barcodes 1-24 (R10 flowcell discontinued)
+ * Add option for Medaka polishing with r941_min_fast model, if reads were basecalled with the fast Guppy model
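The planned config file could be as simple as one tab-separated line per sample. The format below (sample ID, barcode number, estimated genome size) and the `read_sample_config` helper are hypothetical, sketched only to show how such a file might be parsed in bash; nanoporeWorkflow does not read such a file today.

```bash
#!/usr/bin/env bash
# Hypothetical config format (one sample per line, tab-separated):
#   sample_id <TAB> barcode_number <TAB> estimated_genome_size
# Blank lines and lines starting with '#' are skipped.
read_sample_config() {
  local config="$1" sample barcode gsize
  # "|| [ -n "$sample" ]" also processes a final line with no trailing newline.
  while IFS=$'\t' read -r sample barcode gsize || [ -n "$sample" ]; do
    case "$sample" in ''|'#'*) continue ;; esac
    # Force base 10 (so "07" is not read as octal) and zero-pad to barcodeXX.
    printf '%s\tbarcode%02d\t%s\n' "$sample" "$((10#$barcode))" "$gsize"
  done < "$config"
}
```

With this format, a line `sampleA<TAB>1<TAB>5m` becomes `sampleA	barcode01	5m`, matching the `barcodeXX` directory names the workflow already writes.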

## Resources
* https://github.com/fenderglass/Flye