-
Notifications
You must be signed in to change notification settings - Fork 103
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
1d15972
commit a8db07f
Showing
3 changed files
with
68 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
## Appendix (III): Bismark Deduplication | ||
|
||
This script is supposed to remove alignments to the same position in the genome from the Bismark mapping output (both single and paired-end SAM/BAM files), which can arise by e.g. excessive PCR amplification. If sequences align to the same genomic position but on different strands they will be scored individually. | ||
|
||
!!! Important: | ||
|
||
Please note that for paired-end BAM files the deduplication script expects Read1 and Read2 to follow each other in consecutive lines! If the file has been sorted by position make sure that you resort it by read name first (e.g. using samtools sort -n) | ||
|
||
A brief description of the Bismark deduplication and a full list of options can also be viewed by typing `deduplicate_bismark --help`. | ||
|
||
#### USAGE: `deduplicate_bismark [options] <filename(s)>` | ||
|
||
#### ARGUMENTS: | ||
|
||
- `<filenames>` | ||
|
||
A space-separated list of Bismark result files in BAM/SAM format. | ||
|
||
|
||
#### OPTIONS: | ||
|
||
- `-s/--single-end` | ||
|
||
Deduplicate single-end BAM/SAM Bismark files. Default: [AUTO-DETECT] | ||
|
||
- `-p/--paired-end` | ||
|
||
Deduplicate paired-end BAM/SAM Bismark files. Default: [AUTO-DETECT] | ||
|
||
- `-o/--outfile [filename]` | ||
|
||
The basename of a desired output file. This basename is modified to end into `.deduplicated.bam`, or `.multiple.deduplicated.bam` in `--multiple` mode, for consistency reasons. | ||
|
||
- `--output_dir [path]` | ||
|
||
Output directory, either relative or absolute. Output is written to the current directory if not specified explicitly. | ||
|
||
- `--barcode` | ||
|
||
In addition to chromosome, start position and orientation this will also take a potential barcode into consideration while deduplicating. The barcode needs to be the last element of the read ID and separated by a ':', e.g.: MISEQ:14:000000000-A55D0:1:1101:18024:2858_1:N:0:CTCCT | ||
|
||
- `--bam` | ||
|
||
The output will be written out in BAM format. This script will attempt to use the path to Samtools that was specified with `--samtools_path`, or, if it hasn't been specified,attempt to find Samtools in the `PATH`. If no installation of Samtools can be found, a GZIP compressed output is written out instead (yielding a `.sam.gz` output file). Default: ON. | ||
|
||
- `--sam` | ||
|
||
The output will be written out in SAM format. Default: OFF. | ||
|
||
- `--multiple` | ||
|
||
All specified input files are treated as one sample and concatenated together for deduplication. This uses Unix `cat` for SAM files and `samtools cat` for BAM files. Additional notes for BAM files: Although this works on either BAM or CRAM, all input files must be the same format as each other. The sequence dictionary of each input file must be identical, although this command does not check this. By default the header is taken from the first file to be concatenated. | ||
|
||
- `--samtools_path [path]` | ||
|
||
The path to your Samtools installation, e.g. `/home/user/samtools/`. Does not need to be specified explicitly if Samtools is in the `PATH` already | ||
|
||
- `--version` | ||
|
||
Print version information and exit | ||
|
||
|
||
#### OUTPUT | ||
|
||
The output is a BAM format by default, as well as a deduplication report (ending in '_deduplication_report.txt') | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters