
dsComputeGCCoverage

Gautier RICHARD edited this page Jan 15, 2020 · 3 revisions

Computing GC content along the genome

Description of the tool

dsComputeGCCoverage calculates the GC content along a binned genome and writes it as a bedGraph. Several genomes can be passed to the tool as FASTA files (which do not need to be indexed), and the bin size is set by the user.
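To make the computation concrete, here is a minimal Python sketch of GC content per fixed-size bin, written as bedGraph lines (chromosome, 0-based start, end, value). It is not the tool's code. The function names are hypothetical, and it assumes that the denominator is the full bin length (N bases included) and that the last bin of a sequence may be shorter than the window size. The tool may handle both cases differently.

```python
# Hypothetical sketch (not dsComputeGCCoverage's implementation):
# GC fraction per fixed-size bin, emitted as bedGraph lines.

def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in seq (case-insensitive); 0.0 for an empty bin.
    Assumption: the denominator is the whole bin length, N bases included."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def bedgraph_lines(chrom: str, seq: str, window_size: int = 1000):
    """Yield one tab-separated bedGraph line per bin.
    Coordinates are 0-based and half-open; the last bin may be shorter."""
    for start in range(0, len(seq), window_size):
        end = min(start + window_size, len(seq))
        value = gc_fraction(seq[start:end])
        yield f"{chrom}\t{start}\t{end}\t{value:.3f}"


if __name__ == "__main__":
    for line in bedgraph_lines("Chr1", "GGCCATAT", window_size=4):
        print(line)
```

With a window size of 4, the toy sequence above produces two bins: the first is all G/C (1.000) and the second contains none (0.000).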

The tool is memory efficient: sequences are read from the FASTA files one bin at a time, so the entire FASTA file is never loaded into RAM. There is therefore no need to split genome FASTA files by chromosome before running it.
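The sketch below shows one way to stream bins from a FASTA file without loading it whole. It is not necessarily how the tool does it. The hypothetical generator `iter_fasta_bins` reads the file line by line and keeps only the bases that are not yet emitted, which is fewer than one window plus one FASTA line. It assumes a well-formed FASTA file (a `>` header before any sequence) and takes the first word of each header as the chromosome name.

```python
import io

# Hypothetical sketch (not dsComputeGCCoverage's implementation):
# stream (chrom, start, end, sequence) bins from a FASTA file handle.

def iter_fasta_bins(handle, window_size=1000):
    chrom, buf, pos = None, "", 0  # pos = 0-based start of buf in the record
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: emit the remaining partial bin of the previous one.
            if chrom is not None and buf:
                yield chrom, pos, pos + len(buf), buf
            chrom = line[1:].split()[0]  # assumption: first word = chrom name
            buf, pos = "", 0
            continue
        buf += line
        # Emit every complete bin currently held in the buffer.
        while len(buf) >= window_size:
            yield chrom, pos, pos + window_size, buf[:window_size]
            buf = buf[window_size:]
            pos += window_size
    # End of file: emit the last partial bin, if any.
    if chrom is not None and buf:
        yield chrom, pos, pos + len(buf), buf


if __name__ == "__main__":
    demo = io.StringIO(">Chr1 description\nGGCC\nAT\n>Chr2\nGC\n")
    for record in iter_fasta_bins(demo, window_size=4):
        print(record)
```

Because bins are produced per record, multi-chromosome FASTA files can be processed directly. Memory use depends on the window size and line length, not on genome size.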

Command-line help

Option   Description
--input, -i   FASTA file(s) for which to compute the GC content.
--windowSize, -w   Size, in bases, of the windows (bins) used to split the genome and compute the GC content. Default: 1000.
--output, -o   Output prefix for each bedGraph file. '.bedGraph' is appended automatically to each prefix, and one bedGraph is written per input file.
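The documented options can be mirrored with Python's standard `argparse` module. This hypothetical sketch shows how input files and output prefixes pair up by position. It assumes that `-i` and `-o` are required and must have the same number of values. The real tool's parser and its validation may differ.

```python
import argparse

# Hypothetical re-creation of the documented options (not the tool's code).

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Compute GC content along a binned genome (sketch)."
    )
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="FASTA file(s)")
    parser.add_argument("-w", "--windowSize", type=int, default=1000,
                        help="bin size in bases (default: 1000)")
    parser.add_argument("-o", "--output", nargs="+", required=True,
                        help="bedGraph output prefix(es), one per input")
    args = parser.parse_args(argv)
    # Assumption: exactly one output prefix per input FASTA file.
    if len(args.output) != len(args.input):
        parser.error("give exactly one output prefix per input FASTA file")
    # '.bedGraph' is appended to each prefix; pairing is by position.
    args.output_files = [prefix + ".bedGraph" for prefix in args.output]
    return args
```

For example, `parse_args(["-i", "a.fa", "b.fa", "-o", "res/a", "res/b"])` pairs `a.fa` with `res/a.bedGraph` and `b.fa` with `res/b.bedGraph`.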

Example usage

dsComputeGCCoverage -i data/genome.fa data/genome_mitochondria.fa data/genome_chloroplast.fa -w 100 -o results/genome results/mitochondria results/chloroplast

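This command writes three bedGraph files: results/genome.bedGraph, results/mitochondria.bedGraph and results/chloroplast.bedGraph. Each line gives the chromosome, the start and end of the bin, and the GC content of that bin as a fraction. For example, the first lines of a file may look like this: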

Chr1   0     100   0.542
Chr1   100   200   0.657
Chr1   200   300   0.478
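To work with the output downstream, the sketch below reads bedGraph lines back and computes a length-weighted mean GC content. It relies on standard bedGraph conventions: four whitespace-separated columns, with 0-based, half-open coordinates, so `end - start` is the bin length. The helper names are hypothetical. The weighted mean only approximates overall GC content if, as assumed here, each value is computed over the whole bin.

```python
# Hypothetical helpers for reading dsComputeGCCoverage-style bedGraph output.

def read_bedgraph(lines):
    """Yield (chrom, start, end, value), skipping blank, track and browser lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        yield chrom, int(start), int(end), float(value)


def mean_gc(records):
    """Length-weighted mean of per-bin GC values (short last bins weigh less)."""
    total_len, weighted = 0, 0.0
    for _, start, end, value in records:
        total_len += end - start
        weighted += (end - start) * value
    return weighted / total_len if total_len else 0.0


if __name__ == "__main__":
    example = ["Chr1   0     100   0.542",
               "Chr1   100   200   0.657",
               "Chr1   200   300   0.478"]
    print(mean_gc(read_bedgraph(example)))
```

Applied to a real file, the same functions accept an open file handle in place of the list, for example `mean_gc(read_bedgraph(open("results/genome.bedGraph")))`.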