-
Notifications
You must be signed in to change notification settings - Fork 1
danmaclean/NiBLS
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
sRNA Loci detection algorithm. MacLean et al. 1. Overview The perl script here is an implementation for detecting sRNA loci from high throughput sequence data. The sequence data must be filtered and mapped to the reference genome prior to the use of the algorithms, it takse as input a gff file of start and stop positions of each sRNA on the reference. GFF files are flat tab-delimited files, each line of which corresponds to a sRNA. Each line has nine columns and looks like reference_sequence source method start stop score strand phase group 3. Procedures Briefly, NiBLS begins by calculating the overlaps between the mapped sRNAs (a minimum inclusion distance allows near misses to be used) and creating a graph data structure, with overlapping sRNA nodes linked by undirected edges. Once all overlaps are calculated the clustering coefficient (the ratio of all possible overlaps to those that actually occur for each subgraph is calculated. If a pre-selected cutoff is to be used any subgraph with clustering coefficient below the threshold is rejected. Each remaining graph represents a locus that extends from the 5' most sRNA in each graph to the 3' most sRNA. 4. Running the programs A version of the perl interpreter is required for the programs to run. Most versions of Unix/Linux and Mac OS X will have this installed already. Windows machines typically will not. Try Perl.com for info. Installation The program needs no real installation and can be run from anywhere. Simply move the whole folder containing the downloaded files to somewhere suitable Running The program is are called from the command line using the following convention: perl <program name> <options> e.g., perl nibls.pl my_input.gff 10 0.7 all output is written to standard out. You can redirect the output to a file like so perl nibls.pl -f my_input.gff > my_output.gff Options -f file name (required) -m minimum inclusion distance (default: 5) -c clustering coefficient (default: 0.6) 5. Output Loci are written as a gff file to standard output.
About
Implementation of an algorithm that can determine small RNA generative loci from high-throughput sequencing data.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published