
SV-M

Structural Variant Machine (SV-M) to accurately predict InDels from NGS paired-end short reads as described in:

D. Grimm, J. Hagmann, D. Koenig, D. Weigel and K. Borgwardt (2013). Accurate indel prediction using paired-end short reads. BMC Genomics 14:132.

Structural Variant Machine (SV-M)

Contents:

  1. Installation
  2. Usage
  3. Input format
  4. Output files and format
  5. Training Data
  6. Author and license information

1. Installation

To install the tool, compile the source code. In a Linux/Mac terminal, type:

make all

This compiles the source code and generates two directories (build, bin). The bin directory contains the compiled tool sv-m.

To re-compile:

make clean
make all

2. Usage

a) Prediction

To predict whether an indel candidate is true or false, use the -predict command:

./sv-m -predict <model_file> <normalization_parameter_file> <data_file> <output_filename>

where:

  • <model_file>: trained SVM model file
  • <normalization_parameter_file>: the corresponding normalization parameter file for the trained SVM model
  • <data_file>: input data file with all features
  • <output_filename>: filename for the output file
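
The prediction call can also be scripted. The following is a minimal sketch, not part of SV-M itself: the helper names `build_predict_command` and `run_predict` are hypothetical, the default binary path `./sv-m` assumes you run from the bin directory, and the sketch assumes the tool reports failure through a non-zero exit status.

```python
import subprocess
from pathlib import Path


def build_predict_command(model_file, norm_file, data_file, output_file,
                          binary="./sv-m"):
    """Return the argument list for `sv-m -predict` in the documented order."""
    return [str(binary), "-predict", str(model_file), str(norm_file),
            str(data_file), str(output_file)]


def run_predict(model_file, norm_file, data_file, output_file,
                binary="./sv-m"):
    """Check that the three input files exist, then run the prediction."""
    for path in (model_file, norm_file, data_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"input file not found: {path}")
    cmd = build_predict_command(model_file, norm_file, data_file,
                                output_file, binary)
    # Assumption: sv-m exits with a non-zero status on failure.
    subprocess.run(cmd, check=True)
    return cmd
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.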

b) Training

To train a new SVM model on a set of features, use the -train command:

./sv-m -train <data_filename> <output_directory>

where:

  • <data_filename>: input data file
  • <output_directory>: name of an existing empty output directory

Optional arguments:

  • -n: number of folds k for k-fold cross-validation (default = 10)
  • -experiments: number of experiments/repeats (default = 1)

(In general, several experiments are performed.)
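
Because the output directory has to exist and be empty, it can help to check this before starting a long training run. The sketch below is illustrative only: `build_train_command` is a hypothetical helper, and placing the optional flags after the two positional arguments is an assumption, since the usage line above does not state where they go.

```python
from pathlib import Path


def build_train_command(data_file, output_dir, folds=None, experiments=None,
                        binary="./sv-m"):
    """Validate the output directory and return the `sv-m -train` arguments."""
    out = Path(output_dir)
    if not out.is_dir():
        raise NotADirectoryError(f"output directory does not exist: {out}")
    if any(out.iterdir()):
        raise ValueError(f"output directory is not empty: {out}")

    cmd = [str(binary), "-train", str(data_file), str(out)]
    # Assumption: optional flags follow the positional arguments.
    if folds is not None:
        cmd += ["-n", str(folds)]
    if experiments is not None:
        cmd += ["-experiments", str(experiments)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to start the training.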

3. Input format

a) Prediction

The <model_file> and <normalization_parameter_file> can be found in the Model folder in the root directory. For a new or different set of features these files have to be generated by performing a new training.

  • <data_file> format (tab separated):
<chromosome> <start position> <end position> <feature 1> <feature 2> ... <feature n>
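
As an illustration of this layout, the following sketch writes candidates to a tab-separated file. The function name `write_candidates` and the example values are hypothetical; which features to supply, and in which order, depends on the model being used.

```python
import csv


def write_candidates(path, candidates):
    """Write one line per candidate: chromosome, start, end, then features.

    `candidates` is an iterable of (chromosome, start, end, features) tuples,
    where `features` is a sequence of feature values.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chromosome, start, end, features in candidates:
            writer.writerow([chromosome, start, end, *features])


# Example call with made-up values:
# write_candidates("candidates.tab", [("Chr1", 100, 105, [0.5, 3])])
```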

b) Training

  • <data_file> format (tab separated):
<class label: 1 for positive, -1 for negative> <chromosome> <start position> <end position> <feature 1> <feature 2> ... <feature n>
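
A training file can be checked against this layout before it is passed to -train. The sketch below is one possible validator, not part of SV-M: `read_training_file` is a hypothetical name, and it assumes that the file has no header line, that positions are integers, and that every feature is numeric.

```python
def read_training_file(path):
    """Parse a training file into (label, chromosome, start, end, features)."""
    examples = []
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            fields = line.split("\t")
            if len(fields) < 5:
                raise ValueError(f"line {line_no}: expected label, chromosome, "
                                 "start, end and at least one feature")
            label = int(fields[0])
            if label not in (1, -1):
                raise ValueError(f"line {line_no}: class label must be 1 or -1")
            chromosome = fields[1]
            start, end = int(fields[2]), int(fields[3])
            # Assumption: all feature columns are numeric.
            features = [float(value) for value in fields[4:]]
            examples.append((label, chromosome, start, end, features))
    return examples
```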

4. Output files and format

a) Prediction

  • <output_file> format (tab separated):
<class label: 1 for positive class, -1 for negative class> <probability of the positive class (the negative-class probability is 1 minus this value)> <chromosome> <start position> <end position> <feature 1> <feature 2> ... <feature n>
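
The prediction output can be post-processed, for example to keep only confidently predicted indels. The following is a sketch under assumptions: the names `Prediction`, `parse_prediction_line` and `confident_positives` are hypothetical, the 0.9 threshold is an arbitrary example, and the exact number formatting written by sv-m is not specified here, so the label is parsed leniently.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    label: int            # 1 = positive class, -1 = negative class
    probability: float    # probability of the positive class
    chromosome: str
    start: int
    end: int
    features: List[str]   # feature columns, kept as text


def parse_prediction_line(line):
    """Split one tab-separated output line into a Prediction."""
    fields = line.rstrip("\n").split("\t")
    return Prediction(
        label=int(float(fields[0])),  # accepts "1", "+1" or "1.0"
        probability=float(fields[1]),
        chromosome=fields[2],
        start=int(fields[3]),
        end=int(fields[4]),
        features=fields[5:],
    )


def confident_positives(lines, min_probability=0.9):
    """Return positive predictions whose probability reaches the threshold."""
    predictions = (parse_prediction_line(line) for line in lines if line.strip())
    return [p for p in predictions
            if p.label == 1 and p.probability >= min_probability]
```

Since an open file iterates over its lines, `confident_positives(open("predictions.tab"))` works directly on an output file (the file name here is only an example).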

b) Training

The output directory contains the following output files:

model.svm: The trained model file

model_normalization.param: The corresponding normalization parameters for that model

results.txt: A summary of the performance of the model and the corresponding weights

experiments.tab: A tab separated file containing the C-Value, AUC and BEP value for each experiment

<C-Value> <AUC> <BEP>
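
When several experiments are run, experiments.tab can be summarized with a few lines of code. This is a sketch only: `summarize_experiments` is a hypothetical helper, and because it is not stated whether the file carries a header line, any line that does not consist of three numeric columns is skipped.

```python
from statistics import mean


def summarize_experiments(path):
    """Return mean AUC, mean BEP and the C-Value of the best-AUC experiment."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 3:
                continue
            try:
                c_value, auc, bep = (float(value) for value in fields)
            except ValueError:
                continue  # non-numeric line, e.g. a header if one exists
            rows.append((c_value, auc, bep))
    if not rows:
        raise ValueError(f"no experiment rows found in {path}")
    best = max(rows, key=lambda row: row[1])
    return {
        "experiments": len(rows),
        "mean_auc": mean(row[1] for row in rows),
        "mean_bep": mean(row[2] for row in rows),
        "best_c": best[0],
    }
```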

5. Training Data

The folder trainingdata contains the Sanger-validated training data. For more detailed information and the file format, see the README file within the trainingdata folder.

6. Author and license information

Version: 0.1
Author: Dominik Gerhard Grimm
Mail: [email protected]
Date: 7th of December 2011

Group: Machine Learning and Computational Biology Group (http://webdav.tuebingen.mpg.de/u/karsten/group/)
Institutes: Max Planck Institute for Developmental Biology and Max Planck Institute for Intelligent Systems (Tübingen, Germany)

This tool makes use of libSVM 3.0 (www.csie.ntu.edu.tw/~cjlin/libsvm/)
