MALVA: genotyping by Mapping-free ALternate-allele detection of known VAriants

Alignment-free genotyping of a set of known variants (in VCF format) directly from a sample of reads.

Install

MALVA is available on bioconda.

$ conda create -n malvatest -c bioconda malva

will create an environment named malvatest that includes MALVA and its dependencies.

Install from source code

Dependencies

To manually compile MALVA you'll need a C++17-compliant compiler, CMake, and the following libraries installed in your system:

sdsl-lite v2.1.1 under GPLv3 License
KMC >= v2.3 under GPLv3 License
htslib >= v1.10.2 under MIT/Expat License
zstd >= 1.5.1 under BSD License and GPLv2
zstdstream (included in this repository) under MIT License
xxHash (included in this repository) under BSD 2-Clause License
zlib under zlib license

Use your favorite system-wide package manager to install them before compiling MALVA.

Alternatively, you can use conda (and bioconda) to install the dependencies. For example (please adapt to your system setup):

conda create -n malvadeps -c conda-forge -c bioconda htslib kmc sdsl-lite cmake zstd-static cxx-compiler

Note that these dependencies are needed only if you want to compile MALVA from source; otherwise, MALVA is already available on Bioconda in binary form (see above).

Download and installation

To download and compile the code run the following commands.

git clone https://github.com/AlgoLab/malva.git
cd malva
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
make

If the compilation is successful, the malva-geno binary will be copied to the ${PROJECT_ROOT}/bin directory.

Usage

Usage: malva-geno <subcommand> [-k KMER-SIZE] [-r REF-KMER-SIZE] [-c MAX-COV] <reference> <variants> <kmc_output_prefix>

Arguments:
    <subcommand>                      either index (to create the reference index) or call (to call the genotypes)

    -h, --help                        display this help and exit
    -k, --kmer-size                   size of the kmers to index (default:35)
    -r, --ref-kmer-size               size of the reference kmers to index (default:43)
    -e, --error-rate                  expected sample error rate (default:0.001)
    -s, --samples                     file containing the list of (VCF) samples to consider (default:-, i.e. all samples)
    -f, --freq-key                    a priori frequency key in the INFO column of the input VCF (default:AF)
    -c, --max-coverage                maximum coverage for variant alleles (default:200)
    -b, --bf-size                     bloom filter size in GB (default:4)
    -p, --strip-chr                   strip "chr" from sequence names (default:false)
    -u, --uniform                     use uniform a priori probabilities (default:false)
    -v, --verbose                     output COVS and GTS in INFO column (default: false)
    -1, --haploid                     run MALVA in haploid mode (default: false)

Positional arguments:
    <reference>                       reference file in FASTA format (may be gzipped)
    <variants>                        variants file in VCF format (may be gzipped)
    <kmc_output_prefix>               prefix of KMC output
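
The -f option names an INFO key that must be present in the input VCF. Before a long run it can be worth checking that the key is declared in the VCF header. The following is a minimal sketch, not part of MALVA: the helper name vcf_has_info_key is hypothetical, and it assumes the header declares the key with a standard ##INFO=<ID=...> line.

```shell
# Hypothetical helper (not shipped with MALVA): succeed if the VCF header
# declares the given INFO key, e.g. the one passed to -f.
vcf_has_info_key() {
    # gzip -cdf decompresses gzipped input and passes plain text through
    # unchanged, so both .vcf and .vcf.gz files are handled.
    gzip -cdf "$1" | grep -q "^##INFO=<ID=$2,"
}

# Usage on the example data, if it has been unpacked in the current directory.
if [ -f chr20.vcf ]; then
    if vcf_has_info_key chr20.vcf EUR_AF; then
        echo "EUR_AF is declared in chr20.vcf"
    else
        echo "EUR_AF is not declared; pick another key for -f or use -u"
    fi
fi
```

If the key is missing, the -u option (uniform a priori probabilities) listed above avoids the need for a frequency key altogether.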

The KMC database that malva-geno reads, identified by <kmc_output_prefix>, can be computed with KMC as follows:

kmc -k<REF-KMER-SIZE> <sample> <kmc_output_prefix> <kmc_tmp_dir>
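
Note that the k-mer size given to KMC is the reference k-mer size (-r), not the value of -k. As a sketch of how the command can be assembled, the helper below prints (without running) a KMC invocation for a given sample. The function names are hypothetical, and the extension-to-flag mapping is an assumption: it uses -fm for FASTA, as the Example section of this README does, and -fq for FASTQ.

```shell
# Hypothetical helper: pick the KMC input-format flag from the sample file name.
kmc_format_flag() {
    case "$1" in
        *.fa|*.fasta|*.fa.gz|*.fasta.gz) echo "-fm" ;;  # FASTA, as in the README example
        *.fq|*.fastq|*.fq.gz|*.fastq.gz) echo "-fq" ;;  # FASTQ
        *) echo "unknown sample extension: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the KMC command for a reference k-mer size and a sample.
kmc_command() {
    ref_k=$1; sample=$2; prefix=$3; tmp_dir=$4
    flag=$(kmc_format_flag "$sample") || return 1
    echo "kmc -k${ref_k} ${flag} ${sample} ${prefix} ${tmp_dir}"
}

kmc_command 43 chr20.sample.fa kmc.out kmc_tmp
# prints: kmc -k43 -fm chr20.sample.fa kmc.out kmc_tmp
```

The temporary directory must exist before KMC is run (for example, via mkdir -p kmc_tmp).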

Alternatively, we provide a bash script (MALVA) that runs the full KMC + malva-geno pipeline:

Usage: MALVA [-k KMER-SIZE] [-r REF-KMER-SIZE] [-c MAX-COV] <reference> <variants> <sample>

Arguments:
     -h              print this help and exit
     -k              size of the kmers to index (default:35)
     -r              size of the reference kmers to index (default:43)
     -e              expected sample error rate (default:0.001)
     -s              file containing the list of (VCF) samples to consider (default:-, i.e. all samples)
     -f              a priori frequency key in the INFO column of the input VCF (default:AF)
     -c              maximum coverage for variant alleles (default:200)
     -b              bloom filter size in GB (default:4)
     -m              max amount of RAM in GB - KMC parameter (default:4)
     -p              strip "chr" from sequence names (default:false)
     -u              use uniform a priori probabilities (default:false)
     -v              output COVS and GTS in INFO column (default: false)
     -1              run MALVA in haploid mode (default: false)

Positional arguments:
    <reference>     reference file in FASTA format (can be gzipped)
    <variants>      variants file in VCF format (can be gzipped)
    <sample>        sample file in FASTA/FASTQ format (can be gzipped)

Example

After compiling malva, you can test it on the example data provided:

cd example
tar xvfz data.tar.gz
../MALVA -k 35 -r 43 -b 1 -f EUR_AF chr20.fa chr20.vcf chr20.sample.fa > chr20.genotyped.vcf

The last command is equivalent to running:

mkdir -p kmc_tmp
kmc -m4 -k43 -fm chr20.sample.fa kmc.out kmc_tmp
../bin/malva-geno index -k 35 -r 43 -b 1 -f EUR_AF chr20.fa chr20.vcf kmc.out
../bin/malva-geno call -k 35 -r 43 -b 1 -f EUR_AF chr20.fa chr20.vcf kmc.out > chr20.genotyped.vcf

This should take less than 1 minute to complete. You can also verify the correctness of the output VCF chr20.genotyped.vcf by comparing it with chr20.malva.vcf. For a quick comparison, see MALVA-TEST: Workflow for testing MALVA Output.
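
For a rough manual check without MALVA-TEST, the records of the two VCF files can be compared directly. The sketch below is not the MALVA-TEST workflow: the function name vcf_records_match is hypothetical, and it assumes that differences in the "##" meta-information lines (which may differ between runs) can be ignored.

```shell
# Hypothetical helper: succeed if two VCF files have identical column
# headers and records, ignoring the "##" meta-information lines.
vcf_records_match() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    grep -v '^##' "$1" > "$tmp_a"
    grep -v '^##' "$2" > "$tmp_b"
    cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage in the example directory, once both files are available.
if [ -f chr20.genotyped.vcf ] && [ -f chr20.malva.vcf ]; then
    if vcf_records_match chr20.genotyped.vcf chr20.malva.vcf; then
        echo "records are identical"
    else
        echo "records differ"
    fi
fi
```

This is a byte-for-byte comparison of the records, so it reports a difference even when only a quality value or an INFO field changes; it does not measure genotyping accuracy.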

Haploid mode - Example

To run MALVA in haploid mode just use the -1 argument.

cd example
tar xvfz haploid.tar.gz
../MALVA -1 -k 35 -r 43 -b 1 -f AF haploid.fa haploid.vcf haploid.fq > haploid.genotyped.vcf

This should take less than 1 minute to complete. You can also verify the correctness of the output VCF haploid.genotyped.vcf by comparing it with haploid.malva.vcf. For a quick comparison, see MALVA-TEST: Workflow for testing MALVA Output.
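
One way to sanity-check a haploid run is to count the genotype fields that still list more than one allele. The sketch below rests on an assumption that this README does not confirm: that haploid calls are written with a single allele index in the GT field (no "/" or "|" separator), which is how the VCF specification describes haploid calls. The function name is hypothetical.

```shell
# Hypothetical helper: count sample genotype fields that contain a "/" or
# "|" separator, i.e. calls that list more than one allele.
count_non_haploid_calls() {
    awk -F'\t' '
        /^#/ { next }                        # skip header lines
        {
            for (i = 10; i <= NF; i++) {     # sample columns start at column 10
                split($i, parts, ":")        # GT is the first FORMAT subfield
                if (parts[1] ~ "[/|]") n++
            }
        }
        END { print n + 0 }
    ' "$1"
}

# Usage in the example directory, once the haploid run has finished.
if [ -f haploid.genotyped.vcf ]; then
    count_non_haploid_calls haploid.genotyped.vcf
fi
```

Under the stated assumption, a count of 0 is the expected result for a haploid run.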

Note

The tool has been tested only on 64-bit Linux systems.

Authors

Marco Previtali
Luca Denti
Giulia Bernardini
Paola Bonizzoni
Alexander Schönhuth
Marco Burgio (code refactoring and malva-test utility)

For inquiries on this software please contact either MP or LD.

License

MALVA is distributed under the GPL-3.0-or-later license.
