From bdb09fa6144fac2f4ab578e4fe3677d7748b8216 Mon Sep 17 00:00:00 2001 From: smarco Date: Wed, 6 Jul 2022 19:02:32 +0200 Subject: [PATCH] - Implemented Eizenga's M!=0 formula to allow any match score - Refactor debug report - Updated README --- README.md | 25 ++- .../{ => utests}/wfa.utest.cmp.alignments.sh | 0 scripts/utests/wfa.utest.default.sh | 42 ++++ scripts/utests/wfa.utest.extended.sh | 72 +++++++ scripts/utests/wfa.utest.heuristics.sh | 52 +++++ scripts/{ => utests}/wfa.utest.summary.sh | 0 scripts/wfa.alg.cmp.py | 39 ++-- ...test.cmp.score.sh => wfa.alg.cmp.score.sh} | 4 +- scripts/wfa.alg.rescore.py | 179 ++++++++++++++++++ .../benchmark/benchmark_check.c | 12 +- .../benchmark/benchmark_edit.c | 4 +- .../benchmark/benchmark_gap_affine.c | 6 +- .../benchmark/benchmark_gap_affine2p.c | 2 +- .../benchmark/benchmark_gap_linear.c | 2 +- tools/align_benchmark/edit/edit_dp.c | 4 +- tools/align_benchmark/edit/edit_dp.h | 4 +- tools/align_benchmark/gap_affine/swg.c | 6 +- tools/align_benchmark/gap_affine/swg.h | 6 +- .../gap_affine2p/affine2p_dp.c | 76 +------- .../gap_affine2p/affine2p_dp.h | 2 +- tools/align_benchmark/gap_linear/nw.c | 6 +- tools/align_benchmark/gap_linear/nw.h | 4 +- wavefront/wavefront_align.c | 40 ++-- wavefront/wavefront_aligner.c | 9 +- wavefront/wavefront_aligner.h | 1 + wavefront/wavefront_bialign.c | 5 +- wavefront/wavefront_debug.c | 77 ++++---- wavefront/wavefront_debug.h | 6 +- wavefront/wavefront_penalties.c | 100 +++++----- wavefront/wavefront_penalties.h | 50 +++-- 30 files changed, 569 insertions(+), 266 deletions(-) rename scripts/{ => utests}/wfa.utest.cmp.alignments.sh (100%) mode change 100644 => 100755 create mode 100755 scripts/utests/wfa.utest.default.sh create mode 100755 scripts/utests/wfa.utest.extended.sh create mode 100755 scripts/utests/wfa.utest.heuristics.sh rename scripts/{ => utests}/wfa.utest.summary.sh (100%) rename scripts/{wfa.utest.cmp.score.sh => wfa.alg.cmp.score.sh} (72%) create mode 100644 scripts/wfa.alg.rescore.py diff --git a/README.md b/README.md index a6d38627..8ddad00e 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ ### 1.1 What is WFA? -The wavefront alignment (WFA) algorithm is an **exact** gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike to traditional dynamic programming algorithms that run in quadratic time, the WFA runs in time `O(ns+s^2)`, proportional to the sequence length `n` and the alignment score `s`, using `O(s^2)` memory. Moreover, the WFA algorithm exhibits simple computational patterns that the modern compilers can automatically vectorize for different architectures without adapting the code. To intuitively illustrate why the WFA algorithm is so interesting, look at the following figure. The left panel shows the cells computed by a classical dynamic programming based algorithm (like Smith-Waterman or Needleman Wunsch). In contrast, the right panel shows the cells computed by the WFA algorithm to obtain the same result (i.e., the optimal alignment). +The wavefront alignment (WFA) algorithm is an **exact** gap-affine algorithm that takes advantage of homologous regions between the sequences to accelerate the alignment process. Unlike to traditional dynamic programming algorithms that run in quadratic time, the WFA runs in time `O(ns+s^2)`, proportional to the sequence length `n` and the alignment score `s`, using `O(s^2)` memory (or `O(s)` using the ultralow/BiWFA mode). Moreover, the WFA algorithm exhibits simple computational patterns that the modern compilers can automatically vectorize for different architectures without adapting the code. To intuitively illustrate why the WFA algorithm is so interesting, look at the following figure. The left panel shows the cells computed by a classical dynamic programming based algorithm (like Smith-Waterman or Needleman Wunsch). In contrast, the right panel shows the cells computed by the WFA algorithm to obtain the same result (i.e., the optimal alignment).

@@ -47,7 +47,7 @@ Section [WFA2-lib features](#wfa2.features) explores the most relevant options a - The WFA algorithm is an **exact algorithm**. If no heuristic is applied (e.g., band or adaptive pruning), the core algorithm guarantees to always find the optimal solution (i.e., best alignment score). Since its first release, some authors have referenced the WFA as approximated or heuristic, which is NOT the case. -- Given two sequences of length `n`, traditional dynamic-programming (DP) based methods (like Smith-Waterman or Needleman-Wunsch) compute the optimal alignment in `O(n^2)` time, using `O(n^2)` memory. In contrast, the WFA algorithm requires `O(ns+s^2)` time and `O(s^2)` memory (being `s` the optimal alignment score). Therefore, **the memory consumption of the WFA algorithm is not intrinsically higher than that of other methods**. Most DP-based methods can use heuristics (like banded, X-drop, or Z-drop) to reduce the execution time and the memory usage at the expense of losing accuracy. Likewise, **the WFA algorithm can also use heuristics to reduce the execution time and memory usage**. +- Given two sequences of length `n`, traditional dynamic-programming (DP) based methods (like Smith-Waterman or Needleman-Wunsch) compute the optimal alignment in `O(n^2)` time, using `O(n^2)` memory. In contrast, the WFA algorithm requires `O(ns+s^2)` time and `O(s^2)` memory (being `s` the optimal alignment score). Therefore, **the memory consumption of the WFA algorithm is not intrinsically higher than that of other methods**. Most DP-based methods can use heuristics (like banded, X-drop, or Z-drop) to reduce the execution time and the memory usage at the expense of losing accuracy. Likewise, **the WFA algorithm can also use heuristics to reduce the execution time and memory usage**. Moreover, the memory mode `ultralow` uses the BiWFA algorithm to execute in `O(ns+s^2)` time and linear `O(s)` memory. - **A note for the fierce competitors.** I can understand that science and publishing have become a fierce competition these days. Many researchers want their methods to be successful and popular, seeking funding, tenure, or even fame. If you are going to benchmark the WFA using the least favourable configuration, careless programming, and a disadvantageous setup, please, go ahead. But remember, researchers like you have put a lot of effort into developing the WFA. We all joined this "competition" because we sought to find better methods that could truly help other researchers. So, try to be nice, tone down the marketing, and produce fair evaluations and honest publications. @@ -147,7 +147,7 @@ cout << "Alignment score " << aligner.getAlignmentScore() << endl; * **Exact alignment** method that computes the optimal **alignment score** and/or **alignment CIGAR**. * Supports **multiple distance metrics** (i.e., indel, edit, gap-lineal, gap-affine, and dual-gap gap-affine). * Allows performing **end-to-end** (a.k.a. global) and **ends-free** (e.g., semi-global, extension, overlap) alignment. -* Implements **low-memory modes** to reduce and control memory consumption. +* Implements **low-memory modes** to reduce and control memory consumption (down to `O(s)` using the `ultralow` mode). * Supports various **heuristic strategies** to use on top of the core WFA algorithm. * WFA2-lib **operates with plain ASCII strings**. Although we mainly focus on aligning DNA/RNA sequences, the WFA algorithm and the WFA2-lib implementation work with any pair of strings. Moreover, these sequences do not have to be pre-processed (e.g., packed or profiled), nor any table must be precomputed (like the query profile, used within some Smith-Waterman implementations). * Due to its simplicity, the WFA algorithm can be automatically vectorized for any SIMD-compliant CPU supported by the compiler. For this reason, **the WFA2-lib implementation is independent of any specific ISA or processor model**. Unlike other hardware-dependent libraries, we aim to offer a multiplatform pairwise-alignment library that can be executed on different processors and models (e.g., SSE, AVX2, AVX512, POWER-ISA, ARM, NEON, SVE, SVE2, RISCV-RVV, ...). @@ -385,7 +385,7 @@ The WFA2 library allows computing alignments with different spans or shapes. Alt ### 3.4 Memory modes -The WFA2 library implements various memory modes: `wavefront_memory_high`, `wavefront_memory_med`, `wavefront_memory_low`. These modes allow regulating the overall memory consumption at the expense of execution time. The standard WFA algorithm, which stores explicitly all wavefronts in memory, correspond to the mode `wavefront_memory_high`. The other methods progressively reduce memory usage at the expense of slightly larger alignment times. These memory modes can be used transparently with other alignment options and generate identical results. Note that this option does not affect the score-only alignment mode (it already uses a minimal memory footprint of `O(s)`). +The WFA2 library implements various memory modes: `wavefront_memory_high`, `wavefront_memory_med`, `wavefront_memory_low`, and , `wavefront_memory_ultralow`. These modes allow regulating the overall memory consumption at the expense of execution time. The standard WFA algorithm, which stores explicitly all wavefronts in memory, correspond to the mode `wavefront_memory_high`. The other methods progressively reduce memory usage at the expense of slightly larger alignment times. These memory modes can be used transparently with other alignment options and generate identical results. Note that this option does not affect the score-only alignment mode (it already uses a minimal memory footprint of `O(s)`). ```C wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default; @@ -394,7 +394,7 @@ The WFA2 library implements various memory modes: `wavefront_memory_high`, `wave ### 3.5 Heuristic modes -The WFA algorithm can be used combined with many heuristics to reduce the alignment time and memory used. As it happens to other alignment methods, heuristics can result in suboptimal solutions and loss of accuracy. Moreover, some heuristics may drop the alignment if the sequences exceed certain divergence thresholds (i.e., x-drop/z-drop). Due to the popularity and efficiency of these methods, the WFA2 library implements many of these heuristics. Note, **it is not about how little DP-matrix you compute, but about how good the resulting alignments are.** +The WFA algorithm can be used combined with many heuristics to reduce the alignment time and memory used. As it happens to other alignment methods, heuristics can result in suboptimal solutions and loss of accuracy. Moreover, some heuristics may drop the alignment if the sequences exceed certain divergence thresholds (i.e., x-drop/z-drop). Due to the popularity and efficiency of these methods, the WFA2 library implements many of these heuristics. Note, **it is not about how little DP-matrix you compute, but about how good/accurate the resulting alignments are.** - **None (for comparison)**. If no heuristic is used, the WFA behaves exploring cells of the DP-matrix in increasing score order (increasing scores correspond to colours from blue to red). @@ -500,11 +500,20 @@ The WFA algorithm can be used combined with many heuristics to reduce the alignm attributes.heuristic.steps_between_cutoffs = 100; ``` - +### 3.6 Some technical notes + +- Thanks to Eizenga's formulation, WFA2-lib can operate with any match score. Although, in practice, M=0 is still the most efficient choice as it takes advantage of matches between the sequences to accelerate the alignment computation. + +- All WFA2-lib algorithms/variants are stable. That is, for alignments having the same score, the library always resolves ties (between M, X, I,and D) using the same criteria: M (highest prio) > X > D > I (lowest prio). Nevertheless, the memory mode `ultralow` (BiWFA) is optimal (always reports the best alignment) but not stable. ## 4. REPORTING BUGS AND FEATURE REQUEST -Feedback and bug reporting is highly appreciated. Please report any issue or suggestion on github or email to the main developer (santiagomsola@gmail.com). +Feedback and bug reporting is highly appreciated. Please report any issue or suggestion on github or email to the main developer (santiagomsola@gmail.com). Don't hesitate to contact us +if: + - You experience any bug or crash. + - You want to request a feature or have any suggestion. + - Your application using the library is running slower than it should or you expected. + - Need help integrating the library into your tool. ## 5. LICENSE @@ -528,3 +537,5 @@ Miquel Moretó has contributed with fruitful technical discussions and tireless **Santiago Marco-Sola, Juan Carlos Moure, Miquel Moreto, Antonio Espinosa**. ["Fast gap-affine pairwise alignment using the wavefront algorithm."](https://doi.org/10.1093/bioinformatics/btaa777) Bioinformatics, 2020. +**Santiago Marco-Sola, Jordan M Eizenga, Andrea Guarracino, Benedict Paten, Erik Garrison, Miquel Moreto**. Optimal gap-affine alignment in O(s) space. _bioRxiv_ (2022). DOI [2022.04.14.488380](https://doi.org/10.1101/2022.04.14.488380) + diff --git a/scripts/wfa.utest.cmp.alignments.sh b/scripts/utests/wfa.utest.cmp.alignments.sh old mode 100644 new mode 100755 similarity index 100% rename from scripts/wfa.utest.cmp.alignments.sh rename to scripts/utests/wfa.utest.cmp.alignments.sh diff --git a/scripts/utests/wfa.utest.default.sh b/scripts/utests/wfa.utest.default.sh new file mode 100755 index 00000000..b00498a1 --- /dev/null +++ b/scripts/utests/wfa.utest.default.sh @@ -0,0 +1,42 @@ +#!/bin/bash -x +# PROJECT: Wavefront Alignments Algorithms (Unitary Tests) +# LICENCE: MIT License +# AUTHOR(S): Santiago Marco-Sola +# DESCRIPTION: WFA unitary tests (for performance & correcness) +# USAGE: ./wfa.utest.default.sh + +# Clear +rm *.log *.alg + +# Config +ALGORITHM="gap-affine-wfa" +REDUCTION="--wfa-heuristic=wfa-adaptive --wfa-heuristic-parameters 10,50,1" +LOWMEMORY="--wfa-memory-mode=med" + +# Utest for length=100 +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l100.n100K.e2.seq -o sim.l100.e2.W.alg &> sim.l100.e2.W.log +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l100.n100K.e2.seq -o sim.l100.e2.Wl.alg $LOWMEMORY &> sim.l100.e2.Wl.log +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l100.n100K.e2.seq -o sim.l100.e2.Wa.alg $REDUCTION &> sim.l100.e2.Wa.log + +# Utest for length=1K +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l1K.n10K.e20.seq -o sim.l1K.e20.W.alg &> sim.l1K.e20.W.log +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l1K.n10K.e20.seq -o sim.l1K.e20.Wl.alg $LOWMEMORY &> sim.l1K.e20.Wl.log +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l1K.n10K.e20.seq -o sim.l1K.e20.Wa.alg $REDUCTION &> sim.l1K.e20.Wa.log + +# Utest for length=10K +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l10K.n1K.e20.seq -o sim.l10K.e20.W.alg &> sim.l10K.e20.W.log +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l10K.n1K.e20.seq -o sim.l10K.e20.Wa.alg $REDUCTION &> sim.l10K.e20.Wa.log + +# Utest for length=100K +\time -v ./bin/align_benchmark -a $ALGORITHM -i ../data/sim.l100K.n1.e10.seq -o sim.l100K.e10.Wl.alg $LOWMEMORY &> sim.l100K.e10.Wl.log + +# +# Random tests +# +# INPUT="../data/sim.l1K.n10K.e20.seq -P 500" +# ./bin/align_benchmark -a indel-wfa -i $INPUT -c score --check-distance indel -v +# ./bin/align_benchmark -a edit-wfa -i $INPUT -c score --check-distance edit -v +# ./bin/align_benchmark -a gap-linear-wfa -i $INPUT -c score --check-distance linear -v +# ./bin/align_benchmark -a gap-linear-wfa -i $INPUT -c score --check-distance linear -v +# ./bin/align_benchmark -a gap-affine-wfa -i $INPUT -c score --check-distance affine -v +# ./bin/align_benchmark -a gap-affine-2p-wfa -i $INPUT -c score --check-distance affine2p -v \ No newline at end of file diff --git a/scripts/utests/wfa.utest.extended.sh b/scripts/utests/wfa.utest.extended.sh new file mode 100755 index 00000000..0d15dd13 --- /dev/null +++ b/scripts/utests/wfa.utest.extended.sh @@ -0,0 +1,72 @@ +#!/bin/bash -x +# PROJECT: Wavefront Alignments Algorithms (Unitary Tests) +# LICENCE: MIT License +# AUTHOR(S): Santiago Marco-Sola +# DESCRIPTION: WFA unitary tests (extended for different use-cases) +# USAGE: ./wfa.utest.extended.sh + +# Clear +rm -rf *.log *.alg utests +mkdir utests + +# Config +BIN="/usr/bin/time -v ./bin/align_benchmark" +BIWFA="--wfa-memory-mode=ultralow" +XTRAS="--check correct" + +# Base data +DATA100="../data/sim.l100.n100K.e2.seq" +DATA1K="../data/sim.l1K.n10K.e20.seq" +DATA10K="../data/sim.l10K.n1K.e20.seq" + +ENDSFREE1K="../data/sim.endsfree.l1K.n10K.seq" +ENDSFREE10K="../data/sim.endsfree.l10K.n1K.seq" + +LONG100K="../data/sim.l100K.n10.e20.seq" + +# Base (edit) +$BIN -a edit-wfa -i $DATA100 -o utests/test.l100.aedit.alg $XTRAS &> utests/test.l100.aedit.log +$BIN -a edit-wfa -i $DATA1K -o utests/test.l1K.aedit.alg $XTRAS &> utests/test.l1K.aedit.log +$BIN -a edit-wfa -i $DATA10K -o utests/test.l10K.aedit.alg $BIWFA $XTRAS &> utests/test.l10K.aedit.log +$BIN -a edit-wfa -i $DATA100 -o utests/test.l100.sedit.alg --wfa-score-only $XTRAS &> utests/test.l100.sedit.log +$BIN -a edit-wfa -i $DATA1K -o utests/test.l1K.sedit.alg --wfa-score-only $XTRAS &> utests/test.l1K.sedit.log +$BIN -a edit-wfa -i $DATA10K -o utests/test.l10K.sedit.alg --wfa-score-only $XTRAS &> utests/test.l10K.sedit.log + +# Base (gap-linear) +$BIN -a gap-linear-wfa -i $DATA100 -o utests/test.l100.alinear.alg $XTRAS &> utests/test.l100.alinear.log +$BIN -a gap-linear-wfa -i $DATA1K -o utests/test.l1K.alinear.alg $XTRAS &> utests/test.l1K.alinear.log +$BIN -a gap-linear-wfa -i $DATA10K -o utests/test.l10K.alinear.alg $BIWFA $XTRAS &> utests/test.l10K.alinear.log +$BIN -a gap-linear-wfa -i $DATA100 -o utests/test.l100.slinear.alg --wfa-score-only $XTRAS &> utests/test.l100.slinear.log +$BIN -a gap-linear-wfa -i $DATA1K -o utests/test.l1K.slinear.alg --wfa-score-only $XTRAS &> utests/test.l1K.slinear.log +$BIN -a gap-linear-wfa -i $DATA10K -o utests/test.l10K.slinear.alg --wfa-score-only $XTRAS &> utests/test.l10K.slinear.log + +# Base (gap-affine) +$BIN -a gap-affine-wfa -i $DATA100 -o utests/test.l100.aaffine.alg $XTRAS &> utests/test.l100.aaffine.log +$BIN -a gap-affine-wfa -i $DATA1K -o utests/test.l1K.aaffine.alg $XTRAS &> utests/test.l1K.aaffine.log +#$BIN -a gap-affine-wfa -i $DATA10K -o utests/test.l10K.aaffine.alg $BIWFA $XTRAS &> utests/test.l10K.aaffine.log +$BIN -a gap-affine-wfa -i $DATA100 -o utests/test.l100.saffine.alg --wfa-score-only $XTRAS &> utests/test.l100.saffine.log +$BIN -a gap-affine-wfa -i $DATA1K -o utests/test.l1K.saffine.alg --wfa-score-only $XTRAS &> utests/test.l1K.saffine.log +#$BIN -a gap-affine-wfa -i $DATA10K -o utests/test.l10K.saffine.alg --wfa-score-only $XTRAS &> utests/test.l10K.saffine.log + +# Base (gap-affine-2p) +$BIN -a gap-affine2p-wfa -i $DATA100 -o utests/test.l100.a2p.alg $XTRAS &> utests/test.l100.a2p.log +$BIN -a gap-affine2p-wfa -i $DATA1K -o utests/test.l1K.a2p.alg $XTRAS &> utests/test.l1K.a2p.log +#$BIN -a gap-affine2p-wfa -i $DATA10K -o utests/test.l10K.a2p.alg $BIWFA $XTRAS &> utests/test.l10K.a2p.log +$BIN -a gap-affine2p-wfa -i $DATA100 -o utests/test.l100.s2p.alg --wfa-score-only $XTRAS &> utests/test.l100.s2p.log +$BIN -a gap-affine2p-wfa -i $DATA1K -o utests/test.l1K.s2p.alg --wfa-score-only $XTRAS &> utests/test.l1K.s2p.log +#$BIN -a gap-affine2p-wfa -i $DATA10K -o utests/test.l10K.s2p.alg --wfa-score-only $XTRAS &> utests/test.l10K.s2p.log + +# Heuristics (gap-affine-2p alignment) +HEURISTIC="--wfa-heuristic=wfa-adaptive --wfa-heuristic-parameters=10,50,1" +$BIN -a gap-affine2p-wfa -i $DATA100 -o utests/test.adaptive.l100.alg $HEURISTIC $XTRAS &> utests/test.adaptive.l100.log +$BIN -a gap-affine2p-wfa -i $DATA1K -o utests/test.adaptive.l1K.alg $HEURISTIC $XTRAS &> utests/test.adaptive.l1K.log +$BIN -a gap-affine2p-wfa -i $DATA10K -o utests/test.adaptive.l10K.alg $HEURISTIC $XTRAS &> utests/test.adaptive.l10K.log + +# Ends-free (length-diff=20%,e=10%,indels=2,20%) +$BIN -a gap-affine2p-wfa -i $ENDSFREE1K -o utests/test.endsfree.l1K.alg $BIWFA --ends-free=200,200,200,200 $XTRAS &> utests/test.endsfree.l1K.log +$BIN -a gap-affine2p-wfa -i $ENDSFREE10K -o utests/test.endsfree.l10K.alg $BIWFA --ends-free=2000,2000,2000,2000 $XTRAS &> utests/test.endsfree.l10K.log + +# Very long sequences (100K) +#$BIN -a gap-affine2p-wfa -i $LONG100K -o utests/test.long.l100K.alg $BIWFA $XTRAS &> utests/test.long.l100K.log + + diff --git a/scripts/utests/wfa.utest.heuristics.sh b/scripts/utests/wfa.utest.heuristics.sh new file mode 100755 index 00000000..f7f31dd1 --- /dev/null +++ b/scripts/utests/wfa.utest.heuristics.sh @@ -0,0 +1,52 @@ +#!/bin/bash -x +# PROJECT: Wavefront Alignments Algorithms (Unitary Tests) +# LICENCE: MIT License +# AUTHOR(S): Santiago Marco-Sola +# DESCRIPTION: WFA unitary tests (for heuristics) +# USAGE: ./wfa.utest.heuristics.sh + +SINGLE_PAIR=$1 + +# Clean +rm *.alg *.wfa *.png + +MODE="full" +HEURISTIC="" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot + +MODE="band.10.10" +HEURISTIC="--wfa-heuristic=banded-static --wfa-heuristic-parameters=-10,10" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot +MODE="band.10.150" +HEURISTIC="--wfa-heuristic=banded-static --wfa-heuristic-parameters=-10,150" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot + +MODE="aband.10.10" +HEURISTIC="--wfa-heuristic=banded-adaptive --wfa-heuristic-parameters=-10,10,1" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot +MODE="aband.50.50" +HEURISTIC="--wfa-heuristic=banded-adaptive --wfa-heuristic-parameters=-50,50,1" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot + +MODE="wf-adap.10.50.1" +HEURISTIC="--wfa-heuristic=wfa-adaptive --wfa-heuristic-parameters=10,50,1" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot +MODE="wf-adap.10.10.1" +HEURISTIC="--wfa-heuristic=wfa-adaptive --wfa-heuristic-parameters=10,10,1" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot +MODE="wf-adap.10.50.10" +HEURISTIC="--wfa-heuristic=wfa-adaptive --wfa-heuristic-parameters=10,50,10" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot + +MODE="xdrop.20" +HEURISTIC="--wfa-heuristic=xdrop --wfa-heuristic-parameters=20,20" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot +MODE="zdrop.20" +HEURISTIC="--wfa-heuristic=zdrop --wfa-heuristic-parameters=20,20" +./bin/align_benchmark -a gap-affine-wfa -i $SINGLE_PAIR -o $SINGLE_PAIR.$MODE.alg $HEURISTIC --plot + +# Produce PNG +python3 ./scripts/wfa2png.py --compact + +# Crop +# for i in *.png; do convert $i -crop 8118x6602+0+1000 ${i%.*}.cropped.png; done diff --git a/scripts/wfa.utest.summary.sh b/scripts/utests/wfa.utest.summary.sh similarity index 100% rename from scripts/wfa.utest.summary.sh rename to scripts/utests/wfa.utest.summary.sh diff --git a/scripts/wfa.alg.cmp.py b/scripts/wfa.alg.cmp.py index 2ccb8e1f..9fd20cc3 100644 --- a/scripts/wfa.alg.cmp.py +++ b/scripts/wfa.alg.cmp.py @@ -29,24 +29,24 @@ class Penalties: def __init__(self): self.distance = Distance.edit self.match = 0 - self.mismatch = 1 - self.indel = 1 - self.gap_open1 = -1 - self.gap_extend1 = -1 - self.gap_open2 = -1 - self.gap_extend2 = -1 + self.mismatch = 0 + self.indel = 0 + self.gap_open1 = 0 + self.gap_extend1 = 0 + self.gap_open2 = 0 + self.gap_extend2 = 0 def cigar_compute_score_edit(cigar_vector,penalties,ignore_misms): score = 0 for op in cigar_vector: - if op[1] in "DI": score -= int(op[0]) - if op[1] in "X" and not ignore_misms: score -= int(op[0]) + if op[1] in "DI": score += int(op[0]) + if op[1] in "X" and not ignore_misms: score += int(op[0]) return score def cigar_compute_score_linear(cigar_vector,penalties,ignore_misms): score = 0 for op in cigar_vector: - if op[1] == "M": score += int(op[0]) * penalties.match + if op[1] == "M": score -= int(op[0]) * penalties.match if op[1] == "X" and not ignore_misms: score -= int(op[0]) * penalties.mismatch if op[1] in "DI": score -= int(op[0]) * penalties.indel return score @@ -54,7 +54,7 @@ def cigar_compute_score_linear(cigar_vector,penalties,ignore_misms): def cigar_compute_score_affine(cigar_vector,penalties,ignore_misms): score = 0 for op in cigar_vector: - if op[1] == "M": score += int(op[0]) * penalties.match + if op[1] == "M": score -= int(op[0]) * penalties.match if op[1] == "X" and not ignore_misms: score -= int(op[0]) * penalties.mismatch if op[1] in "DI": score -= penalties.gap_open1 + int(op[0]) * penalties.gap_extend1 return score @@ -62,7 +62,7 @@ def cigar_compute_score_affine(cigar_vector,penalties,ignore_misms): def cigar_compute_score_affine_2p(cigar_vector,penalties,ignore_misms): score = 0 for op in cigar_vector: - if op[1] == "M": score += int(op[0]) * penalties.match + if op[1] == "M": score -= int(op[0]) * penalties.match if op[1] == "X" and not ignore_misms: score -= int(op[0]) * penalties.mismatch if op[1] in "DI": score1 = penalties.gap_open1 + int(op[0]) * penalties.gap_extend1 @@ -198,7 +198,7 @@ def compare_alignments(input_path1,input_path2,penalties,use_score,ignore_misms, parser.add_argument('-g','--gap-affine', help='Score alignments using gap-affine distance (--gap-affine=M,X,O,E)') parser.add_argument('-G','--gap-affine-2p', - help='Score alignments using gap-affine-2p distance (--gap-affine=M,X,O1,E1,O2,E2)') + help='Score alignments using gap-affine-2p distance (--gap-affine-2p=M,X,O1,E1,O2,E2)') parser.add_argument('--use-score', action='store_true',default=False, help='Compares the score provided in the file (default=use-cigar)') parser.add_argument('--ignore-misms', action='store_true',default=False, @@ -239,7 +239,20 @@ def compare_alignments(input_path1,input_path2,penalties,use_score,ignore_misms, penalties.gap_open2 = int(values[4]) penalties.gap_extend2 = int(values[5]) else: - print("[WFACompareAlignments] No distance provided. Using edit-distance (default)"); + print("[WFACompareAlignments] No distance provided. Using edit-distance (default)") + +# Check penalties +if penalties.match > 0: + print("[WFACompareAlignments] Match penalty must be negative or zero") + exit(-1) +if penalties.mismatch < 0 or \ + penalties.mismatch < 0 or \ + penalties.gap_open1 < 0 or \ + penalties.gap_extend1 < 0 or \ + penalties.gap_open2 < 0 or \ + penalties.gap_extend2 < 0: + print("[WFACompareAlignments] Penalties must be positive or zero") + exit(-1) # Compare alignments from both files stats = compare_alignments( diff --git a/scripts/wfa.utest.cmp.score.sh b/scripts/wfa.alg.cmp.score.sh similarity index 72% rename from scripts/wfa.utest.cmp.score.sh rename to scripts/wfa.alg.cmp.score.sh index ef85353f..05f7c191 100644 --- a/scripts/wfa.utest.cmp.score.sh +++ b/scripts/wfa.alg.cmp.score.sh @@ -1,9 +1,9 @@ #!/bin/bash -# PROJECT: Wavefront Alignments Algorithms (Unitary Tests) +# PROJECT: Wavefront Alignments Algorithms # LICENCE: MIT License # AUTHOR(S): Santiago Marco-Sola # DESCRIPTION: Compare alignment files (*.alg) -# USAGE: ./wfa.utest.cmp.score.sh file1.alg file2.alg +# USAGE: ./wfa.alg.cmp.score.sh file1.alg file2.alg # Parameters FILE1=$1 diff --git a/scripts/wfa.alg.rescore.py b/scripts/wfa.alg.rescore.py new file mode 100644 index 00000000..842b2f22 --- /dev/null +++ b/scripts/wfa.alg.rescore.py @@ -0,0 +1,179 @@ +#!/usr/bin/python3 +# PROJECT: Wavefront Alignments Algorithms +# LICENCE: MIT License +# AUTHOR(S): Santiago Marco-Sola +# DESCRIPTION: Rescore alignment (*.alg) files +# USAGE: python3 wfa.alg.rescore.py -h + +import sys +import re +import enum +import argparse + +import numpy as np +import matplotlib +import matplotlib.pyplot as plt +from matplotlib.patches import Rectangle +from scipy.constants.constants import alpha + +################################################################################ +# Distances & penalties +################################################################################ +class Distance(enum.Enum): + edit = 1 + gap_linear = 2 + gap_affine = 3 + gap_affine_2p = 4 + +class Penalties: + def __init__(self): + self.distance = Distance.edit + self.match = 0 + self.mismatch = 0 + self.indel = 0 + self.gap_open1 = 0 + self.gap_extend1 = 0 + self.gap_open2 = 0 + self.gap_extend2 = 0 + +def cigar_compute_score_edit(cigar_vector,penalties): + score = 0 + for op in cigar_vector: + if op[1] in "DI": score += int(op[0]) + if op[1] in "X": score += int(op[0]) + return score + +def cigar_compute_score_linear(cigar_vector,penalties): + score = 0 + for op in cigar_vector: + if op[1] == "M": score -= int(op[0]) * penalties.match + if op[1] == "X": score -= int(op[0]) * penalties.mismatch + if op[1] in "DI": score -= int(op[0]) * penalties.indel + return score + +def cigar_compute_score_affine(cigar_vector,penalties): + score = 0 + for op in cigar_vector: + if op[1] == "M": score -= int(op[0]) * penalties.match + if op[1] == "X": score -= int(op[0]) * penalties.mismatch + if op[1] in "DI": score -= penalties.gap_open1 + int(op[0]) * penalties.gap_extend1 + return score + +def cigar_compute_score_affine_2p(cigar_vector,penalties): + score = 0 + for op in cigar_vector: + if op[1] == "M": score -= int(op[0]) * penalties.match + if op[1] == "X": score -= int(op[0]) * penalties.mismatch + if op[1] in "DI": + score1 = penalties.gap_open1 + int(op[0]) * penalties.gap_extend1 + score2 = penalties.gap_open2 + int(op[0]) * penalties.gap_extend2 + score -= min(score1,score2) + return score + +def cigar_compute_score(cigar,penalties): + # Parse CIGAR + # cigar = "10M3D5M1I10M" + # cigar_vector = [('10', 'M'), ('3', 'D'), ('5', 'M'), ('1', 'I'), ('10', 'M')] + cigar_vector = re.findall(r'(\d+)([MXDI])',cigar) + # Evaluate CIGAR + if penalties.distance == Distance.edit: + return cigar_compute_score_edit(cigar_vector,penalties) + elif penalties.distance == Distance.gap_linear: + return cigar_compute_score_linear(cigar_vector,penalties) + elif penalties.distance == Distance.gap_affine: + return cigar_compute_score_affine(cigar_vector,penalties) + elif penalties.distance == Distance.gap_affine_2p: + return cigar_compute_score_affine_2p(cigar_vector,penalties) + +################################################################################ +# Compare both files +################################################################################ +def rescore_alignments(input_path,output_path,penalties,verbose): + # Open files + input_file = open(input_path,"rt") + output_file = open(output_path,"wt") + # Read lines + while True: + # Read lines + line = input_file.readline() + if not line: break # End of file + # Extract alignments + fields = line.split() + cigar = fields[1] if len(fields)<=2 else fields[5] + # Evaluate CIGAR's score + score = cigar_compute_score(cigar,penalties) + # Print rescore and CIGAR + output_file.write("%d\t%s\n" % (score,cigar)) + # Close files + input_file.close() + output_file.close() + +################################################################################ +# Main +################################################################################ +# Configure arguments +parser = argparse.ArgumentParser() +parser.add_argument('-i','--input',required=True,help='Input file (*.alg)') +parser.add_argument('-o','--output',required=True,help='Output file (*.alg)') +parser.add_argument('-e','--edit', + help='Score alignments using edit distance (--edit)') +parser.add_argument('-l','--gap-linear', + help='Score alignments using gap-linear distance (--gap-linear=M,X,I)') +parser.add_argument('-g','--gap-affine', + help='Score alignments using gap-affine distance (--gap-affine=M,X,O,E)') +parser.add_argument('-G','--gap-affine-2p', + help='Score alignments using gap-affine-2p distance (--gap-affine=M,X,O1,E1,O2,E2)') +parser.add_argument('-v','--verbose',action='store_true',default=False, + help='Display comparison differences (default=disable)') +parser.add_argument('-H',action='store_true',dest="human_readable",default=False) + +# Parse arguments +args = parser.parse_args() + +# Select distance +penalties = Penalties() +if args.edit is not None: + pass # Default +elif args.gap_linear is not None: + values = args.gap_linear.split(',') + penalties.distance = Distance.gap_linear + penalties.match = int(values[0]) + penalties.mismatch = int(values[1]) + penalties.indel = int(values[2]) +elif args.gap_affine is not None: + values = args.gap_affine.split(',') + penalties.distance = Distance.gap_affine + penalties.match = int(values[0]) + penalties.mismatch = int(values[1]) + penalties.gap_open1 = int(values[2]) + penalties.gap_extend1 = int(values[3]) +elif args.gap_affine_2p is not None: + values = args.gap_affine_2p.split(',') + penalties.distance = Distance.gap_affine_2p + penalties.match = int(values[0]) + penalties.mismatch = int(values[1]) + penalties.gap_open1 = int(values[2]) + penalties.gap_extend1 = int(values[3]) + penalties.gap_open2 = int(values[4]) + penalties.gap_extend2 = int(values[5]) +else: + print("[WFAReScoreAlignments] No distance provided. Using edit-distance (default)"); + +# Check penalties +if penalties.match > 0: + print("[WFACompareAlignments] Match penalty must be negative or zero") + exit(-1) +if penalties.mismatch < 0 or \ + penalties.mismatch < 0 or \ + penalties.gap_open1 < 0 or \ + penalties.gap_extend1 < 0 or \ + penalties.gap_open2 < 0 or \ + penalties.gap_extend2 < 0: + print("[WFACompareAlignments] Penalties must be positive or zero") + exit(-1) + +# Rescore alignments +rescore_alignments(args.input,args.output,penalties,args.verbose) + + + \ No newline at end of file diff --git a/tools/align_benchmark/benchmark/benchmark_check.c b/tools/align_benchmark/benchmark/benchmark_check.c index ea7109a6..da4c9bfd 100644 --- a/tools/align_benchmark/benchmark/benchmark_check.c +++ b/tools/align_benchmark/benchmark/benchmark_check.c @@ -161,11 +161,11 @@ void benchmark_check_alignment_edit( align_input->pattern_length+align_input->text_length, align_input->mm_allocator); if (align_input->check_bandwidth <= 0) { - edit_dp_compute(&score_matrix, + edit_dp_align(&score_matrix, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); } else { - edit_dp_compute_banded(&score_matrix, + edit_dp_align_banded(&score_matrix, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length, align_input->check_bandwidth,&cigar); @@ -192,7 +192,7 @@ void benchmark_check_alignment_gap_linear( cigar_allocate(&cigar, align_input->pattern_length+align_input->text_length, align_input->mm_allocator); - nw_compute(&score_matrix, + nw_align(&score_matrix, align_input->check_linear_penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); @@ -222,11 +222,11 @@ void benchmark_check_alignment_gap_affine( align_input->mm_allocator); // Compute correct if (align_input->check_bandwidth <= 0) { - swg_compute(&affine_matrix,align_input->check_affine_penalties, + swg_align(&affine_matrix,align_input->check_affine_penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); } else { - swg_compute_banded(&affine_matrix,align_input->check_affine_penalties, + swg_align_banded(&affine_matrix,align_input->check_affine_penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length, align_input->check_bandwidth,&cigar); @@ -256,7 +256,7 @@ void benchmark_check_alignment_gap_affine2p( align_input->pattern_length+align_input->text_length, align_input->mm_allocator); // Compute correct - affine2p_dp_compute( + affine2p_dp_align( &affine_matrix,align_input->check_affine2p_penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); diff --git a/tools/align_benchmark/benchmark/benchmark_edit.c b/tools/align_benchmark/benchmark/benchmark_edit.c index 0c868876..3652a0d6 100644 --- a/tools/align_benchmark/benchmark/benchmark_edit.c +++ b/tools/align_benchmark/benchmark/benchmark_edit.c @@ -83,7 +83,7 @@ void benchmark_edit_dp( align_input->mm_allocator); // Align timer_start(&align_input->timer); - edit_dp_compute(&score_matrix, + edit_dp_align(&score_matrix, align_input->pattern,pattern_length, align_input->text,text_length,&cigar); timer_stop(&align_input->timer); @@ -116,7 +116,7 @@ void benchmark_edit_dp_banded( align_input->mm_allocator); // Align timer_start(&align_input->timer); - edit_dp_compute_banded(&score_matrix, + edit_dp_align_banded(&score_matrix, align_input->pattern,pattern_length, align_input->text,text_length, bandwidth,&cigar); diff --git a/tools/align_benchmark/benchmark/benchmark_gap_affine.c b/tools/align_benchmark/benchmark/benchmark_gap_affine.c index 049644ce..9c184a49 100644 --- a/tools/align_benchmark/benchmark/benchmark_gap_affine.c +++ b/tools/align_benchmark/benchmark/benchmark_gap_affine.c @@ -52,7 +52,7 @@ void benchmark_gap_affine_swg( align_input->mm_allocator); // Align timer_start(&align_input->timer); - swg_compute(&affine_matrix,penalties, + swg_align(&affine_matrix,penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); timer_stop(&align_input->timer); @@ -82,7 +82,7 @@ void benchmark_gap_affine_swg_endsfree( align_input->mm_allocator); // Align timer_start(&align_input->timer); - swg_compute_endsfree(&affine_matrix,penalties, + swg_align_endsfree(&affine_matrix,penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length, align_input->pattern_begin_free,align_input->pattern_begin_free, @@ -115,7 +115,7 @@ void benchmark_gap_affine_swg_banded( align_input->mm_allocator); // Align timer_start(&align_input->timer); - swg_compute_banded(&affine_matrix,penalties, + swg_align_banded(&affine_matrix,penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,bandwidth,&cigar); timer_stop(&align_input->timer); diff --git a/tools/align_benchmark/benchmark/benchmark_gap_affine2p.c b/tools/align_benchmark/benchmark/benchmark_gap_affine2p.c index 844f829d..bead8a5f 100644 --- a/tools/align_benchmark/benchmark/benchmark_gap_affine2p.c +++ b/tools/align_benchmark/benchmark/benchmark_gap_affine2p.c @@ -52,7 +52,7 @@ void benchmark_gap_affine2p_dp( align_input->mm_allocator); // Align timer_start(&align_input->timer); - affine2p_dp_compute(&matrix,penalties, + affine2p_dp_align(&matrix,penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); timer_stop(&align_input->timer); diff --git a/tools/align_benchmark/benchmark/benchmark_gap_linear.c b/tools/align_benchmark/benchmark/benchmark_gap_linear.c index 3feba33a..39abe03a 100644 --- a/tools/align_benchmark/benchmark/benchmark_gap_linear.c +++ b/tools/align_benchmark/benchmark/benchmark_gap_linear.c @@ -51,7 +51,7 @@ void benchmark_gap_linear_nw( align_input->mm_allocator); // Align timer_start(&align_input->timer); - nw_compute(&score_matrix,penalties, + nw_align(&score_matrix,penalties, align_input->pattern,align_input->pattern_length, align_input->text,align_input->text_length,&cigar); timer_stop(&align_input->timer); diff --git a/tools/align_benchmark/edit/edit_dp.c b/tools/align_benchmark/edit/edit_dp.c index 0d09c3e4..5ca62004 100644 --- a/tools/align_benchmark/edit/edit_dp.c +++ b/tools/align_benchmark/edit/edit_dp.c @@ -70,7 +70,7 @@ void edit_dp_traceback( while (v>0) {operations[op_sentinel--] = 'D'; --v;} cigar->begin_offset = op_sentinel + 1; } -void edit_dp_compute( +void edit_dp_align( score_matrix_t* const score_matrix, const char* const pattern, const int pattern_length, @@ -100,7 +100,7 @@ void edit_dp_compute( /* * Edit distance computation using dynamic-programming matrix (banded) */ -void edit_dp_compute_banded( +void edit_dp_align_banded( score_matrix_t* const score_matrix, const char* const pattern, const int pattern_length, diff --git a/tools/align_benchmark/edit/edit_dp.h b/tools/align_benchmark/edit/edit_dp.h index 4b30817d..806d80ae 100644 --- a/tools/align_benchmark/edit/edit_dp.h +++ b/tools/align_benchmark/edit/edit_dp.h @@ -38,7 +38,7 @@ /* * Edit distance computation using dynamic-programming matrix */ -void edit_dp_compute( +void edit_dp_align( score_matrix_t* const score_matrix, const char* const pattern, const int pattern_length, @@ -48,7 +48,7 @@ void edit_dp_compute( /* * Edit distance computation using dynamic-programming matrix (banded) */ -void edit_dp_compute_banded( +void edit_dp_align_banded( score_matrix_t* const score_matrix, const char* const pattern, const int pattern_length, diff --git a/tools/align_benchmark/gap_affine/swg.c b/tools/align_benchmark/gap_affine/swg.c index e31760e9..24c93277 100644 --- a/tools/align_benchmark/gap_affine/swg.c +++ b/tools/align_benchmark/gap_affine/swg.c @@ -102,7 +102,7 @@ void swg_traceback( /* * SWG alignment */ -void swg_compute( +void swg_align( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, @@ -156,7 +156,7 @@ void swg_compute( /* * SWG alignment (ends-free) */ -void swg_compute_endsfree( +void swg_align_endsfree( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, @@ -237,7 +237,7 @@ void swg_compute_endsfree( /* * SWG alignment (banded) */ -void swg_compute_banded( +void swg_align_banded( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, diff --git a/tools/align_benchmark/gap_affine/swg.h b/tools/align_benchmark/gap_affine/swg.h index 9be1fcfe..a3a13c2b 100644 --- a/tools/align_benchmark/gap_affine/swg.h +++ b/tools/align_benchmark/gap_affine/swg.h @@ -39,7 +39,7 @@ /* * SWG alignment */ -void swg_compute( +void swg_align( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, @@ -51,7 +51,7 @@ void swg_compute( /* * SWG alignment (ends-free) */ -void swg_compute_endsfree( +void swg_align_endsfree( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, @@ -67,7 +67,7 @@ void swg_compute_endsfree( /* * SWG alignment (banded) */ -void swg_compute_banded( +void swg_align_banded( affine_matrix_t* const affine_matrix, affine_penalties_t* const penalties, const char* const pattern, diff --git a/tools/align_benchmark/gap_affine2p/affine2p_dp.c b/tools/align_benchmark/gap_affine2p/affine2p_dp.c index d82a691e..be4000c8 100644 --- a/tools/align_benchmark/gap_affine2p/affine2p_dp.c +++ b/tools/align_benchmark/gap_affine2p/affine2p_dp.c @@ -114,7 +114,7 @@ void affine2p_dp_traceback( /* * Gap-affine 2-piece alignment computation using dynamic-programming matrix */ -void affine2p_dp_compute( +void affine2p_dp_align( affine2p_matrix_t* const matrix, affine2p_penalties_t* const penalties, const char* const pattern, @@ -179,76 +179,4 @@ void affine2p_dp_compute( // // DEBUG // affine2p_matrix_print(stderr,matrix,pattern,text); } -// /* -// * Gap-affine 2-piece alignment computation using dynamic-programming matrix (banded) -// */ -// void affine2p_dp_compute_banded( -// affine2p_matrix_t* const matrix, -// affine2p_penalties_t* const penalties, -// const char* const pattern, -// const int pattern_length, -// const char* const text, -// const int text_length, -// const int bandwidth, -// cigar_t* const cigar) { -// // Parameters -// const int k_end = ABS(text_length-pattern_length)+1; -// int effective_bandwidth = bandwidth; -// if (effective_bandwidth < k_end) effective_bandwidth = k_end; -// if (effective_bandwidth > pattern_length) effective_bandwidth = pattern_length; -// if (effective_bandwidth > text_length) effective_bandwidth = text_length; -// affine_cell_t** const dp = affine_matrix->columns; -// int h, v; -// // Initialize -// dp[0][0].D = AFFINE2P_SCORE_MAX; -// dp[0][0].I = AFFINE2P_SCORE_MAX; -// dp[0][0].M = 0; -// for (v=1;v<=effective_bandwidth;++v) { -// dp[0][v].D = penalties->gap_opening + v*penalties->gap_extension; -// dp[0][v].I = AFFINE2P_SCORE_MAX; -// dp[0][v].M = dp[0][v].D; -// } -// // Compute DP -// for (h=1;h<=text_length;++h) { -// // Compute lo limit -// int lo; -// if (h <= effective_bandwidth) { -// lo = 1; -// dp[h][lo-1].D = AFFINE2P_SCORE_MAX; -// dp[h][lo-1].I = penalties->gap_opening + h*penalties->gap_extension; -// dp[h][lo-1].M = dp[h][lo-1].I; -// } else { -// lo = h - effective_bandwidth; -// dp[h][lo-1].D = AFFINE2P_SCORE_MAX; -// dp[h][lo-1].I = AFFINE2P_SCORE_MAX; -// dp[h][lo-1].M = AFFINE2P_SCORE_MAX; -// } -// // Compute hi limit -// int hi = h + effective_bandwidth - 1; -// if (hi > pattern_length) { -// hi = pattern_length; -// } else if (h > 1) { -// dp[h-1][hi].D = AFFINE2P_SCORE_MAX; -// dp[h-1][hi].I = AFFINE2P_SCORE_MAX; -// dp[h-1][hi].M = AFFINE2P_SCORE_MAX; -// } -// // Compute column -// for (v=lo;v<=hi;++v) { -// // Update DP.D -// const int del_new = dp[h][v-1].M + penalties->gap_opening + penalties->gap_extension; -// const int del_ext = dp[h][v-1].D + penalties->gap_extension; -// const int del = MIN(del_new,del_ext); -// dp[h][v].D = del; -// // Update DP.I -// const int ins_new = dp[h-1][v].M + penalties->gap_opening + penalties->gap_extension; -// const int ins_ext = dp[h-1][v].I + penalties->gap_extension; -// const int ins = MIN(ins_new,ins_ext); -// dp[h][v].I = ins; -// // Update DP.M -// const int m_match = dp[h-1][v-1].M + ((pattern[v-1]==text[h-1]) ? penalties->match : penalties->mismatch); -// dp[h][v].M = MIN(m_match,MIN(ins,del)); -// } -// } -// // Compute traceback -// swg_traceback(affine_matrix,penalties,cigar); -//} + diff --git a/tools/align_benchmark/gap_affine2p/affine2p_dp.h b/tools/align_benchmark/gap_affine2p/affine2p_dp.h index e1d144c1..1e96ea6b 100644 --- a/tools/align_benchmark/gap_affine2p/affine2p_dp.h +++ b/tools/align_benchmark/gap_affine2p/affine2p_dp.h @@ -40,7 +40,7 @@ /* * Gap-affine 2-piece alignment computation using dynamic-programming matrix */ -void affine2p_dp_compute( +void affine2p_dp_align( affine2p_matrix_t* const matrix, affine2p_penalties_t* const penalties, const char* const pattern, diff --git a/tools/align_benchmark/gap_linear/nw.c b/tools/align_benchmark/gap_linear/nw.c index 2346846d..521a0d47 100644 --- a/tools/align_benchmark/gap_linear/nw.c +++ b/tools/align_benchmark/gap_linear/nw.c @@ -56,7 +56,7 @@ void nw_traceback( --h; } else { operations[op_sentinel--] = - (dp[h][v] == dp[h-1][v-1]+penalties->mismatch) ? 'X' : 'M'; + (dp[h][v] == dp[h-1][v-1] + penalties->mismatch) ? 'X' : 'M'; --h; --v; } @@ -65,7 +65,7 @@ void nw_traceback( while (v>0) {operations[op_sentinel--] = 'D'; --v;} cigar->begin_offset = op_sentinel + 1; } -void nw_compute( +void nw_align( score_matrix_t* const score_matrix, linear_penalties_t* const penalties, const char* const pattern, @@ -87,7 +87,7 @@ void nw_compute( // Compute DP for (h=1;h<=text_length;++h) { for (v=1;v<=pattern_length;++v) { - int min = dp[h-1][v-1] + ((pattern[v-1]==text[h-1]) ? 0 : penalties->mismatch); // Misms + int min = dp[h-1][v-1] + ((pattern[v-1]==text[h-1]) ? penalties->match : penalties->mismatch); // Misms min = MIN(min,dp[h-1][v]+penalties->indel); // Ins min = MIN(min,dp[h][v-1]+penalties->indel); // Del dp[h][v] = min; diff --git a/tools/align_benchmark/gap_linear/nw.h b/tools/align_benchmark/gap_linear/nw.h index 2f54cc65..c138909b 100644 --- a/tools/align_benchmark/gap_linear/nw.h +++ b/tools/align_benchmark/gap_linear/nw.h @@ -37,9 +37,9 @@ #include "alignment/score_matrix.h" /* - * Edit distance computation using raw dynamic-programming matrix + * Alignment computation using raw dynamic-programming matrix */ -void nw_compute( +void nw_align( score_matrix_t* const score_matrix, linear_penalties_t* const penalties, const char* const pattern, diff --git a/wavefront/wavefront_align.c b/wavefront/wavefront_align.c index 6f92299e..6e5f41d3 100644 --- a/wavefront/wavefront_align.c +++ b/wavefront/wavefront_align.c @@ -251,10 +251,14 @@ void wavefront_align_terminate( const int score) { // Parameters const distance_metric_t distance_metric = wf_aligner->penalties.distance_metric; + const int pattern_length = wf_aligner->pattern_length; + const int text_length = wf_aligner->text_length; + const int swg_match_score = -(wf_aligner->penalties.match); // Retrieve alignment if (wf_aligner->alignment_scope == compute_score) { cigar_clear(&wf_aligner->cigar); - wf_aligner->cigar.score = (distance_metric <= edit) ? score : -score; + wf_aligner->cigar.score = (distance_metric <= edit) ? score : + WF_PENALTIES_GET_SW_SCORE(swg_match_score,pattern_length,text_length,score); } else { // Parameters wavefront_components_t* const wf_components = &wf_aligner->wf_components; @@ -283,7 +287,8 @@ void wavefront_align_terminate( } } // Set score & finish - wf_aligner->cigar.score = (distance_metric <= edit) ? score : -score; + wf_aligner->cigar.score = (distance_metric <= edit) ? score : + WF_PENALTIES_GET_SW_SCORE(swg_match_score,pattern_length,text_length,score); } } /* @@ -329,6 +334,9 @@ int wavefront_align_sequences( wf_aligner->align_status.status = WF_STATUS_SUCCESSFUL; return WF_STATUS_SUCCESSFUL; } +/* + * Wavefront Alignment Begin/End + */ void wavefront_align_sequences_init( wavefront_aligner_t* const wf_aligner, const char* const pattern, @@ -382,17 +390,27 @@ void wavefront_align_sequences_init( wavefront_plot(wf_aligner,pattern,text,0); } } +void wavefront_align_start( + wavefront_aligner_t* const wf_aligner) { + // DEBUG + wavefront_debug_prologue(wf_aligner); +} void wavefront_align_finish( wavefront_aligner_t* const wf_aligner) { + // Compute memory used + uint64_t memory_used = wavefront_aligner_get_size(wf_aligner); + wf_aligner->align_status.memory_used = memory_used; + // DEBUG + wavefront_debug_epilogue(wf_aligner); // Reap memory (controlled reaping) - uint64_t wf_memory_used = wavefront_aligner_get_size(wf_aligner); - if (wf_memory_used > wf_aligner->system.max_memory_resident) { + if (memory_used > wf_aligner->system.max_memory_resident) { // Wavefront components wavefront_components_reap(&wf_aligner->wf_components); // Check memory - wf_memory_used = wavefront_aligner_get_size(wf_aligner); + memory_used = wavefront_aligner_get_size(wf_aligner); + wf_aligner->align_status.memory_used = memory_used; // Slab - if (wf_memory_used > wf_aligner->system.max_memory_resident) { + if (memory_used > wf_aligner->system.max_memory_resident) { wavefront_slab_reap(wf_aligner->wavefront_slab); if (wf_aligner->aligner_forward != NULL) { wavefront_slab_reap(wf_aligner->aligner_forward->wavefront_slab); @@ -448,8 +466,8 @@ int wavefront_align( const int text_length) { // Parameters wavefront_align_status_t* const wf_align_status = &wf_aligner->align_status; - // DEBUG - wavefront_debug_prologue(wf_aligner); + // Start alignment + wavefront_align_start(wf_aligner); // Dispatcher if (wf_aligner->bidirectional_alignment) { wavefront_align_bidirectional(wf_aligner,pattern,pattern_length,text,text_length); @@ -462,8 +480,6 @@ int wavefront_align( } // Finish alignment wavefront_align_finish(wf_aligner); - // DEBUG - wavefront_debug_epilogue(wf_aligner,pattern,pattern_length,text,text_length); // Return return wf_align_status->status; } @@ -484,10 +500,6 @@ int wavefront_align_resume( } // Finish alignment wavefront_align_finish(wf_aligner); - // DEBUG - wavefront_debug_epilogue(wf_aligner, - wf_aligner->pattern,wf_aligner->pattern_length, - wf_aligner->text,wf_aligner->text_length); // Return return wf_align_status->status; } diff --git a/wavefront/wavefront_aligner.c b/wavefront/wavefront_aligner.c index c75dc0ab..0c0018de 100644 --- a/wavefront/wavefront_aligner.c +++ b/wavefront/wavefront_aligner.c @@ -79,20 +79,17 @@ void wavefront_aligner_init_penalties( case gap_linear: wavefronts_penalties_set_linear( &wf_aligner->penalties, - &attributes->linear_penalties, - wavefronts_penalties_shifted_penalties); + &attributes->linear_penalties); break; case gap_affine: wavefronts_penalties_set_affine( &wf_aligner->penalties, - &attributes->affine_penalties, - wavefronts_penalties_shifted_penalties); + &attributes->affine_penalties); break; case gap_affine_2p: wavefronts_penalties_set_affine2p( &wf_aligner->penalties, - &attributes->affine2p_penalties, - wavefronts_penalties_shifted_penalties); + &attributes->affine2p_penalties); break; } } diff --git a/wavefront/wavefront_aligner.h b/wavefront/wavefront_aligner.h index c1416be4..0dd80d7f 100644 --- a/wavefront/wavefront_aligner.h +++ b/wavefront/wavefront_aligner.h @@ -66,6 +66,7 @@ typedef struct { // Status int status; // Status code int score; // Current WF-alignment score + uint64_t memory_used; // Total memory used // Wavefront alignment functions void (*wf_align_compute)(wavefront_aligner_t* const,const int); // WF Compute function int (*wf_align_extend)(wavefront_aligner_t* const,const int); // WF Extend function diff --git a/wavefront/wavefront_bialign.c b/wavefront/wavefront_bialign.c index ddbcb616..d6d2a7ac 100644 --- a/wavefront/wavefront_bialign.c +++ b/wavefront/wavefront_bialign.c @@ -518,7 +518,10 @@ void wavefront_bialign( &form_1,breakpoint.component,component_end, breakpoint.score_reverse,cigar,rlevel+1); // Set score - cigar->score = -breakpoint.score; + const distance_metric_t distance_metric = wf_aligner->penalties.distance_metric; + const int swg_match_score = wf_aligner->penalties.match; + cigar->score = (distance_metric <= edit) ? breakpoint.score : + WF_PENALTIES_GET_SW_SCORE(swg_match_score,pattern_length,text_length,breakpoint.score); } diff --git a/wavefront/wavefront_debug.c b/wavefront/wavefront_debug.c index b1ec8ea6..613b1753 100644 --- a/wavefront/wavefront_debug.c +++ b/wavefront/wavefront_debug.c @@ -94,7 +94,7 @@ bool wavefront_check_alignment( ++pattern_pos; break; default: - fprintf(stderr,"[WFA::Check] Unknown edit operation '%c'\n",operations[i]); + fprintf(stream,"[WFA::Check] Unknown edit operation '%c'\n",operations[i]); exit(1); break; } @@ -120,20 +120,22 @@ bool wavefront_check_alignment( */ void wavefront_report_lite( FILE* const stream, - wavefront_aligner_t* const wf_aligner, - const char* const pattern, - const int pattern_length, - const char* const text, - const int text_length, - const int wf_status, - const uint64_t wf_memory_used) { + wavefront_aligner_t* const wf_aligner) { + // Parameters + const char* const pattern = wf_aligner->pattern; + const int pattern_length = wf_aligner->pattern_length; + const char* const text = wf_aligner->text; + const int text_length = wf_aligner->text_length; + const int status = wf_aligner->align_status.status; + const uint64_t memory_used = wf_aligner->align_status.memory_used; + // Banner fprintf(stream,"[WFA::Debug]"); // Sequences fprintf(stream,"\t%d",-wf_aligner->cigar.score); fprintf(stream,"\t%d\t%d",pattern_length,text_length); - fprintf(stream,"\t%s",(wf_status==0) ? "OK" : "FAIL"); + fprintf(stream,"\t%s",(status==0) ? "OK" : "FAIL"); fprintf(stream,"\t%2.3f",TIMER_GET_TOTAL_MS(&wf_aligner->system.timer)); - fprintf(stream,"\t%luMB\t",CONVERT_B_TO_MB(wf_memory_used)); + fprintf(stream,"\t%luMB\t",CONVERT_B_TO_MB(memory_used)); cigar_print(stream,&wf_aligner->cigar,true); if (wf_aligner->match_funct != NULL) { fprintf(stream,"\t-\t-"); @@ -142,15 +144,14 @@ void wavefront_report_lite( } fprintf(stream,"\n"); } -void wavefront_report_verbose( +void wavefront_report_verbose_begin( FILE* const stream, - wavefront_aligner_t* const wf_aligner, - const char* const pattern, - const int pattern_length, - const char* const text, - const int text_length, - const int wf_status, - const uint64_t wf_memory_used) { + wavefront_aligner_t* const wf_aligner) { + // Parameters + const char* const pattern = wf_aligner->pattern; + const int pattern_length = wf_aligner->pattern_length; + const char* const text = wf_aligner->text; + const int text_length = wf_aligner->text_length; // Input sequences fprintf(stream,"[WFA::Debug] WFA-Alignment (obj=%p)\n",wf_aligner); if (wf_aligner->match_funct != NULL) { @@ -188,13 +189,18 @@ void wavefront_report_verbose( CONVERT_B_TO_MB(wf_aligner->system.max_memory_compact), CONVERT_B_TO_MB(wf_aligner->system.max_memory_resident), CONVERT_B_TO_MB(wf_aligner->system.max_memory_abort)); +} +void wavefront_report_verbose_end( + FILE* const stream, + wavefront_aligner_t* const wf_aligner) { // Finish report - fprintf(stream,"[WFA::Debug]\tFinish.status\t%d\n",wf_status); + fprintf(stream,"[WFA::Debug]\tFinish.status\t%d\n",wf_aligner->align_status.status); fprintf(stream,"[WFA::Debug]\tTime.taken\t"); timer_print_total(stream,&wf_aligner->system.timer); fprintf(stream,"\n"); - fprintf(stream,"[WFA::Debug]\tMemory.used\t%luMB\n",CONVERT_B_TO_MB(wf_memory_used)); - fprintf(stream,"[WFA::Debug]\tWFA.score\t%d\n",-wf_aligner->cigar.score); + fprintf(stream,"[WFA::Debug]\tMemory.used\t%luMB\n", + CONVERT_B_TO_MB(wf_aligner->align_status.memory_used)); + fprintf(stream,"[WFA::Debug]\tWFA.score\t%d\n",-(wf_aligner->cigar.score)); fprintf(stream,"[WFA::Debug]\tWFA.cigar\t"); cigar_print(stream,&wf_aligner->cigar,true); fprintf(stream,"\n"); @@ -208,41 +214,32 @@ void wavefront_report_verbose( */ void wavefront_debug_prologue( wavefront_aligner_t* const wf_aligner) { + // Check verbose level if (wf_aligner->system.verbose >= 2) { timer_start(&wf_aligner->system.timer); + if (wf_aligner->system.verbose > 2) { + wavefront_report_verbose_begin(stderr,wf_aligner); + } } } void wavefront_debug_epilogue( - wavefront_aligner_t* const wf_aligner, - const char* const pattern, - const int pattern_length, - const char* const text, - const int text_length) { - // Parameters - const int wf_align_status = wf_aligner->align_status.status; - const uint64_t wf_memory_used = wavefront_aligner_get_size(wf_aligner); + wavefront_aligner_t* const wf_aligner) { // Print Summary if (wf_aligner->system.verbose >= 2) { timer_stop(&wf_aligner->system.timer); if (wf_aligner->system.verbose == 2) { - wavefront_report_lite(stderr,wf_aligner, - pattern,pattern_length,text,text_length, - wf_align_status,wf_memory_used); + wavefront_report_lite(stderr,wf_aligner); } else { - wavefront_report_verbose(stderr,wf_aligner, - pattern,pattern_length,text,text_length, - wf_align_status,wf_memory_used); + wavefront_report_verbose_end(stderr,wf_aligner); } } // Check correct if (wf_aligner->system.check_alignment_correct && - wf_align_status == WF_STATUS_SUCCESSFUL && + wf_aligner->align_status.status == WF_STATUS_SUCCESSFUL && wf_aligner->alignment_scope == compute_alignment) { if (!wavefront_check_alignment(stderr,wf_aligner)) { - fprintf(stderr,"[WFA::Check] Alignment incorrect\n"); - wavefront_report_verbose(stderr,wf_aligner, - pattern,pattern_length,text,text_length, - wf_align_status,wf_memory_used); + fprintf(stderr,"[WFA::Check] Alignment incorrect\n"); + wavefront_report_verbose_end(stderr,wf_aligner); exit(1); } } diff --git a/wavefront/wavefront_debug.h b/wavefront/wavefront_debug.h index 3aba068d..a72f6f9f 100644 --- a/wavefront/wavefront_debug.h +++ b/wavefront/wavefront_debug.h @@ -40,10 +40,6 @@ void wavefront_debug_prologue( wavefront_aligner_t* const wf_aligner); void wavefront_debug_epilogue( - wavefront_aligner_t* const wf_aligner, - const char* const pattern, - const int pattern_length, - const char* const text, - const int text_length); + wavefront_aligner_t* const wf_aligner); #endif /* WAVEFRONT_DEBUG_H_ */ diff --git a/wavefront/wavefront_penalties.c b/wavefront/wavefront_penalties.c index dc3395b2..d6068589 100644 --- a/wavefront/wavefront_penalties.c +++ b/wavefront/wavefront_penalties.c @@ -31,19 +31,6 @@ #include "wavefront_penalties.h" -/* - * Shift penalties - */ -void wavefronts_penalties_shift( - wavefronts_penalties_t* const wavefronts_penalties, - const int match) { - // Shift to zero match score - wavefronts_penalties->mismatch -= match; - wavefronts_penalties->gap_opening1 -= match; - wavefronts_penalties->gap_extension1 -= match; - wavefronts_penalties->gap_opening2 -= match; - wavefronts_penalties->gap_extension2 -= match; -} /* * Penalties adjustment */ @@ -71,29 +58,27 @@ void wavefronts_penalties_set_edit( } void wavefronts_penalties_set_linear( wavefronts_penalties_t* const wavefronts_penalties, - linear_penalties_t* const linear_penalties, - const wf_penalties_strategy_type penalties_strategy) { + linear_penalties_t* const linear_penalties) { // Set distance model wavefronts_penalties->distance_metric = gap_linear; // Check base penalties if (linear_penalties->match > 0) { fprintf(stderr,"[WFA::Penalties] Match score must be negative or zero (M=%d)\n",linear_penalties->match); exit(1); - } else if (linear_penalties->mismatch <= 0 || - linear_penalties->indel <= 0) { - fprintf(stderr,"[WFA::Penalties] Penalties must be strictly positive (X=%d,O=%d). " - "Must be (X>0,D>0,I>0)\n", - linear_penalties->mismatch, - linear_penalties->indel); + } else if (linear_penalties->mismatch <= 0 || linear_penalties->indel <= 0) { + fprintf(stderr,"[WFA::Penalties] Penalties (X=%d,D=%d,I=%d) must be (X>0,D>0,I>0)\n", + linear_penalties->mismatch,linear_penalties->indel,linear_penalties->indel); exit(1); } - // Copy base penalties - wavefronts_penalties->mismatch = linear_penalties->mismatch; - wavefronts_penalties->gap_opening1 = linear_penalties->indel; - // Adjust scores - if (linear_penalties->match > 0 && - penalties_strategy == wavefronts_penalties_shifted_penalties) { - wavefronts_penalties_shift(wavefronts_penalties,linear_penalties->match); + // Set penalties (if needed, adjust using Eizenga's formula) + if (linear_penalties->match < 0) { + wavefronts_penalties->match = linear_penalties->match; + wavefronts_penalties->mismatch = 2*linear_penalties->mismatch - 2*linear_penalties->match; + wavefronts_penalties->gap_opening1 = 2*linear_penalties->indel - linear_penalties->match; + } else { + wavefronts_penalties->match = 0; + wavefronts_penalties->mismatch = linear_penalties->mismatch; + wavefronts_penalties->gap_opening1 = linear_penalties->indel; } // Set unused wavefronts_penalties->gap_extension1 = -1; @@ -102,8 +87,7 @@ void wavefronts_penalties_set_linear( } void wavefronts_penalties_set_affine( wavefronts_penalties_t* const wavefronts_penalties, - affine_penalties_t* const affine_penalties, - const wf_penalties_strategy_type penalties_strategy) { + affine_penalties_t* const affine_penalties) { // Set distance model wavefronts_penalties->distance_metric = gap_affine; // Check base penalties @@ -113,21 +97,23 @@ void wavefronts_penalties_set_affine( } else if (affine_penalties->mismatch <= 0 || affine_penalties->gap_opening < 0 || affine_penalties->gap_extension <= 0) { - fprintf(stderr,"[WFA::Penalties] Penalties must be positive (X=%d,O=%d,E=%d). " - "Must be (X>0,O>=0,E>0)\n", + fprintf(stderr,"[WFA::Penalties] Penalties (X=%d,O=%d,E=%d) must be (X>0,O>=0,E>0)\n", affine_penalties->mismatch, affine_penalties->gap_opening, affine_penalties->gap_extension); exit(1); } - // Copy base penalties - wavefronts_penalties->mismatch = affine_penalties->mismatch; - wavefronts_penalties->gap_opening1 = affine_penalties->gap_opening; - wavefronts_penalties->gap_extension1 = affine_penalties->gap_extension; - // Adjust scores - if (affine_penalties->match > 0 && - penalties_strategy == wavefronts_penalties_shifted_penalties) { - wavefronts_penalties_shift(wavefronts_penalties,affine_penalties->match); + // Set penalties (if needed, adjust using Eizenga's formula) + if (affine_penalties->match < 0) { + wavefronts_penalties->match = affine_penalties->match; + wavefronts_penalties->mismatch = 2*affine_penalties->mismatch - 2*affine_penalties->match; + wavefronts_penalties->gap_opening1 = 2*affine_penalties->gap_opening; + wavefronts_penalties->gap_extension1 = 2*affine_penalties->gap_extension - affine_penalties->match; + } else { + wavefronts_penalties->match = 0; + wavefronts_penalties->mismatch = affine_penalties->mismatch; + wavefronts_penalties->gap_opening1 = affine_penalties->gap_opening; + wavefronts_penalties->gap_extension1 = affine_penalties->gap_extension; } // Set unused wavefronts_penalties->gap_opening2 = -1; @@ -135,8 +121,7 @@ void wavefronts_penalties_set_affine( } void wavefronts_penalties_set_affine2p( wavefronts_penalties_t* const wavefronts_penalties, - affine2p_penalties_t* const affine2p_penalties, - const wf_penalties_strategy_type penalties_strategy) { + affine2p_penalties_t* const affine2p_penalties) { // Set distance model wavefronts_penalties->distance_metric = gap_affine_2p; // Check base penalties @@ -148,8 +133,7 @@ void wavefronts_penalties_set_affine2p( affine2p_penalties->gap_extension1 <= 0 || affine2p_penalties->gap_opening2 <= 0 || affine2p_penalties->gap_extension2 <= 0) { - fprintf(stderr,"[WFA::Penalties] Penalties must be strictly positive (X=%d,O1=%d,E1=%d,O2=%d,E2=%d). " - "Must be (X>0,O1>0,E1>0,O1>0,E1>0)\n", + fprintf(stderr,"[WFA::Penalties] Penalties (X=%d,O1=%d,E1=%d,O2=%d,E2=%d) must be (X>0,O1>=0,E1>0,O1>=0,E1>0)\n", affine2p_penalties->mismatch, affine2p_penalties->gap_opening1, affine2p_penalties->gap_extension1, @@ -157,16 +141,21 @@ void wavefronts_penalties_set_affine2p( affine2p_penalties->gap_extension2); exit(1); } - // Copy base penalties - wavefronts_penalties->mismatch = affine2p_penalties->mismatch; - wavefronts_penalties->gap_opening1 = affine2p_penalties->gap_opening1; - wavefronts_penalties->gap_extension1 = affine2p_penalties->gap_extension1; - wavefronts_penalties->gap_opening2 = affine2p_penalties->gap_opening2; - wavefronts_penalties->gap_extension2 = affine2p_penalties->gap_extension2; - // Adjust scores - if (affine2p_penalties->match > 0 && - penalties_strategy == wavefronts_penalties_shifted_penalties) { - wavefronts_penalties_shift(wavefronts_penalties,affine2p_penalties->match); + // Set penalties (if needed, adjust using Eizenga's formula) + if (affine2p_penalties->match < 0) { + wavefronts_penalties->match = affine2p_penalties->match; + wavefronts_penalties->mismatch = 2*affine2p_penalties->mismatch - 2*wavefronts_penalties->match; + wavefronts_penalties->gap_opening1 = 2*affine2p_penalties->gap_opening1; + wavefronts_penalties->gap_extension1 = 2*affine2p_penalties->gap_extension1 - affine2p_penalties->match; + wavefronts_penalties->gap_opening2 = 2*affine2p_penalties->gap_opening2; + wavefronts_penalties->gap_extension2 = 2*affine2p_penalties->gap_extension2 - affine2p_penalties->match; + } else { + wavefronts_penalties->match = 0; + wavefronts_penalties->mismatch = affine2p_penalties->mismatch; + wavefronts_penalties->gap_opening1 = affine2p_penalties->gap_opening1; + wavefronts_penalties->gap_extension1 = affine2p_penalties->gap_extension1; + wavefronts_penalties->gap_opening2 = affine2p_penalties->gap_opening2; + wavefronts_penalties->gap_extension2 = affine2p_penalties->gap_extension2; } } /* @@ -208,6 +197,3 @@ void wavefronts_penalties_print( } - - - diff --git a/wavefront/wavefront_penalties.h b/wavefront/wavefront_penalties.h index 83922861..bd6a1a5d 100644 --- a/wavefront/wavefront_penalties.h +++ b/wavefront/wavefront_penalties.h @@ -44,30 +44,28 @@ typedef enum { edit = 1, // Levenshtein gap_linear = 2, // Needleman-Wunsch gap_affine = 3, // Smith-Waterman-Gotoh - gap_affine_2p = 4 // Concave 2-pieces + gap_affine_2p = 4 // Gap-Affine 2-pieces } distance_metric_t; -/* - * Penalty adaptation strategy - */ -typedef enum { - wavefronts_penalties_force_zero_match, - wavefronts_penalties_shifted_penalties, -} wf_penalties_strategy_type; - /* * Wavefront Penalties */ typedef struct { distance_metric_t distance_metric; // Alignment metric/distance used - // int match; // (M = 0) + int match; // (M <= 0) (Internal variable change to M=0 for WFA) int mismatch; // (X > 0) - int gap_opening1; // (O1 > 0) + int gap_opening1; // (O1 >= 0) int gap_extension1; // (E1 > 0) - int gap_opening2; // (O2 > 0) + int gap_opening2; // (O2 >= 0) int gap_extension2; // (E2 > 0) } wavefronts_penalties_t; +/* + * Compute SW-score equivalent (thanks to Eizenga's formula) + */ +#define WF_PENALTIES_GET_SW_SCORE(swg_match_score,plen,tlen,wf_score) \ + (swg_match_score*(plen+tlen))/2 - wf_score + /* * Penalties adjustment */ @@ -77,16 +75,32 @@ void wavefronts_penalties_set_edit( wavefronts_penalties_t* const wavefronts_penalties); void wavefronts_penalties_set_linear( wavefronts_penalties_t* const wavefronts_penalties, - linear_penalties_t* const linear_penalties, - const wf_penalties_strategy_type penalties_strategy); + linear_penalties_t* const linear_penalties); void wavefronts_penalties_set_affine( wavefronts_penalties_t* const wavefronts_penalties, - affine_penalties_t* const affine_penalties, - const wf_penalties_strategy_type penalties_strategy); + affine_penalties_t* const affine_penalties); void wavefronts_penalties_set_affine2p( wavefronts_penalties_t* const wavefronts_penalties, - affine2p_penalties_t* const affine2p_penalties, - const wf_penalties_strategy_type penalties_strategy); + affine2p_penalties_t* const affine2p_penalties); + +/* + * Score conversion + */ +int wavefronts_penalties_get_score_indel( + wavefronts_penalties_t* const wavefronts_penalties, + const int score); +int wavefronts_penalties_get_score_edit( + wavefronts_penalties_t* const wavefronts_penalties, + const int score); +int wavefronts_penalties_get_score_linear( + wavefronts_penalties_t* const wavefronts_penalties, + const int score); +int wavefronts_penalties_get_score_affine( + wavefronts_penalties_t* const wavefronts_penalties, + const int score); +int wavefronts_penalties_get_score_affine2p( + wavefronts_penalties_t* const wavefronts_penalties, + const int score); /* * Display