-
Notifications
You must be signed in to change notification settings - Fork 82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
several same-score pairwise alignments #3198
Comments
Hey @notestaff, if there are multiple valid alignments, the traceback will choose in this order:
The alignments are always evaluated lazily, i.e., they are only computed if you access them. As far as I am aware, there is no way to force an output of all possible (alternative) alignments. Click to show example#include <vector>
#include <seqan3/alignment/aligned_sequence/debug_stream_alignment.hpp>
#include <seqan3/alignment/configuration/align_config_edit.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
int main()
{
using namespace seqan3::literals;
std::vector<seqan3::dna4> const sequence_1{"ACGT"_dna4};
std::vector<seqan3::dna4> const sequence_2{"AGCT"_dna4};
auto config = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme;
for (auto const & result : seqan3::align_pairwise(std::tie(sequence_1, sequence_2), config))
seqan3::debug_stream << result << '\n';
} Will output:
An alternative is:
but the algorithm will prefer putting the gap in the first sequence first. @rrahn will also have a look at your questions and correct me/expand on my answers if necessary :) What would the use case be for iterating over all alternative alignments? Please don't hesitate to ask more questions if something is unclear. |
Thanks for the response (and for seqan generally). |
Hi @notestaff, thanks for writing us. #include <seqan3/alignment/configuration/align_config_debug.hpp>
#include <seqan3/alignment/configuration/align_config_gap_cost_affine.hpp>
#include <seqan3/alignment/configuration/align_config_method.hpp>
#include <seqan3/alignment/configuration/align_config_scoring_scheme.hpp>
#include <seqan3/alignment/matrix/detail/debug_matrix.hpp>
#include <seqan3/alignment/matrix/detail/trace_directions.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
auto align_config = seqan3::align_cfg::method_global{} |
seqan3::align_cfg::scoring_scheme{
seqan3::nucleotide_scoring_scheme{seqan3::match_score{4},
seqan3::mismatch_score{-5}}
} |
seqan3::align_cfg::gap_cost_affine{seqan3::align_cfg::open_score{-10},
seqan3::align_cfg::extension_score{-1}
} |
seqan3::align_cfg::detail::debug{};
auto source = "AACCGGTTAACCGGTT"_dna4;
auto target = "ACGTACGTA"_dna4;
using trace_matrix_t = seqan3::detail::two_dimensional_matrix<std::optional<seqan3::detail::trace_directions>>;
auto alignments = seqan3::align_pairwise(std::tie(source, target), align_config);
for (auto && alignment : alignments) {
trace_matrix_t trace_matrix{alignment.trace_matrix()};
// in case you are using local alignments you can use
// alignment.sequence1_end_position()/alignment.sequence2_end_position() to find the column and row index where the alignment starts in the trace matrix and
// alignment.sequence1_begin_position()/alignment.sequence2_begin_position() to find the column and respectively row index where the alignment ends in the trace matrix.
// you can then implement a traversal over the trace matrix to enumerate all paths between the end and begin point.
} Note that this soulution, however, is not implemented with efficiency in mind but thought as a debug feature to evaluate the correctness of the computed alignments. If anything is unclear or does not work as expected, just contact us. |
Thanks for that suggestion. I am trying to extract all cooptimal alignments for local alignments from the debug trace matrix. Here, I try to utilize the auto tracePath(const seqan3::detail::matrix_coordinate &trace_begin,
const seqan3::detail::two_dimensional_matrix<seqan3::detail::trace_directions>
&complete_matrix) {
using matrix_t = seqan3::detail::two_dimensional_matrix<seqan3::detail::trace_directions>;
using matrix_iter_t = std::ranges::iterator_t<matrix_t const>;
using trace_iterator_t = seqan3::detail::trace_iterator<matrix_iter_t>;
using path_t = std::ranges::subrange<trace_iterator_t, std::default_sentinel_t>;
seqan3::detail::matrix_offset const offset{trace_begin};
return path_t{trace_iterator_t{complete_matrix.begin() + offset}, std::default_sentinel};
};
template <typename sequence_pair_t, typename score_t, typename matrix_coordinate_t>
AlignmentResult makeResult(const sequence_pair_t &sequencePair, score_t score,
matrix_coordinate_t endPositions, auto const &alignmentMatrix) {
const size_t elementsN = alignmentMatrix.rows() * alignmentMatrix.cols();
std::vector<seqan3::detail::trace_directions> traceDirections;
traceDirections.reserve(elementsN);
for (const auto &direction : alignmentMatrix) {
traceDirections.push_back(direction.value_or(seqan3::detail::trace_directions::none));
}
seqan3::detail::number_rows rowsN{alignmentMatrix.rows()};
seqan3::detail::number_cols colsN{alignmentMatrix.cols()};
seqan3::detail::two_dimensional_matrix<seqan3::detail::trace_directions> traceMatrix(
rowsN, colsN, traceDirections);
using std::get;
seqan3::detail::aligned_sequence_builder builder{get<0>(sequencePair),
get<1>(sequencePair)};
auto trace_result = builder(CoOptimalPairwiseAligner::tracePath(endPositions, traceMatrix));
return {score,
std::make_pair(trace_result.first_sequence_slice_positions.first,
trace_result.second_sequence_slice_positions.first),
std::make_pair(endPositions.row, endPositions.col)};
}; This works sometimes, but at some point I got the error: After digging, I discovered that the internal trace matrix is modified before being returned as a debug matrix. This seems to cause the issue. After removing these lines, it worked fine: seqan3/include/seqan3/alignment/pairwise/alignment_algorithm.hpp Lines 740 to 763 in bcc142e
So my questions are twofold: is there any way to obtain the unmodified trace matrix without modifying your source code, and are there any plans for outputting the cooptimal alignments in the future? Thanks again! |
Hi @ChristopherAdelmann thanks for digging into this. |
Hi @rrahn, of course, here is some sample code that reproduces the issue. #include <seqan3/alphabet/nucleotide/dna5.hpp>
#include <seqan3/alignment/configuration/all.hpp>
#include <seqan3/alignment/matrix/detail/aligned_sequence_builder.hpp>
#include <seqan3/alignment/matrix/detail/debug_matrix.hpp>
#include <seqan3/alignment/matrix/detail/matrix_coordinate.hpp>
#include <seqan3/alignment/matrix/detail/trace_directions.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alignment/scoring/nucleotide_scoring_scheme.hpp>
void testCooptimalAlignments() {
using namespace seqan3::literals;
auto source = "GTTCGCTAGTTTCGCGCGTAGTTTCTCGTTCG"_dna5;
auto target = "CTTCTACTTTGTTCACTTTGATAGCAAACTTAGCCGCTTTA"_dna5;
seqan3::nucleotide_scoring_scheme scheme{seqan3::match_score{1},
seqan3::mismatch_score{-1}};
scheme.score('A'_dna5, 'T'_dna5) = 1;
scheme.score('T'_dna5, 'A'_dna5) = 1;
scheme.score('G'_dna5, 'C'_dna5) = 1;
scheme.score('C'_dna5, 'G'_dna5) = 1;
scheme.score('G'_dna5, 'T'_dna5) = 1;
scheme.score('T'_dna5, 'G'_dna5) = 1;
scheme.score('T'_dna5, 'T'_dna5) = -1;
scheme.score('A'_dna5, 'A'_dna5) = -1;
scheme.score('C'_dna5, 'C'_dna5) = -1;
scheme.score('G'_dna5, 'G'_dna5) = -1;
scheme.score('A'_dna5, 'G'_dna5) = -1;
scheme.score('A'_dna5, 'C'_dna5) = -1;
scheme.score('G'_dna5, 'A'_dna5) = -1;
scheme.score('C'_dna5, 'A'_dna5) = -1;
scheme.score('T'_dna5, 'C'_dna5) = -1;
scheme.score('C'_dna5, 'T'_dna5) = -1;
auto alignConfig =
seqan3::align_cfg::method_local{} | seqan3::align_cfg::scoring_scheme{scheme} |
seqan3::align_cfg::gap_cost_affine{seqan3::align_cfg::open_score{-2},
seqan3::align_cfg::extension_score{-2}} |
seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_alignment{} |
seqan3::align_cfg::detail::debug{};
auto alignments = seqan3::align_pairwise(std::tie(source, target), alignConfig);
for (auto &&alignment : alignments) {
const int score = alignment.score();
using TraceMatrix = seqan3::detail::two_dimensional_matrix<
std::optional<seqan3::detail::trace_directions>>;
using ScoreMatrix = seqan3::detail::row_wise_matrix<std::optional<int>>;
const TraceMatrix traceMatrixOptional{alignment.trace_matrix()};
const ScoreMatrix scoreMatrix{alignment.score_matrix()};
seqan3::debug_stream << "Score: " << score << std::endl;
seqan3::debug_stream << scoreMatrix << std::endl;
auto it = std::ranges::find(scoreMatrix, score);
while (it != scoreMatrix.end()) {
size_t row = std::distance(scoreMatrix.begin(), it) / scoreMatrix.cols();
size_t col = std::distance(scoreMatrix.begin(), it) % scoreMatrix.cols();
seqan3::detail::matrix_coordinate traceBegin{
seqan3::detail::row_index_type{row}, seqan3::detail::column_index_type{col}};
std::vector<seqan3::detail::trace_directions> traceDirections;
std::transform(
traceMatrixOptional.begin(), traceMatrixOptional.end(),
std::back_inserter(traceDirections), [](const auto &direction) {
return direction.value_or(seqan3::detail::trace_directions::none);
});
seqan3::detail::number_rows rowsN{traceMatrixOptional.rows()};
seqan3::detail::number_cols colsN{traceMatrixOptional.cols()};
seqan3::detail::two_dimensional_matrix<seqan3::detail::trace_directions>
traceMatrix(rowsN, colsN, traceDirections);
seqan3::detail::aligned_sequence_builder builder{source, target};
using matrix_t =
seqan3::detail::two_dimensional_matrix<seqan3::detail::trace_directions>;
using matrix_iter_t = std::ranges::iterator_t<matrix_t const>;
using trace_iterator_t = seqan3::detail::trace_iterator<matrix_iter_t>;
using path_t = std::ranges::subrange<trace_iterator_t, std::default_sentinel_t>;
seqan3::detail::matrix_offset const offset{traceBegin};
auto cooptimalTracePath =
path_t{trace_iterator_t{traceMatrix.begin() + offset}, std::default_sentinel};
seqan3::debug_stream << "Trace path: " << cooptimalTracePath << std::endl;
it = std::ranges::find(it + 1, scoreMatrix.end(), score);
}
}
}; |
Platform
Question
In pairwise_align(), when several possible alignments have the same score, how does the code pick which one to return? Can it be asked to return all of them (lazily), like BioPython's Bio.Align.PairwiseAligner does? Thanks!
@eseiler
The text was updated successfully, but these errors were encountered: