Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Alignment] Is there a way to run align_pairwise without traceback? #3045

Open
leannmlindsey opened this issue Aug 12, 2022 · 3 comments
Open
Labels
question a user question how to do certain things

Comments

@leannmlindsey
Copy link

Platform

  • SeqAn version:
  • Operating system:
  • Compiler:

Question

In earlier versions of SeqAn there was a function called localAlignmentScore (https://docs.seqan.de/seqan/master/?p=locallAlignmentScore) which did not run the quadratic time alignment step. I have not been able to find the same function in SeqAn3. Is there a different way to do this in SeqAn3?

@leannmlindsey leannmlindsey added the question a user question how to do certain things label Aug 12, 2022
@marehr
Copy link
Member

marehr commented Aug 16, 2022

Hi @leannmlindsey,

sorry for the delay! It is summer vacation time :)

We have a really detailed page (https://docs.seqan.de/seqan/3.2.0/group__alignment__pairwise.html) that describes all the possible combinations that you can do with the pairwise alignment module.

For example, in the section "Accessing the alignment results" we have the following snippet that describes how to limit the output that should be computed (in this example for the edit distance):

#include <vector>
 
#include <seqan3/alignment/configuration/align_config_edit.hpp>
#include <seqan3/alignment/pairwise/align_pairwise.hpp>
#include <seqan3/alphabet/nucleotide/dna4.hpp>
#include <seqan3/core/debug_stream.hpp>
 
int main()
{
    using namespace seqan3::literals;
 
    // Basic alignment algorithm configuration.
    auto config = seqan3::align_cfg::method_global{} | seqan3::align_cfg::edit_scheme;
 
    std::pair p{"ACGTAGC"_dna4, "AGTACGACG"_dna4};
 
    // Compute only the score:
    for (auto res : seqan3::align_pairwise(p, config | seqan3::align_cfg::output_score{}))
        seqan3::debug_stream << res << "\n"; // prints: {score: -4}
 
    // Compute only the alignment:
    for (auto res : seqan3::align_pairwise(p, config | seqan3::align_cfg::output_alignment{}))
        seqan3::debug_stream << res << "\n"; // prints: {alignment: (ACGTA-G-C-,A-GTACGACG)}
 
    // Compute the score and the alignment:
    for (auto res :
         seqan3::align_pairwise(p, config | seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_alignment{}))
        seqan3::debug_stream << res << "\n"; // prints: {score: -4, alignment: (ACGTA-G-C-,A-GTACGACG)}
 
    // By default compute everything:
    for (auto res : seqan3::align_pairwise(p, config))
        seqan3::debug_stream
            << res << "\n"; // prints {id: 0, score: -4, begin: (0,0), end: (7,9) alignment: (ACGTA-G-C-,A-GTACGACG)}
}

As you can see, we can "combine" different traits by piping them together.

Other examples:

// global pairwise alignment with edit distance (i.e. match = 0, mismatch = -1, gaps = -1) (uses a specialized algorithm https://doi.org/10.1145/316542.316550)
// and compute only the score (needs only linear space)
auto config1 =
    seqan3::align_cfg::method_global{} |
    seqan3::align_cfg::edit_scheme |
    seqan3::align_cfg::output_score{};

// banded-local pairwise-alignment with custom affine-scoring scheme (match = +3, mismatch = -2, gap_open = -5, gap_extension = -2) (needs linear time)
// and only compute the score. (needs linear space)
auto config2 =
    seqan3::align_cfg::method_local{} |
    seqan3::align_cfg::scoring_scheme
    {
        seqan3::nucleotide_scoring_scheme
        {
            seqan3::match_score{3},
            seqan3::mismatch_score{-2}
        }
    } |
    seqan3::align_cfg::gap_cost_affine
    {
        seqan3::align_cfg::open_score{-5},
        seqan3::align_cfg::extension_score{-2}
    } |
    seqan3::align_cfg::band_fixed_size
    {
       seqan3::align_cfg::lower_diagonal{-4},
       seqan3::align_cfg::upper_diagonal{4}
    } |
    seqan3::align_cfg::output_score{}
    // would compute the alignment, but would need more space to store it.
    // | seqan3::align_cfg::output_alignment{}
   ;

I hope this helps answering your question. As you are a new user, maybe you can help us improve our documentation. Where did you try to find this information and what could we improve from your POV?

@leannmlindsey
Copy link
Author

leannmlindsey commented Aug 16, 2022 via email

@eseiler eseiler changed the title Is there a way to run align_pairwise without traceback? [Alignment] Is there a way to run align_pairwise without traceback? Oct 20, 2022
@rrahn
Copy link
Contributor

rrahn commented Nov 17, 2022

Hi @leannmlindsey, thanks for writing us.
I think I can help you but I'd like to ask you more questions about your request in order to fully understand your request.

You want to use the Smith-Waterman algorithm to compute a pure local alignment where the optimal alignment aligns any infix of the source with any infix of the target sequence, such as finding a conserved region between two protein sequences?
You are not referring to a semi-global alignment where instead the optimal alignment aligns any infix of the source sequence to the entire target sequence, such as in read mapping applications where a potential read mapping location is verified?
I'am asking because I've seen many ambiguous descriptions of the latter problem using the same terminology as for the local alignment.

You asked for a Smith-Waterman variant using reverse scoring? Can you pease elaborate what you mean with reverse scoring in this case?
Is there a particular DP implementation you have in mind?
Because you also referred to some implementation in SeqAn that did not use the quadratic alignment step.
But as far as I know this didn't exist in SeqAn for the local alignment -- only for the global and semi-global alignments.

So the function localAlignmentScore computes only the score, but uses the regular quadratic runtime DP algorithm as described in the original publication of Smith and Waterman. It does not compute the traceback though, and thus is implemented in linear space.
But you were explicitly referring to runtime, correct?
The same happens in SeqAn3. If you want to compute only the score you can specify the output configuration seqan3::align_cfg::output_score{}, which will omit the allocation of a quadratic trace matrix internally.
By specifying seqan3::align_cfg::output_score{} | seqan3::align_cfg::output_alignment you additionally get the final alignment, which requires the allocation of the quadratic trace matrix.
Note the configuration in SeqAn3 chooses automatically the best implementation available to compute the requested alignment features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question a user question how to do certain things
Projects
None yet
Development

No branches or pull requests

3 participants