
Use sequence similarity contribution in the MC score #73

Open
lafita opened this issue Nov 10, 2015 · 2 comments


@lafita
Collaborator

lafita commented Nov 10, 2015

If the sequence similarity option is activated in the CE self-alignment, the sequence contribution should also be kept in the score used by the Monte Carlo (MC) optimization step; otherwise the effect of the option can be lost.

@lafita
Collaborator Author

lafita commented Nov 25, 2015

We are postponing this issue for now, because there is no clear use case and other feature requests have higher priority. In the discussion about it, we concluded:

  1. The structure and sequence scores need to be combined with a parameter that the user can choose.
  2. Let's call this parameter lambda. It ranges from 0 to 1, where 0 means that only the structure score is considered and 1 means that only the sequence score is considered. Intermediate values set the relative importance of the two scores; for example, lambda = 0.5 gives both scores equal importance.
  3. For this weighting to be meaningful, both scores must be normalized by the maximum score of a single aligned position: the parameter C of the structural similarity function for the structure score, and the maximum value of the SubstitutionMatrix for the sequence score. (Note that the CECalculator does not combine the scores this way; it adds the sequence score, multiplied by a constant, directly to the structure score, so for consistency that function should be changed too.)
  4. The combined score would be:
     $$\text{Score} = \frac{S_{\mathrm{str}}}{S_{\mathrm{str}}^{\max}}\,(1 - \lambda) + \frac{S_{\mathrm{seq}}}{S_{\mathrm{seq}}^{\max}}\,\lambda$$
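A minimal Java sketch of the proposed score, assuming the raw structure and sequence scores and their per-position maxima (C and the substitution matrix maximum) are already available. The class and method names (`CombinedScore`, `combine`) are hypothetical and not part of CE-Symm or BioJava.

```java
/**
 * Hypothetical sketch of the proposed combined score:
 * Score = (Sstr / MaxSstr) * (1 - lambda) + (Sseq / MaxSseq) * lambda
 * Not an existing CE-Symm or BioJava class.
 */
public class CombinedScore {

    /**
     * Combines structure and sequence scores after normalizing each by the
     * maximum score of a single aligned position.
     *
     * @param sStr    structure score of the alignment
     * @param maxSStr maximum structure score per aligned position (parameter C)
     * @param sSeq    sequence score of the alignment
     * @param maxSSeq maximum value of the substitution matrix
     * @param lambda  weight in [0, 1]; 0 = structure only, 1 = sequence only
     */
    public static double combine(double sStr, double maxSStr,
                                 double sSeq, double maxSSeq, double lambda) {
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]: " + lambda);
        }
        if (maxSStr <= 0.0 || maxSSeq <= 0.0) {
            throw new IllegalArgumentException("normalization factors must be positive");
        }
        // Normalize each score, then take the lambda-weighted sum.
        return (sStr / maxSStr) * (1.0 - lambda) + (sSeq / maxSSeq) * lambda;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not taken from real alignments.
        double score = combine(1200.0, 20.0, 330.0, 11.0, 0.5);
        System.out.println(score); // prints 45.0
    }
}
```

Checking lambda and the normalization factors up front keeps a misconfigured parameter from silently skewing the optimization.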

In order to implement this feature, the following needs to be done:

  1. Implement a method in MultipleAlignmentTools that converts a MultipleAlignment to a MultipleSequenceAlignment. A template for this method already exists.
  2. Add a method to MultipleAlignmentScorer that calculates the sequence alignment score of a MultipleAlignment (possibly by converting it to an MSA and using the BioJava scoring functions in the alignment module).
  3. Add a new lambda parameter to CeSymmParameters. When computing the score of a MultipleAlignment in McOptimizer, apply the weighting and normalization described above.
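Step 2 could be sketched as a sum-of-pairs score, one common way to score a multiple sequence alignment (the issue does not say which scheme to use). This assumes the MultipleAlignment has already been converted to equal-length gapped strings; `SumOfPairsScore` and the toy substitution function are hypothetical stand-ins, not BioJava's SubstitutionMatrix or its scoring classes.

```java
import java.util.List;
import java.util.function.IntBinaryOperator;

/**
 * Hypothetical sketch: sum-of-pairs sequence score of a multiple alignment whose
 * rows are equal-length gapped strings ('-' marks a gap). In practice the rows
 * would come from converting the MultipleAlignment to an MSA.
 */
public class SumOfPairsScore {

    /** Sums substitution scores over every pair of rows in every column, skipping gaps. */
    public static int score(List<String> rows, IntBinaryOperator subst) {
        if (rows.isEmpty()) {
            return 0;
        }
        int length = rows.get(0).length();
        for (String row : rows) {
            if (row.length() != length) {
                throw new IllegalArgumentException("all rows must have the same length");
            }
        }
        int total = 0;
        for (int col = 0; col < length; col++) {
            for (int i = 0; i < rows.size(); i++) {
                char a = rows.get(i).charAt(col);
                if (a == '-') continue; // gap: no contribution
                for (int j = i + 1; j < rows.size(); j++) {
                    char b = rows.get(j).charAt(col);
                    if (b == '-') continue;
                    total += subst.applyAsInt(a, b);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy substitution function: +2 for identity, -1 otherwise (not a real matrix).
        IntBinaryOperator toy = (a, b) -> a == b ? 2 : -1;
        List<String> rows = List.of("ACD-E", "ACDFE", "GCD-E");
        System.out.println(score(rows, toy)); // prints 18
    }
}
```

Dividing such a score by the substitution matrix maximum (times the number of scored pairs) would give the normalized sequence term used in the combined score.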

@sbliven
Collaborator

sbliven commented Nov 25, 2015

Nice summary, @lafita. I do think that the normalization factors could be worked out by looking at the distributions of scores over a large set of pairwise comparisons. This might be a nice student project.

@sbliven sbliven added this to the CeSymm-3.0 milestone Jun 4, 2018