-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug in calculating all_by_all_pairwise_similarity
#800
Conversation
all_by_all_pairwise_similarity
OK, so what was happening was that the provided nodes didn't have ancestors (at least not in the provided set), so their metrics were set to zero. Should this produce an error and/or should there be an attempt (in oaklib or semsimian) to identify the potential ancestor? It's reasonable to expect that the user should only need to provide IC values for nodes of interest, even if the ancestors are required to calculate pairwise semsim. |
Could the 3 of us do a code review on this, maybe tomorrow if we are all around?
If I understand the code correctly, we check here if the two terms have a common ancestor (which we call I think the remaining question is why so many (all?) of Vinicius's pairs of ontology terms didn't have a common ancestor? This seems like a bug somewhere, either in OAK, semsimian, or Phenio. |
They certainly have common ancestors, and in PHENIO the common ancestor of an HP term and an MP term will almost always be a UPHENO term. But somehow if the UPHENO term isn't included in the provided IC map it doesn't get included as an ancestor, even if it's present in the source ontology - perhaps because we don't mix user-provided IC maps with calculated ones. |
When
len(ancestor_set) == 0
the jacquard and AIC scores are explicitly set to 0 which is incorrect.Thank you @souzadevinicius for highlighting this!
Also updated
semsimian
version dependency to latestfixes monarch-initiative/semsimian#133 (comment)