Ignoring duplicate exact synonyms that are acronyms in robot report #1175
Comments
@allenbaron I will help push this through. Do you know SPARQL? Could you try to redesign this query to achieve this goal: https://github.com/ontodev/robot/blob/master/robot-core/src/main/resources/report_queries/duplicate_exact_synonym.rq If you have trouble with this, you can ping @anitacaron (also on Slack), who may have a soft spot for someone with QC-related SPARQL problems :)
The one caveat I want to say: if we do this, we have to use |
Isn't it another exception for the label-synonym-polysemy-violation? There's already an exception for abbreviation (OMO:0003000).
Yes, acronym (OMO:0003012) is a new synonym type that would also be an exception. Honestly, the query at UBERON linked by @anitacaron (with minor modification) is probably the best bet for updating the duplicate_exact_synonym.rq query in ROBOT. Using a subquery only slows things down a bit compared to the current query but it's definitely simpler and probably faster for managing exceptions. I think the only changes to it would be:
I know @jamesaoverton is particularly concerned with ROBOT's backward compatibility, which I appreciate. Would these changes be a concern in that regard?
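To make the proposed behaviour concrete, here is a minimal Python sketch of the rule being discussed: flag exact synonyms shared by more than one entity, compare them case-insensitively (the role UCASE would play in the query), and skip synonyms typed as abbreviation (OMO:0003000) or acronym (OMO:0003012). It is not the SPARQL query itself. The function name, the tuple layout, and the `EX:` identifiers are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Synonym types exempt from the duplicate check (IDs taken from the thread).
EXEMPT_TYPES = {"OMO:0003000", "OMO:0003012"}  # abbreviation, acronym


def duplicate_exact_synonyms(annotations):
    """Return {UPPERCASED synonym: sorted entity IDs} for synonyms on 2+ entities.

    `annotations` is an iterable of (entity, synonym, synonym_type) tuples;
    synonym_type may be None. This layout is a hypothetical stand-in for the
    exact-synonym annotations a report query would match.
    """
    entities_by_value = defaultdict(set)
    for entity, synonym, synonym_type in annotations:
        if synonym_type in EXEMPT_TYPES:
            continue  # typed acronyms/abbreviations are allowed to overlap
        entities_by_value[synonym.upper()].add(entity)  # case-insensitive key
    return {
        value: sorted(entities)
        for value, entities in entities_by_value.items()
        if len(entities) > 1
    }


# Made-up example data, not taken from DO or UBERON.
example = [
    ("EX:1", "Foo disease", None),
    ("EX:2", "foo disease", None),   # differs only in case, so it is flagged
    ("EX:3", "FD", "OMO:0003012"),   # acronym, ignored
    ("EX:4", "FD", "OMO:0003012"),   # acronym, ignored
    ("EX:5", "bar syndrome", None),
]
result = duplicate_exact_synonyms(example)
print(result)  # {'FOO DISEASE': ['EX:1', 'EX:2']}
```

One design choice in this sketch is an assumption: a typed acronym is dropped before grouping, so an untyped "FD" on one entity and an acronym "FD" on another would not be reported. Whether the real query should behave that way is left open in the thread.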
I decided to look more closely at execution time differences using doid-edit.owl and uberon.owl (because I had it on hand, not the edit file). Just switching to the subquery approach, without adding the exclusion of synonym types or using UCASE, takes about 1.07-1.43 times longer (DO: current = 6.13s, subquery = 6.57s; UBERON: current = 17.8s, subquery = 25.4s). Adding the exclusion and UCASE slows things down further by ~2s for both DO and UBERON.
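As a quick check of the slowdown figures, the ratios can be recomputed from the timings quoted in this comment. This is only arithmetic on the reported numbers, not a new benchmark.

```python
# Timings in seconds, as reported in the comment above.
timings = {
    "DO": {"current": 6.13, "subquery": 6.57},
    "UBERON": {"current": 17.8, "subquery": 25.4},
}

# Slowdown factor of the subquery approach relative to the current query.
ratios = {
    name: round(t["subquery"] / t["current"], 2)
    for name, t in timings.items()
}
print(ratios)  # {'DO': 1.07, 'UBERON': 1.43}
```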
@allenbaron thanks for the analysis!
I personally think we should introduce this now - I cannot imagine a single case where the duplicate synonym check should be case sensitive. Of course, this needs to be well documented!
This is much more complicated, as you would want to
Not sure how this should be solved! Do you want to make a PR and see how it goes?
As an alternative to creating an exclusion for abbreviations and acronyms, could we introduce a new synonym predicate, something like I guess a new synonym predicate probably has more cons than pros. If we were really going to do something like this, we probably should've just made abbreviations and acronyms their own synonym predicates instead of making them synonym types. I'll work to open a PR for updating the SPARQL query soon. |
I don't think we should use that system for acronyms, which are "exact" synonyms, but now that you say this - it seems super weird to me that there are no close synonyms! I never noticed that! Wow!
Thanks!!!
Given information-artifact-ontology/ontology-metadata#135, is the plan now for `robot report` to exclude from warnings duplicate exact synonyms that are annotated as acronyms? Overlapping acronyms are fairly common.
This is a follow-up to the slightly tangential comment made in #748 (comment) by dosumis.