Ignoring duplicate exact synonyms that are acronyms in robot report #1175

allenbaron · 2024-01-08T17:19:52Z

Given information-artifact-ontology/ontology-metadata#135, is the plan now for robot report to exclude from warnings duplicate exact synonyms that are annotated as acronyms? Overlapping acronyms are fairly common.

This is a follow-up to the slightly tangential comment made in #748 (comment) by dosumis.

Slightly tangential, but we really need a way to mark synonyms as allowable duplicate with labels (maybe using synonym type?). We have many cases in FBbt where the same acronym is used in the literature for multiple distinct anatomical structures (pretty common in anatomy). We add these as synonyms with a reference to back them up. This is frequently useful to anyone looking to find a term based on what they find in the literature - curators and users. I guess the rule originally comes from GO where this is less of an issue with names for processes/MFs?


matentzn · 2024-01-11T18:06:07Z

@allenbaron I will help push this through. Do you know SPARQL? Could you try to redesign this query to achieve this goal: https://github.com/ontodev/robot/blob/master/robot-core/src/main/resources/report_queries/duplicate_exact_synonym.rq

If you have trouble with this you can ping @anitacaron (also on Slack), who may have a soft spot for someone with QC-related SPARQL problems :)

matentzn · 2024-01-11T18:07:23Z

One caveat: if we do this, we have to use FILTER NOT EXISTS, which is extremely slow. Keep that in mind when you write this, and try it on something like DO, HPO and UBERON to be sure that it won't be too inefficient.

anitacaron · 2024-01-11T18:29:21Z

Isn't it another exception for the label-synonym-polysemy-violation?

There's already an exception for abbreviation (OMO:0003000)

allenbaron · 2024-01-12T18:50:55Z

Yes, acronym (OMO:0003012) is a new synonym type that would also be an exception.

Honestly, the query at UBERON linked by @anitacaron (with minor modification) is probably the best bet for updating the duplicate_exact_synonym.rq query in ROBOT. Using a subquery only slows things down a bit compared to the current query but it's definitely simpler and probably faster for managing exceptions. I think the only changes to it would be:

- Remove rdfs:label from the VALUES statement.
- Add a VALUES statement for the exceptions (abbreviation and now acronym).
- Possibly drop the use of UCASE.
  - The current duplicate_exact_synonym.rq query will not report duplicate synonyms that differ in case or language tag (Duplicate label/synonym checks need to normalize literal type #748). Were those intentional design choices? Just noting that the UBERON query also will not report duplicate synonyms if they differ in language tag.
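
The logic of the changes listed above can be modeled outside SPARQL. The following is a minimal Python sketch, not the ROBOT query itself. It assumes each exact synonym is available as a `(term_id, text, synonym_type)` tuple, a hypothetical layout chosen only for illustration, and it skips any synonym typed as abbreviation (OMO:0003000) or acronym (OMO:0003012) before grouping. The `ignore_case` flag stands in for the UCASE question, and the function name and the `EX:` identifiers are made up for the example.

```python
from collections import defaultdict

# Synonym types exempt from the duplicate check (IDs as given in this thread).
EXEMPT_TYPES = {"OMO:0003000", "OMO:0003012"}  # abbreviation, acronym


def duplicate_exact_synonyms(synonyms, ignore_case=True):
    """Return {synonym text: sorted term IDs} for synonyms shared by 2+ terms.

    `synonyms` is an iterable of (term_id, text, synonym_type) tuples;
    synonym_type may be None. This tuple layout is an assumption made for
    the sketch, not ROBOT's internal representation.
    """
    terms_by_text = defaultdict(set)
    for term_id, text, synonym_type in synonyms:
        if synonym_type in EXEMPT_TYPES:
            continue  # typed acronyms/abbreviations are allowed to overlap
        key = text.upper() if ignore_case else text
        terms_by_text[key].add(term_id)
    return {
        text: sorted(terms)
        for text, terms in terms_by_text.items()
        if len(terms) > 1
    }


example = [
    ("EX:1", "AL", "OMO:0003012"),  # acronym, exempt
    ("EX:2", "AL", "OMO:0003012"),  # acronym, exempt
    ("EX:3", "heart tube", None),
    ("EX:4", "Heart tube", None),   # differs from EX:3 only in case
]
print(duplicate_exact_synonyms(example))
print(duplicate_exact_synonyms(example, ignore_case=False))
```

With case normalization on, only the two "heart tube" synonyms are reported; with it off, nothing is reported, which mirrors the behavior described for the current query. Note that this sketch exempts a synonym as soon as it carries one of the two types; whether both sides of a duplicate pair must be typed is a design decision the sketch does not settle.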

I know @jamesaoverton is particularly concerned with ROBOT's backward compatibility, which I appreciate. Would these changes be a concern in that regard?

allenbaron · 2024-01-12T21:51:24Z

I decided to look more closely at execution time differences using doid-edit.owl and uberon.owl (the release file rather than the edit file, because that is what I had on hand).

Just switching to the subquery approach, without adding the exclusion of synonym types or using UCASE, takes about 1.07-1.43 times as long (DO: current = 6.13s, subquery = 6.57s; UBERON: current = 17.8s, subquery = 25.4s). Adding the exclusion and UCASE slows things down further by about 2s for both DO and UBERON.
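
For reference, the 1.07-1.43 range follows directly from the timings quoted above. A short Python check, using only the numbers in this comment:

```python
# Timings in seconds, copied from the comment above.
timings = {
    "DO": {"current": 6.13, "subquery": 6.57},
    "UBERON": {"current": 17.8, "subquery": 25.4},
}

# Ratio of subquery runtime to current-query runtime for each ontology.
ratios = {name: t["subquery"] / t["current"] for name, t in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: subquery takes {ratio:.2f}x as long as the current query")
```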

matentzn · 2024-01-15T11:13:50Z

@allenbaron thanks for the analysis!

Possibly drop the use of UCASE

I personally think we should introduce this now - I cannot imagine a single case where the duplicate synonym check should be case sensitive. Of course, this needs to be well documented!

variation in case or language tag

This is much more complicated, as you would want to

reject duplicates within the same language and
permit duplicates across languages.

Not sure how this should be solved!
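
One possible reading of those two requirements is to make the language tag part of the grouping key, so that only synonyms with the same text and the same tag collide. The Python sketch below illustrates that idea under stated assumptions: synonyms arrive as hypothetical `(term_id, text, lang)` tuples, untagged literals (`None`) are treated as their own group, and language tags are compared case-insensitively. How untagged literals should relate to tagged ones is left open here, as it is in the thread.

```python
from collections import defaultdict


def duplicates_within_language(synonyms):
    """Report (text, lang) keys that are used by two or more terms.

    `synonyms` is an iterable of (term_id, text, lang) tuples, where lang is
    a language tag such as "en", or None for an untagged literal. The tuple
    layout is hypothetical and exists only for this sketch.
    """
    terms_by_key = defaultdict(set)
    for term_id, text, lang in synonyms:
        # Same text in a different language gets a different key,
        # so cross-language duplicates are never grouped together.
        key = (text.upper(), (lang or "").lower())
        terms_by_key[key].add(term_id)
    return {
        key: sorted(terms)
        for key, terms in terms_by_key.items()
        if len(terms) > 1
    }


example = [
    ("EX:1", "pancreas", "en"),
    ("EX:2", "pancreas", "en"),  # same language as EX:1: rejected
    ("EX:3", "pancreas", "es"),  # different language: permitted
]
print(duplicates_within_language(example))
```

Here only EX:1 and EX:2 are reported, because EX:3 shares the text but not the language tag.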

Do you want to make a PR and see how it goes?

allenbaron · 2024-01-18T16:21:01Z

As an alternative to creating an exclusion for abbreviations and acronyms, could we introduce a new synonym predicate, something like skos:closeMatch for synonyms oboInOwl:hasCloseSynonym?

I guess a new synonym predicate probably has more cons than pros. If we were really going to do something like this, we probably should've just made abbreviations and acronyms their own synonym predicates instead of making them synonym types.

I'll work to open a PR for updating the SPARQL query soon.

matentzn · 2024-01-18T18:14:54Z

could we introduce a new synonym predicate, something like skos:closeMatch for synonyms oboInOwl:hasCloseSynonym?

I don't think we should use that system for acronyms, which are "exact" synonyms, but now that you say this - it seems super weird to me that there are no close synonyms! I never noticed that! Wow!

I'll work to open a PR for updating the SPARQL query soon.

Thanks!!!

allenbaron mentioned this issue Feb 1, 2024

Exclude some synonym types from duplicate_exact_synonym report query #1179

Merged


jamesaoverton closed this as completed in #1179 May 1, 2024
