Self-cross-references relationships in EFO ontology #684

Closed
Elysheba opened this issue Jan 24, 2020 · 6 comments

@Elysheba

Elysheba commented Jan 24, 2020

Hi,

while using the EFO ontology, I noticed that some EFO identifiers, as well as some of the incorporated MONDO and Orphanet identifiers, carry a cross-reference that points back to another EFO identifier. These are a sort of self-cross-reference established within EFO, to nodes of a different level. Is this to be expected, and how should these terms be interpreted?

I have compiled the examples I could identify in the list below (I only checked whether EFO/MONDO/Orphanet identifiers within the EFO ontology have an EFO cross-reference; the inverse is more difficult to ascertain). Based on the online OLS interface, some are obtained through the MONDO ontology and its cross-references.

Many thanks,
Liesbeth

efo_self_crossref.txt

@paolaroncaglia
Collaborator

paolaroncaglia commented Jan 24, 2020

Hi @Elysheba ,

Thanks for your message. I had a quick look at the first pair in your file, i.e.
"DB1" "id1" "DB2" "id2"
"MONDO" "0024306" "EFO" "1000036"
As you guessed, that's a Mondo term ('acquired lactic acidosis') that EFO imported and that already came with an EFO xref (to EFO 'lactic acidosis'). I interpret it as a broad mapping in Mondo: possibly, at the time Mondo was built, it didn't have a broad term for 'lactic acidosis'. But now it has one (MONDO:0006040), and that one doesn't have an EFO xref. My quick thoughts are that we should:

  • Check a few more cases from your file (to start with)
  • Get in touch with Mondo to have them "update" their EFO xref(s) at the source
  • Decide on further action items, if necessary, based on the outcome of the first step above, aiming of course to address all the cases you highlighted.
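
For checking more cases from the file, a minimal Python sketch for reading rows in the four-column layout quoted above might look like this. It assumes the fields are double-quoted and whitespace-separated, as in the two lines shown; the function name `parse_xref_rows` is made up for illustration, and the real attachment may be laid out differently.

```python
import shlex

def parse_xref_rows(lines):
    """Parse rows of four quoted fields into (source_curie, target_curie) pairs."""
    pairs = []
    for line in lines:
        fields = shlex.split(line)  # handles quoted fields separated by spaces or tabs
        if len(fields) != 4 or fields[0] == "DB1":
            continue  # skip blank lines, malformed rows and the header row
        db1, id1, db2, id2 = fields
        pairs.append((f"{db1}:{id1}", f"{db2}:{id2}"))
    return pairs

# The two lines quoted in this comment
rows = [
    '"DB1" "id1" "DB2" "id2"',
    '"MONDO" "0024306" "EFO" "1000036"',
]
pairs = parse_xref_rows(rows)
print(pairs)
```

Each pair can then be looked up in OLS by hand, or compared against the current Mondo release to see whether the xref is still present at the source.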

Thanks,

Paola

@dhimmel
Contributor

dhimmel commented Jun 30, 2022

We recently noticed internal xrefs on EFO Otar Slim v3.43.0, which presumably has the same xrefs as EFO. We found 5580 xrefs whose target was a term in EFO. We can further subdivide this into:

  1. self-xrefs where the source and target node are the same (900 occurrences). See also Remove all xrefs to self #752. These are harmless but add unnecessary clutter.

  2. xrefs to a different term in EFO (4680 occurrences). These are problematic because they contradict the ontology.
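
The split above can be sketched roughly as follows. This is a toy illustration, not the code used to produce the counts: the function name `classify_xrefs`, the CURIE-style strings, and the `EXT:0000001` identifier are all assumptions, and the set of release terms contains only the two terms mentioned earlier in this thread.

```python
from collections import Counter

def classify_xrefs(xrefs, release_terms):
    """Label each (source, target) xref as 'self', 'internal' or 'external'."""
    labelled = []
    for source, target in xrefs:
        if target == source:
            label = "self"        # category 1: source and target are the same node
        elif target in release_terms:
            label = "internal"    # category 2: target is a different term in the release
        else:
            label = "external"    # ordinary xref to another vocabulary
        labelled.append((source, target, label))
    return labelled

# Toy data: only the first pair comes from this thread, the rest is made up
release_terms = {"EFO:1000036", "MONDO:0024306"}
xrefs = [
    ("MONDO:0024306", "EFO:1000036"),
    ("EFO:1000036", "EFO:1000036"),
    ("EFO:1000036", "EXT:0000001"),
]
counts = Counter(label for _, _, label in classify_xrefs(xrefs, release_terms))
print(counts)
```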

Here's a spreadsheet of all xrefs, but currently filtered for xref_in_efo (xref points to a term in EFO): efo-otar-slim-xrefs.xlsx.

Perhaps EFO could filter all xrefs that point to terms in the EFO release? That would be a good quick fix, but many of the "xrefs to a different term in EFO" seem to point to deeper inconsistencies between ontologies. CC @paolaroncaglia @zoependlington @matentzn
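
As a rough sketch of what that quick fix could look like, assuming the release has already been loaded into a plain dict from term to xref list (the function name `drop_internal_xrefs` and the `EXT:0000001` identifier are hypothetical, and this is not how the EFO pipeline actually works):

```python
def drop_internal_xrefs(term_xrefs):
    """Split xrefs into those kept and those removed because they target a release term.

    term_xrefs maps each term CURIE in the release to its list of xref CURIEs.
    """
    release_terms = set(term_xrefs)  # every key counts as "a term in the release"
    kept, removed = {}, []
    for term, xrefs in term_xrefs.items():
        kept[term] = [x for x in xrefs if x not in release_terms]
        removed += [(term, x) for x in xrefs if x in release_terms]
    return kept, removed

# Toy data: EXT:0000001 is a made-up external identifier
toy = {
    "MONDO:0024306": ["EFO:1000036", "EXT:0000001"],
    "EFO:1000036": ["EFO:1000036"],
}
kept, removed = drop_internal_xrefs(toy)
print(kept)
print(removed)
```

Returning the removed pairs alongside the cleaned mapping keeps a record that can be reviewed for the deeper inconsistencies, rather than silently discarding them.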

@matentzn
Contributor

@dhimmel

I will leave @zoependlington to hash out a plan for this but may I say:

hasDbXref means nothing. I can add hasDbXref between all terms if I want to. Its meaning may once have been "kinda the same", but now it's "kinda related", and I would recommend not using it to map data for anything other than perhaps machine learning (which can handle the noise). The fact that there are thousands of xrefs within EFO alone should ring an alarm bell.

That said, it's a matter of 10 minutes to add a SPARQL update query to EFO to get rid of these xrefs. @zoependlington can contact me if she wants me to add it to the release pipeline! :)

@dhimmel
Contributor

dhimmel commented Jun 30, 2022

hasDbXref means nothing

Hmm, in practice it seems to mean that the object is the most equivalent term to the subject in the object's vocabulary.

How else does one convert external identifiers (like those from MeSH, ICD10, etc.) to EFO without hasDbXref? In #935 we discussed mondo:exactMatch and skos:exactMatch, but my understanding is that these are not widely implemented in EFO.
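
To make that use case concrete, here is a minimal sketch of the kind of lookup I mean: invert the xrefs into an index from external identifier to EFO term, and refuse to map when more than one term claims the same identifier. All names and identifiers (`build_external_index`, `map_identifier`, `EFO:A`, `EXT:1`, and so on) are placeholders, not real terms or an existing API.

```python
from collections import defaultdict

def build_external_index(term_xrefs):
    """Invert term -> xrefs into xref -> set of terms that carry it."""
    index = defaultdict(set)
    for term, xrefs in term_xrefs.items():
        for xref in xrefs:
            index[xref].add(term)
    return index

def map_identifier(index, external_id):
    """Return the single term for external_id, or None if missing or ambiguous."""
    candidates = index.get(external_id, set())
    return next(iter(candidates)) if len(candidates) == 1 else None

# Placeholder identifiers, purely illustrative
term_xrefs = {
    "EFO:A": ["EXT:1", "EXT:2"],
    "EFO:B": ["EXT:2"],
}
index = build_external_index(term_xrefs)
print(map_identifier(index, "EXT:1"))  # unambiguous
print(map_identifier(index, "EXT:2"))  # claimed by two terms, so not mapped
```

A lookup like this only works if hasDbXref is close to one-to-one, which is why noisy or internal xrefs are a problem for us.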

That said, it's a matter of 10 minutes to add a SPARQL update query to EFO to get rid of these xrefs. @zoependlington can contact me if she wants me to add it to the release pipeline!

Nice!

@matentzn
Contributor

@dhimmel

How else does one convert external identifiers (like from MeSH, ICD10, etc) to EFO without hasDbXref?

Of course, I was just a bit ... direct :D

If you want to take this mapping thing to the next level, can you share with me a list of the prefixes you really need for OTAR? That is, between which kinds of terms do you need mappings?

@zoependlington
Collaborator

This should now be fixed, so I'm moving it to done. All self-xrefs in this spreadsheet will be removed upon release.
