Add CLO processing notebook #133

Merged
merged 8 commits into from
Jul 3, 2023

cthoyt
Member

@cthoyt cthoyt commented Jun 28, 2023

The Cell Line Ontology (CLO) is a detailed resource. However, it does not follow the standard OBO modeling pattern for cross-references, which uses either a predicate from SKOS or oboInOwl:hasDbXref to point to a single CURIE encoded as a string. Instead, it uses rdfs:seeAlso with a combination of non-standard CURIEs that are either comma- or semicolon-delimited.
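To make the parsing problem concrete, here is a minimal sketch of splitting one rdfs:seeAlso literal into candidate CURIEs. It is not the code from this PR; the helper names `split_see_also` and `parse_curie` are hypothetical, the sample string is made up, and real CLO values would likely also need prefix normalization.

```python
import re
from typing import List, Optional, Tuple

# Assumption: values are delimited by commas and/or semicolons, as described above.
_DELIMITERS = re.compile(r"[;,]")


def split_see_also(value: str) -> List[str]:
    """Split a delimited rdfs:seeAlso literal and drop empty parts."""
    return [part.strip() for part in _DELIMITERS.split(value) if part.strip()]


def parse_curie(text: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, identifier) for 'prefix:identifier' strings, else None."""
    prefix, sep, identifier = text.partition(":")
    prefix, identifier = prefix.strip(), identifier.strip()
    if not sep or not prefix or not identifier:
        return None
    return prefix, identifier


# Made-up literal mixing both delimiters and stray whitespace.
example = "PREFIXA:123; PREFIXB: 456,PREFIXC:789"
parsed = [parse_curie(part) for part in split_see_also(example)]
```

Strings that do not parse come back as `None`, so they can be logged and reviewed rather than silently dropped.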

Depends on:

@cthoyt
Member Author

cthoyt commented Jun 28, 2023

@bgyori CLO does have mappings available, but extracting them takes serious processing effort. How should this relate to Biomappings? Should we import them directly into the "positive" mappings file, or put them in the "predicted" mappings file and then allow a second round of curation?

@matentzn I also wasn't sure which semapv term is appropriate for mappings extracted from an ontology. In theory they are manually curated, but there's no actual evidence of how they were produced, so I don't think it's fair to assume that.

@matentzn

@cthoyt I have struggled with the same! I would keep the mapping predicate as oboInOwl:hasDbXref, which already serves as a warning sign, and record semapv:UnspecifiedMatching as the mapping_justification.
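As an illustration of that suggestion, a mapping row could be written roughly like this. This is only a sketch: the column names are the standard SSSOM slots, but `make_row` and `to_tsv` are hypothetical helpers and the identifiers are placeholders, not real CLO terms.

```python
import csv
import io

SSSOM_COLUMNS = ["subject_id", "predicate_id", "object_id", "mapping_justification"]


def make_row(subject_id: str, object_id: str) -> dict:
    """Build one mapping row using the predicate and justification suggested above."""
    return {
        "subject_id": subject_id,
        "predicate_id": "oboInOwl:hasDbXref",
        "object_id": object_id,
        "mapping_justification": "semapv:UnspecifiedMatching",
    }


def to_tsv(rows) -> str:
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=SSSOM_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder identifiers, for illustration only.
tsv = to_tsv([make_row("CLO:0000000", "EXAMPLE:1")])
```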

@bgyori
Contributor

bgyori commented Jun 29, 2023

@cthoyt it looks like this PR adds xrefs from CLO as predictions, so that we can review and curate them manually before adding them to the mappings. I think this is appropriate under the assumption that a considerable number of these mappings are non-exact and therefore require review. If, however, we assume that almost all of the xrefs CLO provides are actual exact equivalences, then going through Biomappings shouldn't be necessary. From my cursory spot checking, these xrefs look like exact mappings. So perhaps a better path forward would be to change CLO's representation and move these into proper xrefs rather than "see also" relations?

@cthoyt
Member Author

cthoyt commented Jun 29, 2023

@bgyori, agreed, many of the ones that point to MeSH, BTO, and Cellosaurus are pretty high quality. Therefore, I moved the processing functionality into SeMRA.

I also added a many-to-many finder pipeline in SeMRA to assess the situation in CLO. It found 26 mappings that I want to manually curate and include in Biomappings, since some of them must be non-exact.
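For readers unfamiliar with the idea, the check can be sketched in plain Python as follows. This is not SeMRA's API; `find_non_one_to_one` is a hypothetical function, and it assumes a mapping is suspicious when its subject maps to several objects with the same target prefix, or when its object is mapped from several subjects.

```python
from collections import defaultdict


def find_non_one_to_one(pairs):
    """Return (subject, object) pairs that are not one-to-one within a target prefix."""
    objects_by_subject = defaultdict(set)  # keyed by (subject, target prefix)
    subjects_by_object = defaultdict(set)  # keyed by object CURIE
    for subject, obj in pairs:
        target_prefix = obj.split(":", 1)[0]
        objects_by_subject[subject, target_prefix].add(obj)
        subjects_by_object[obj].add(subject)
    return sorted(
        (subject, obj)
        for subject, obj in set(pairs)
        if len(objects_by_subject[subject, obj.split(":", 1)[0]]) > 1
        or len(subjects_by_object[obj]) > 1
    )


# Made-up CURIEs: S:1 maps to two T terms, and T:d is mapped from two subjects.
pairs = [
    ("S:1", "T:a"), ("S:1", "T:b"),
    ("S:2", "T:c"),
    ("S:3", "T:d"), ("S:4", "T:d"),
    ("S:5", "U:x"), ("S:5", "T:e"),
]
flagged = find_non_one_to_one(pairs)
```

Since exact equivalence should be one-to-one, any flagged pair contains at least one non-exact mapping and is a natural candidate for manual curation.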

For the rest of the mappings that could be exact, I asked in CLO-ontology/CLO#103 if we can turn some of these into proper xrefs.

@cthoyt
Member Author

cthoyt commented Jun 29, 2023

See also CLO-ontology/CLO#104

@cthoyt cthoyt merged commit dc39c45 into master Jul 3, 2023
@cthoyt cthoyt deleted the clo-mappings branch July 3, 2023 14:55