Generate RO-Biolink predicate mappings based on a particular Biolink model #104
base: master
Also deleted ro-to-biolink-mappings.tsv, which contains all the mappings.
This should be updated by the Makefile.
Also added ro-to-biolink-predicate-mappings-all.tsv.
@balhoff I've now added checks that (1) look for duplication between the local mappings file and the generated predicate files, and (2) look for Biolink predicates that are not present in the Biolink model. So far, I'm just printing out the PredicateMapping entries of concern (PredicateMapping is based on the predicate mappings file generated as part of the Biolink model), so unfortunately the output isn't very readable. Here's what the output looks like right now with 15 warnings:
We can ignore the CTD mappings since we currently don't export those at all. However, it looks like the following terms are duplicated:
I've deleted RO:0002313 from local mappings in 797ff28.
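For readers following along, the two checks described above (duplicates between the local and generated mappings, and Biolink predicates missing from the model) can be sketched roughly as below. This is a minimal Python illustration, not the code in this PR, whose script is written in Scala. The function names, the two-column row layout, and every identifier in the sample data are hypothetical.

```python
# Minimal sketch of the two checks, assuming each mapping is a
# (predicate CURIE, Biolink predicate) pair. All names here are hypothetical.

def find_duplicates(local_rows, generated_rows):
    """Return predicate CURIEs that appear in both mapping sets."""
    local_ids = {curie for curie, _ in local_rows}
    generated_ids = {curie for curie, _ in generated_rows}
    return sorted(local_ids & generated_ids)


def find_unknown_biolink(rows, model_predicates):
    """Return rows whose Biolink predicate is not defined in the model."""
    return [row for row in rows if row[1] not in model_predicates]


if __name__ == "__main__":
    # Made-up sample data; real rows would be read from the TSV files.
    local = [("EX:0000001", "biolink:example_a"), ("EX:0000002", "biolink:example_b")]
    generated = [("EX:0000002", "biolink:example_c"), ("EX:0000003", "biolink:example_d")]
    model = {"biolink:example_a", "biolink:example_c", "biolink:example_d"}

    for curie in find_duplicates(local, generated):
        print(f"WARNING: {curie} is mapped in both the local and generated files")
    for curie, predicate in find_unknown_biolink(local + generated, model):
        print(f"WARNING: {curie} maps to {predicate}, which is not in the model")
```

Reporting one warning per line like this, rather than printing whole mapping objects, would be one way to make the output easier to scan.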
Hi @balhoff -- just wanted to poke you to review this PR. If you need help incorporating it into the changes you've made to re-add CTD, please let me know.
Adds scripts/generate_ro_biolink_mapping.sc, a Scala CLI script for generating a list of mappings between RDF predicates and Biolink predicates downloaded from two sources:
- The Biolink model (https://github.com/biolink/biolink-model/blob/68d4e3d7612275d0d7e832a9919bf8666e1d5fde/biolink-model.yaml)

These are written into the ro-to-biolink-predicate-mappings.tsv file (which I've included in this PR). If you want to see all the predicate mappings (not just the RO/GOREL ones), they are in the ro-to-biolink-predicate-mappings-all.tsv file (https://github.com/ExposuresProvider/cam-pipeline/blob/e1d6dd063c43de31ac736dbd0ce1ee57008f64fc/ro-to-biolink-predicate-mappings-all.tsv).

This file is then used by scripts/kg_edges.dl to add "qualifiers" to kg.tsv. This does seem to work currently, producing output like:

Things to do:
- .asJson from Circe to work. Help?
- ro-to-biolink-local-mappings.tsv and ro-to-biolink-predicate-mappings.tsv -- any examples in the original list should be deleted so that only the qualified predicate is used.
- ro-to-biolink-local-mappings.tsv for any predicates that have been deleted -- we can temporarily add those directly to scripts/generate_ro_biolink_mappings.sc, but eventually we should get those into the Biolink model.

This PR also adds the command for generating ro-to-biolink-predicate-mappings.tsv, although at the moment this will never be run, as the GitHub repo includes the predicate mappings file.

WIP: will close #95 once implemented.
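
As a rough illustration of how the RO/GOREL file relates to the "all" file described above, the filtering step might look like the sketch below. It is written in Python for brevity and is not the Scala script from this PR. The two-column layout, the CURIE prefixes `RO:` and `GOREL:`, the function names, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Assumption: predicates are CURIEs and the wanted ones start with these prefixes.
WANTED_PREFIXES = ("RO:", "GOREL:")


def filter_mappings(rows, prefixes=WANTED_PREFIXES):
    """Keep only rows whose predicate CURIE starts with one of the prefixes."""
    return [row for row in rows if row[0].startswith(prefixes)]


def to_tsv(rows):
    """Serialize rows as tab-separated text, one mapping per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up rows standing in for the contents of the "all" mappings file.
    all_rows = [
        ("RO:0000001", "biolink:example_a"),
        ("GOREL:0000002", "biolink:example_b"),
        ("EX:0000003", "biolink:example_c"),
    ]
    print(to_tsv(filter_mappings(all_rows)), end="")
```

Keeping the filter separate from the serialization means the same rows can feed both the full file and the RO/GOREL subset without duplicating the download step.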