Automate generation of RO-Wikidata mappings in SSSOM #444

Draft: wants to merge 3 commits into base: master
38 changes: 20 additions & 18 deletions src/mappings/ro-to-wikidata.sssom.tsv
@@ -6,21 +6,23 @@
#mapping_provider: http://obofoundry.org/ontology/ro
#subject_source: http://purl.obolibrary.org/obo/ro.owl
subject_id subject_label predicate_id object_id object_label match_type comments
-RO:0001025 owl:equivalentProperty wd:P276 location Curated derived from wikidata via SPARQL
-RO:0002211 owl:equivalentProperty wd:P128 regulates (molecular biology) Curated derived from wikidata via SPARQL
-BFO:0000051 owl:equivalentProperty wd:P527 has part Curated derived from wikidata via SPARQL
-BFO:0000050 owl:equivalentProperty wd:P361 part of Curated derived from wikidata via SPARQL
-RO:0002204 owl:equivalentProperty wd:P702 encoded by Curated derived from wikidata via SPARQL
-RO:0002205 owl:equivalentProperty wd:P688 encodes Curated derived from wikidata via SPARQL
-RO:0002411 owl:equivalentProperty wd:P828 has cause Curated derived from wikidata via SPARQL
-RO:0002162 owl:equivalentProperty wd:P703 found in taxon Curated derived from wikidata via SPARQL
-RO:0002404 owl:equivalentProperty wd:P1542 has effect Curated derived from wikidata via SPARQL
-RO:0002405 owl:equivalentProperty wd:P1536 immediate cause of Curated derived from wikidata via SPARQL
-RO:0003001 owl:equivalentProperty wd:P2849 produced by Curated derived from wikidata via SPARQL
-RO:0002302 owl:equivalentProperty wd:P2176 drug used for treatment Curated derived from wikidata via SPARQL
-RO:0002008 owl:equivalentProperty wd:P1382 partially coincident with Curated derived from wikidata via SPARQL
-RO:0002005 owl:equivalentProperty wd:P3189 innervated by Curated derived from wikidata via SPARQL
-RO:0002202 owl:equivalentProperty wd:P3094 develops from Curated derived from wikidata via SPARQL
-RO:0000087 owl:equivalentProperty wd:P2868 subject has role Curated derived from wikidata via SPARQL
-RO:0002379 owl:equivalentProperty wd:P3403 coextensive with Curated derived from wikidata via SPARQL
-RO:0002134 owl:equivalentProperty wd:P3190 innervates Curated derived from wikidata via SPARQL
+BFO:0000050 part of owl:equivalentProperty wd:P361 part of Curated derived from wikidata via SPARQL
+BFO:0000051 has part owl:equivalentProperty wd:P527 has part Curated derived from wikidata via SPARQL
+RO:0000087 has role owl:equivalentProperty wd:P2868 subject has role Curated derived from wikidata via SPARQL
+RO:0001025 located in owl:equivalentProperty wd:P276 location Curated derived from wikidata via SPARQL
+RO:0002005 innervated_by owl:equivalentProperty wd:P3189 innervated by Curated derived from wikidata via SPARQL
+RO:0002008 coincident with owl:equivalentProperty wd:P1382 partially coincident with Curated derived from wikidata via SPARQL
+RO:0002134 innervates owl:equivalentProperty wd:P3190 innervates Curated derived from wikidata via SPARQL
+RO:0002162 in taxon owl:equivalentProperty wd:P703 found in taxon Curated derived from wikidata via SPARQL
+RO:0002202 develops from owl:equivalentProperty wd:P3094 develops from Curated derived from wikidata via SPARQL
+RO:0002204 gene product of owl:equivalentProperty wd:P702 encoded by Curated derived from wikidata via SPARQL
+RO:0002205 has gene product owl:equivalentProperty wd:P688 encodes Curated derived from wikidata via SPARQL
+RO:0002211 regulates owl:equivalentProperty wd:P128 regulates (molecular biology) Curated derived from wikidata via SPARQL
+RO:0002302 is treated by substance owl:equivalentProperty wd:P2176 drug used for treatment Curated derived from wikidata via SPARQL
+RO:0002379 spatially coextensive with owl:equivalentProperty wd:P3403 coextensive with Curated derived from wikidata via SPARQL
+RO:0002404 causally downstream of owl:equivalentProperty wd:P1542 has effect Curated derived from wikidata via SPARQL
+RO:0002405 immediately causally downstream of owl:equivalentProperty wd:P1536 immediate cause of Curated derived from wikidata via SPARQL
+RO:0002411 causally upstream of owl:equivalentProperty wd:P828 has cause Curated derived from wikidata via SPARQL
+RO:0003001 produced by owl:equivalentProperty wd:P2849 produced by Curated derived from wikidata via SPARQL
+RO:0012001 has small molecule activator owl:equivalentProperty wd:P3771 activator of Curated derived from wikidata via SPARQL
+RO:0012002 has small molecule inhibitor owl:equivalentProperty wd:P3776 inhibitor of Curated derived from wikidata via SPARQL
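
In the mappings file above, lines beginning with `#` carry metadata, and the first uncommented line is the column header. The following is a minimal sketch of reading such a file with the standard library, assuming the columns are tab-separated. The `read_sssom` helper and the `SAMPLE` text are hypothetical illustrations and not part of this pull request.

```python
import csv
import io


def read_sssom(text):
    """Split SSSOM-style text into (metadata_lines, rows).

    Hypothetical helper: assumes '#'-prefixed metadata lines followed by a
    tab-separated table whose first line is the header.
    """
    metadata, body = [], []
    for line in text.splitlines():
        if line.startswith('#'):
            metadata.append(line[1:])  # drop the leading '#'
        elif line.strip():
            body.append(line)
    reader = csv.DictReader(io.StringIO('\n'.join(body)), delimiter='\t')
    return metadata, list(reader)


# Illustrative sample built from one row of the file above.
SAMPLE = (
    '#license: https://creativecommons.org/publicdomain/zero/1.0/\n'
    'subject_id\tsubject_label\tpredicate_id\tobject_id\tobject_label\tmatch_type\tcomments\n'
    'BFO:0000050\tpart of\towl:equivalentProperty\twd:P361\tpart of\tCurated\t'
    'derived from wikidata via SPARQL\n'
)

metadata, rows = read_sssom(SAMPLE)
print(rows[0]['subject_id'], rows[0]['object_id'])
```

Keeping the metadata separate from the table means the rows can be checked (for example, for empty `subject_label` values) without touching the header comments.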
98 changes: 98 additions & 0 deletions src/tools/generate_wikidata_mapping.py
@@ -0,0 +1,98 @@
# -*- coding: utf-8 -*-

"""Generate the RO-to-Wikidata mappings file in SSSOM format.

This script requires the installation of the :mod:`requests` library.
"""

import json
import os
import textwrap

import requests

HERE = os.path.abspath(os.path.dirname(__file__))
SSSOM_PATH = os.path.abspath(os.path.join(HERE, os.pardir, 'mappings', 'ro-to-wikidata.sssom.tsv'))
OBO_PATH = os.path.abspath(os.path.join(HERE, os.pardir, os.pardir, 'ro.json'))


def get_id_name_mapping():
    """Get a mapping from RO_XXXXXXX identifiers to their labels."""
    with open(OBO_PATH) as file:
        obo_json = json.load(file)
    return {
        node['id'][len('http://purl.obolibrary.org/obo/'):]: node['lbl']
        for node in obo_json['graphs'][0]['nodes']
        if 'lbl' in node
    }


# URL for the Wikidata SPARQL service
WIKIDATA_SPARQL_ENDPOINT = 'https://query.wikidata.org/bigdata/namespace/wdq/sparql'

# Select every Wikidata property that carries a Relations Ontology ID (P3590)
MAPPING_SPARQL = '''
SELECT ?prop ?propLabel ?ro_id
WHERE {
    ?prop wdt:P3590 ?ro_id .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
'''


def get_query(query: str):
    """Get the results of a SPARQL query to the Wikidata SPARQL endpoint as JSON."""
    res = requests.get(
        WIKIDATA_SPARQL_ENDPOINT,
        params={'query': query, 'format': 'json'},
    )
    res.raise_for_status()
    res_json = res.json()
    return res_json['results']['bindings']


def main():
    """Query Wikidata and rewrite the SSSOM mappings file."""
    ro_id_name = get_id_name_mapping()

    # Constant column values shared by every generated row
    predicate = 'owl:equivalentProperty'
    match_type = 'Curated'
    comments = 'derived from wikidata via SPARQL'

    headers = [
        'subject_id', 'subject_label', 'predicate_id', 'object_id', 'object_label', 'match_type', 'comments',
    ]
    with open(SSSOM_PATH, 'w') as file:
        # Metadata header, written as '#'-prefixed lines
        print(textwrap.dedent('''\
        #curie_map:
        # wd: http://www.wikidata.org/entity/
        # RO: http://purl.obolibrary.org/obo/RO_
        # BFO: http://purl.obolibrary.org/obo/BFO_
        #license: https://creativecommons.org/publicdomain/zero/1.0/
        #mapping_provider: http://obofoundry.org/ontology/ro
        #subject_source: http://purl.obolibrary.org/obo/ro.owl
        ''').rstrip(), file=file)
        print(*headers, sep='\t', file=file)

        # (RO identifier, Wikidata property identifier, Wikidata label)
        rows = [
            (
                row['ro_id']['value'],
                row['prop']['value'][len('http://www.wikidata.org/entity/'):],
                row['propLabel']['value']
            )
            for row in get_query(MAPPING_SPARQL)
        ]
        # Sort by RO identifier so the output is stable between runs
        for ro_id, wd_id, wd_label in sorted(rows):
            print(
                ro_id.replace('_', ':'),
                ro_id_name[ro_id],  # raises KeyError if the ID has no label in ro.json
                predicate,
                f'wd:{wd_id}',
                wd_label,
                match_type,
                comments,
                sep='\t',
                file=file,
            )


if __name__ == '__main__':
    main()
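
The row-building step in `main()` can be exercised without a network call. The following is a sketch, assuming the SPARQL JSON bindings have the shape the script relies on (`ro_id`, `prop`, and `propLabel`, each with a `value` key). The `bindings_to_rows` function and the sample data are hypothetical. Unlike the script, this sketch falls back to an empty label when an identifier is missing from the label map, which is one possible way to avoid a `KeyError`.

```python
WD_PREFIX = 'http://www.wikidata.org/entity/'


def bindings_to_rows(bindings, ro_id_name):
    """Turn SPARQL JSON bindings into sorted SSSOM row tuples (hypothetical helper)."""
    rows = []
    for binding in sorted(bindings, key=lambda b: b['ro_id']['value']):
        ro_id = binding['ro_id']['value']
        wd_id = binding['prop']['value'][len(WD_PREFIX):]
        rows.append((
            ro_id.replace('_', ':'),       # RO_0002211 -> RO:0002211
            ro_id_name.get(ro_id, ''),     # assumption: empty label instead of KeyError
            'owl:equivalentProperty',
            f'wd:{wd_id}',
            binding['propLabel']['value'],
        ))
    return rows


# Illustrative bindings mirroring two rows of the mappings file.
sample_bindings = [
    {
        'ro_id': {'value': 'RO_0002211'},
        'prop': {'value': WD_PREFIX + 'P128'},
        'propLabel': {'value': 'regulates (molecular biology)'},
    },
    {
        'ro_id': {'value': 'BFO_0000050'},
        'prop': {'value': WD_PREFIX + 'P361'},
        'propLabel': {'value': 'part of'},
    },
]
sample_labels = {'BFO_0000050': 'part of'}

result = bindings_to_rows(sample_bindings, sample_labels)
for row in result:
    print('\t'.join(row))
```

Separating the transformation from the HTTP request and the file write makes it possible to test the sorting and identifier rewriting on fixed input.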