Solr_ontology_lookup
Goal: convert BCODMO's taxonomic data into NCBITaxon PURLs.
Example query:
https://lod.bco-dmo.org/browse/?query=++SELECT+DISTINCT+%3Fdataset+%3FdatasetDesc+%3Fdownload_url+%3Finstance+%3Ftaxa+%3FdatasetParam+%3FdatasetParamDef+%23%3Finstance+%3Ftaxa+%3Fdefinition%0D%0A+++++WHERE+%7BVALUES+%28%3Ftaxa%29+%7B+%28+%22taxon%22%40en-us+%29+%28+%22species%22%40en-us+%29+%28+%22common_name%22%40en-us+%29+%28+%22species_epithet%22%40en-us+%29+%28+%22dominant_species%22%40en-us+%29%0D%0A++++++++++++%28+%22taxon_code%22%40en-us+%29+%28+%22animal_group%22%40en-us+%29+%28+%22class%22%40en-us+%29+%28+%22order%22%40en-us+%29+%28+%22phylum%22%40en-us+%29%7D%0D%0A++++++++++++%3Finstance+a+%3Chttp%3A%2F%2Focean-data.org%2Fschema%2FMonitoredProperty%3E+.%0D%0A++++++++++++%3Finstance+skos%3AprefLabel+%3Ftaxa+.+%0D%0A++++++++++++OPTIONAL+%7B+%3Finstance+skos%3Adefinition+%3Fdefinition+.+%7D%0D%0A++++++++++++%3FdatasetParam+odo%3AisInstanceOf+%3Finstance+.%0D%0A++++++++++++OPTIONAL+%7B+%3FdatasetParam+skos%3Adefinition+%3FdatasetParamDef+.+%7D%0D%0A++++++++++++%3Fdataset+odo%3AstoresValuesFor+%3FdatasetParam+.%0D%0A++++++++++++%3Faffordance+schema%3AsubjectOf+%3Fdataset+.%0D%0A++++++++++++%3Faffordance+a+odo%3ADataDownloadAffordance+.%0D%0A++++++++++++%3Faffordance+schema%3Atarget+%5B+schema%3Aurl+%3Fdownload_url+%5D+.+%0D%0A++++++++++++FILTER+REGEX%28%3Fdataset%2C+%22%2Fdataset%2F%22%29%0D%0A++++++++++++OPTIONAL+%7B+%3Fdataset+odo%3AdatasetTitle+%3FdatasetDesc+.+%7D+%0D%0A++++++++++++%7D%0D%0AORDER+BY+%3Fdataset+%3FdatasetParam+%3Finstance+
This returns the taxonomic data stored by BCODMO; the download_url
column links to the files that contain the taxonomic information.
The data to transform looks like this, for example:
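The example URL is hard to read because the SPARQL text is URL-encoded inside its `query` parameter. A minimal sketch for recovering the readable query, using only the Python standard library; the function name `extract_sparql` and the shortened demo URL are illustrative, not part of this repo:

```python
from urllib.parse import parse_qs, urlsplit


def extract_sparql(browse_url: str) -> str:
    """Return the decoded SPARQL text carried in the URL's `query` parameter."""
    params = parse_qs(urlsplit(browse_url).query)
    # parse_qs decodes %XX escapes and turns '+' back into spaces.
    return params["query"][0].strip()


if __name__ == "__main__":
    # Shortened, made-up URL for illustration; paste the full example URL instead.
    demo = (
        "https://lod.bco-dmo.org/browse/"
        "?query=SELECT+%3Fdataset+WHERE+%7B+%3Fdataset+%3Fp+%3Fo+%7D"
    )
    print(extract_sparql(demo))
```

Running this against the full example URL should print the SELECT statement with its VALUES list of taxon labels, which is easier to review and edit than the encoded form.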
taxon_code taxon
6118290102 Acartia_danae
6118290113 Acartia_hudsonica
6118290103 Acartia_longiremis
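As a rough sketch, pulling the taxon column out of a downloaded file could look like the following. It assumes the file is delimited text with a header row, as in the sample above; the actual delimiter of the BCODMO downloads is not confirmed here, so it is left as a parameter, and `read_taxa` is a hypothetical helper name.

```python
import csv
import io

# Sample rows copied from the example above (space-delimited here).
SAMPLE = """taxon_code taxon
6118290102 Acartia_danae
6118290113 Acartia_hudsonica
6118290103 Acartia_longiremis
"""


def read_taxa(text: str, column: str = "taxon", delimiter: str = " ") -> list[str]:
    """Return the non-empty values of `column` from delimited text with a header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row[column] for row in reader if row.get(column)]


if __name__ == "__main__":
    print(read_taxa(SAMPLE))
```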
We'll need to perform text lookups to match these strings against NCBITaxon PURLs. OLS includes this capability as part of its system: its index page links to its solr-schema page, along with an example GitHub repo. The long and short of it is that you can use this to build a Solr index from an ontology file. In our case we'd use NCBITaxon (the slim version or the whole thing).
See the Solr getting started page, where you set up the database and add some docs, indexing them for lookup. Then you run it and query it.
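To make the indexing step concrete, here is a minimal sketch that prepares a POST to Solr's JSON update handler. The core name `ncbitaxon`, the local base URL, and the document fields `iri` and `label` are all assumptions for illustration; the real schema would come from whatever the OLS solr-schema setup defines.

```python
import json
from urllib.request import Request


def build_update_request(
    docs: list[dict],
    solr_base: str = "http://localhost:8983/solr",  # assumed local Solr
    core: str = "ncbitaxon",  # hypothetical core name
) -> Request:
    """Build (but do not send) a request that adds `docs` to the core and commits."""
    url = f"{solr_base}/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One document per ontology term; the field names are illustrative only.
example_docs = [
    {"iri": "http://purl.obolibrary.org/obo/NCBITaxon_545071", "label": "Acartia danae"},
]
request = build_update_request(example_docs)
```

Passing `request` to `urllib.request.urlopen` would send it to a running Solr instance; that step is left out so the sketch can be read and tested without a server.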
There is much to figure out about optimizing and using this correctly, but we could presumably have one or more scripts that, given a dataset with taxonomic info and the column holding that info, clean the list of taxon strings (remove hyphens, trailing spaces, etc.) and then, for each one, post a Solr query to get back the IRI. So the string Acartia danae
would return http://purl.obolibrary.org/obo/NCBITaxon_545071
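The clean-then-lookup idea might be sketched as below. This is not a tested pipeline: the Solr base URL, the core name `ncbitaxon`, and the fields `label` and `iri` are assumptions, and the cleaning rule (underscores and hyphens become spaces, whitespace is collapsed) is one possible reading of "remove -'s trailing spaces etc".

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr
CORE = "ncbitaxon"  # hypothetical core name


def clean_taxon(raw: str) -> str:
    """Turn underscores and hyphens into spaces, collapse whitespace, trim the ends."""
    return re.sub(r"[\s_\-]+", " ", raw).strip()


def build_select_url(name: str, field: str = "label") -> str:
    """Build a Solr /select URL that phrase-matches `name` in `field` (assumed field)."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'{field}:"{escaped}"', "fl": "iri", "rows": 1, "wt": "json"}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


def first_iri(response: dict, iri_field: str = "iri"):
    """Return the IRI of the first hit in a Solr JSON response, or None if no hits."""
    docs = response.get("response", {}).get("docs", [])
    if not docs:
        return None
    value = docs[0].get(iri_field)
    # Multi-valued Solr fields come back as lists; take the first entry.
    return value[0] if isinstance(value, list) else value


def lookup(raw_name: str):
    """Clean a raw taxon string and query Solr for its IRI (needs a running server)."""
    with urlopen(build_select_url(clean_taxon(raw_name))) as resp:
        return first_iri(json.load(resp))
```

Exact phrase matching will miss misspellings and synonyms, so a real script would probably also want a fallback (fuzzy matching, or a report of unmatched strings for manual review).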