
Solr_ontology_lookup

Kai Blumberg edited this page Aug 12, 2021 · 2 revisions

Goal: convert BCODMO's taxonomic data into NCBITaxon PURLs.

Example query:

https://lod.bco-dmo.org/browse/?query=++SELECT+DISTINCT+%3Fdataset+%3FdatasetDesc+%3Fdownload_url+%3Finstance+%3Ftaxa+%3FdatasetParam+%3FdatasetParamDef+%23%3Finstance+%3Ftaxa+%3Fdefinition%0D%0A+++++WHERE+%7BVALUES+%28%3Ftaxa%29+%7B+%28+%22taxon%22%40en-us+%29+%28+%22species%22%40en-us+%29+%28+%22common_name%22%40en-us+%29+%28+%22species_epithet%22%40en-us+%29+%28+%22dominant_species%22%40en-us+%29%0D%0A++++++++++++%28+%22taxon_code%22%40en-us+%29+%28+%22animal_group%22%40en-us+%29+%28+%22class%22%40en-us+%29+%28+%22order%22%40en-us+%29+%28+%22phylum%22%40en-us+%29%7D%0D%0A++++++++++++%3Finstance+a+%3Chttp%3A%2F%2Focean-data.org%2Fschema%2FMonitoredProperty%3E+.%0D%0A++++++++++++%3Finstance+skos%3AprefLabel+%3Ftaxa+.+%0D%0A++++++++++++OPTIONAL+%7B+%3Finstance+skos%3Adefinition+%3Fdefinition+.+%7D%0D%0A++++++++++++%3FdatasetParam+odo%3AisInstanceOf+%3Finstance+.%0D%0A++++++++++++OPTIONAL+%7B+%3FdatasetParam+skos%3Adefinition+%3FdatasetParamDef+.+%7D%0D%0A++++++++++++%3Fdataset+odo%3AstoresValuesFor+%3FdatasetParam+.%0D%0A++++++++++++%3Faffordance+schema%3AsubjectOf+%3Fdataset+.%0D%0A++++++++++++%3Faffordance+a+odo%3ADataDownloadAffordance+.%0D%0A++++++++++++%3Faffordance+schema%3Atarget+%5B+schema%3Aurl+%3Fdownload_url+%5D+.+%0D%0A++++++++++++FILTER+REGEX%28%3Fdataset%2C+%22%2Fdataset%2F%22%29%0D%0A++++++++++++OPTIONAL+%7B+%3Fdataset+odo%3AdatasetTitle+%3FdatasetDesc+.+%7D+%0D%0A++++++++++++%7D%0D%0AORDER+BY+%3Fdataset+%3FdatasetParam+%3Finstance+

This query returns the results for the stored BCODMO taxonomic data. The download_url column contains the links to the files that include the taxonomic info.
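The query URL above is hard to read because the SPARQL is percent-encoded into the `query` parameter. A minimal sketch for decoding it, using only the Python standard library; the function name `extract_sparql` and the shortened URL are illustrative stand-ins, not part of the BCODMO tooling:

```python
from urllib.parse import urlsplit, parse_qs


def extract_sparql(browse_url: str) -> str:
    """Return the decoded SPARQL text carried in the URL's `query` parameter."""
    params = parse_qs(urlsplit(browse_url).query)
    # parse_qs turns '+' into spaces and decodes %XX escapes (e.g. %3F -> '?').
    return params["query"][0].strip()


# Shortened, made-up URL in the same shape as the full example query.
short_url = (
    "https://lod.bco-dmo.org/browse/?query=++SELECT+DISTINCT+%3Fdataset"
    "%0D%0AWHERE+%7B+%3Fdataset+%3Fp+%3Fo+%7D"
)

if __name__ == "__main__":
    print(extract_sparql(short_url))
```

Pasting the full URL in place of `short_url` should print the complete SPARQL, which is easier to edit when adding or removing taxa labels.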

The data to transform looks, for example, like the following:

taxon_code  taxon  
6118290102  Acartia_danae  
6118290113  Acartia_hudsonica  
6118290103  Acartia_longiremis  

We'll need to perform text lookups to match these strings against NCBITaxon PURLs. OLS includes this as part of its system: its index page links to its solr-schema page, along with an example GitHub repo. The long and short of it is that you can use this to build a Solr index from an ontology file. In our case we'd use NCBITaxon (the slim or the whole thing).

See the Solr getting started page, where you set up the database and add some docs, indexing them for lookup. Then you run it and query it.
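As a rough sketch of the "add some docs" step, the snippet below builds a request for Solr's JSON update endpoint. It assumes a local Solr at the default address `http://localhost:8983`, a hypothetical core named `ncbitaxon`, and hypothetical fields `id` (holding the IRI) and `label`. None of these names come from the OLS schema, which may well differ.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address
CORE = "ncbitaxon"                        # hypothetical core name


def build_update_request(docs, base=SOLR_BASE, core=CORE):
    """Build a POST request that sends `docs` to the core's /update endpoint."""
    url = f"{base}/{core}/update?commit=true"  # commit=true makes docs searchable right away
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# One document per ontology term; field names are assumptions, see above.
docs = [
    {"id": "http://purl.obolibrary.org/obo/NCBITaxon_545071", "label": "Acartia danae"},
]
request = build_update_request(docs)

# Uncomment once a Solr instance with the `ncbitaxon` core is running:
# with urllib.request.urlopen(request) as resp:
#     print(resp.status)
```

In practice the documents would be generated from the NCBITaxon ontology file rather than typed by hand; the single entry here only shows the shape of the payload.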

There is much to figure out about optimizing and using this correctly, but we could presumably have a script (or scripts) that, given a dataset with taxonomic info and the column holding that info, cleans the list of taxon strings (removing hyphens, trailing spaces, etc.) and then, for each one, posts a Solr query to get back the IRI. For example, the string Acartia danae would return http://purl.obolibrary.org/obo/NCBITaxon_545071
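A minimal sketch of such a script follows. It assumes a hypothetical Solr core `ncbitaxon` whose documents store the IRI in `id` and the taxon name in `label`, and it treats underscores and hyphens as word separators, which is one possible reading of the cleaning step and would mangle any names that legitimately contain hyphens. The helper names are illustrative only.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical select endpoint; adjust host and core name to the real setup.
SOLR_SELECT = "http://localhost:8983/solr/ncbitaxon/select"


def clean_taxon(raw):
    """Normalize a raw taxon string, e.g. 'Acartia_danae  ' -> 'Acartia danae'."""
    text = re.sub(r"[_-]+", " ", raw)          # assumption: '_' and '-' separate words
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace, trim the ends


def build_query_url(label, select_url=SOLR_SELECT):
    """Build a Solr select URL that searches the `label` field for a phrase."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    params = {"q": f'label:"{escaped}"', "fl": "id,label", "rows": "1", "wt": "json"}
    return select_url + "?" + urllib.parse.urlencode(params)


def first_iri(solr_response):
    """Return the `id` of the top hit in a parsed Solr JSON response, or None."""
    docs = solr_response.get("response", {}).get("docs", [])
    return docs[0]["id"] if docs else None


def lookup(raw):
    """Clean one taxon string, query Solr, and return the best-matching IRI."""
    url = build_query_url(clean_taxon(raw))
    with urllib.request.urlopen(url) as resp:  # needs a running Solr instance
        return first_iri(json.load(resp))
```

With a populated index, `lookup("Acartia_danae")` would be expected to return the matching NCBITaxon IRI. Taking only the top hit is a simplification: ambiguous or misspelled names would probably need a score threshold or manual review.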
