
From Damion

Kai Blumberg edited this page Jul 13, 2022 · 5 revisions

Potential projects

  • DataHarmonizer

  • FOODON improvements: more to do here, possibly including the CDNO knowledge graph idea I've been discussing with Lilly.

  • Finish UOM

General

https://cidgoh.ca/, see projects page

https://github.com/cidgoh

https://github.com/cidgoh/DataHarmonizer

Looking for USDA funding to help with https://fdc.nal.usda.gov/

Our main objective is to enable easier, pattern-based ontologization of research and agency surveillance datasets, first in infectious disease investigations and now more broadly in other domains such as agriculture and bioinformatics. To recap what I already mentioned to you: aside from developing ontologies like FoodOn and GenEpiO, our first tangible effort, and one well suited to clients, is the DataHarmonizer (DH) project. It uses ontology insofar as it provides a way to associate a spreadsheet column, or a categorical value, with an ontology term, all of which is now codified in LinkML. Agencies still get to use tabular data and nothing more in their submissions to archives. A next priority for DH is to add some relational database management features so that data can be edited within an experimental design study context or other one-to-many table relations. (We're also developing training materials to help clients understand how best to develop the ontology terms they need within an OBO Foundry context.)
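The column-to-term and value-to-term association described above can be pictured with a small sketch. This is not the DataHarmonizer or LinkML API; the `ColumnSpec` class, the `annotate_row` function, the column names, and the `EX:` term IDs are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """One spreadsheet column tied to an ontology term (IDs are placeholders)."""
    name: str
    term_id: str
    # Allowed categorical values mapped to their own term IDs.
    # An empty mapping means the column holds free text.
    permissible_values: dict[str, str] = field(default_factory=dict)


def annotate_row(row: dict[str, str], specs: list[ColumnSpec]) -> tuple[dict[str, str], list[str]]:
    """Map each categorical value in a row to its term ID and collect errors."""
    annotations: dict[str, str] = {}
    errors: list[str] = []
    for spec in specs:
        if not spec.permissible_values:
            continue  # free-text column: nothing to check
        value = row.get(spec.name, "")
        if value in spec.permissible_values:
            annotations[spec.name] = spec.permissible_values[value]
        else:
            errors.append(f"{spec.name}: {value!r} is not a permitted value")
    return annotations, errors


# Hypothetical specification: two columns, one of them categorical.
SPECS = [
    ColumnSpec("sample_id", "EX:0000001"),
    ColumnSpec("food_product", "EX:0000002",
               {"apple": "EX:0000010", "lettuce": "EX:0000011"}),
]

# Plain tabular rows, as an agency would still submit them.
ROWS = [
    {"sample_id": "S1", "food_product": "apple"},
    {"sample_id": "S2", "food_product": "aple"},  # typo, should be flagged
]

RESULTS = [annotate_row(row, SPECS) for row in ROWS]
```

The point of the sketch is that the submitted data stays tabular while the ontology terms live in a separate specification, so a misspelled categorical value can be caught against that specification rather than by hand.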

But we also have a future graph database vision in store for our partner agencies. What we need to do next is build an easier environment for composing ontology-driven specifications and for applying them to datasets with cleanup tools. GEEM was our past effort there. Kai, you already have clear expertise in that area, with what I can see are the ontology, graph, and query components.

There is also a big emphasis in the next year on getting a multi-million dollar FAIR data-sharing platform proposal going for public health and genomic surveillance projects specifically, and more broadly for other kinds of data that need a more complicated permission system to access, i.e. data involving personal information, be it medical or social determinants of health. If we get a grant for that, we would hire more programming staff.

We'd also want to discuss support of things like https://units-of-measurement.org/ and other visions. Part of the discussion, with respect to LinkML, is how to develop an ontology-connected LinkML schema editor, for example, which we'd want to coordinate with the NMDC team too.

From Graham King

  • Australian Reference Genome Atlas (ARGA) https://www.biocommons.org.au/arga

  • ERGA (the counterpart in Europe) https://www.erga-biodiversity.eu/

  • Atlas of Living Australia https://www.ala.org.au/

  • InterMine http://intermine.org/im-docs/

  • DivSeek Commons https://divseekintl.org/commons-landscape-matrix/

  • Wheat Information System https://urgi.versailles.inrae.fr/wheatis/

You may also be interested in this, which evolved from the earlier Ondex system at Rothamsted

https://knetminer.com/ https://en.wikipedia.org/wiki/KnetMiner
