Best way to model four different kinds of metrics in biolink ML #1

matentzn · 2020-12-10T11:30:14Z

There are four different kinds of metrics we need to represent here:

Simple (metric: "value"); see the axiom_count example below
Simple list (metric: ["val1", "val2"]); see the axiom_types example below
Constrained map (metric: {"feature1": "value", "feature2": "value"}), where the keys come from a fixed set; see the axiom_type_count example below
Open map (metric: {"string1": "value", "string2": "value"}), where the keys are arbitrary strings; see the namespace_axiom_count example below

The goal of the biolink modelling exercise is to generate a JSON Schema against which a metrics document can be checked for schema constraints (datatypes etc.), and also to have nicely readable documentation of what the metrics mean. Beyond that, the JSON-LD context could perhaps be used more widely to communicate metrics between groups.

{
  "metrics": {
    "axiom_count": 5504,
    "axiom_types": [
      "AnnotationAssertion",
      "EquivalentClasses",
      "TransitiveObjectProperty",
      "SubObjectPropertyOf",
      "SymmetricObjectProperty",
      "SubPropertyChainOf",
      "Declaration",
      "SubClassOf",
      "InverseObjectProperties"
    ],
    "axiom_type_count": {
      "AnnotationAssertion": 4356,
      "EquivalentClasses": 106,
      "TransitiveObjectProperty": 12,
      "SubObjectPropertyOf": 25,
      "SymmetricObjectProperty": 1,
      "SubPropertyChainOf": 11,
      "Declaration": 320,
      "SubClassOf": 666,
      "InverseObjectProperties": 7
    },
    "namespace_axiom_count": {
      "oboInOwl": 4819,
      "IAO": 306,
      "UBERON": 1903,
      "rdfs": 308,
      "BFO": 283,
      "obo": 235,
      "RO": 307,
      "foaf": 56,
      "BSPO": 29
    }
  }
}
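
To make the four shapes concrete, here is a rough sketch in plain Python of the checks a generated schema would need to express for the example above. It is not biolink ML output and not a proposed implementation. The function names are made up for illustration, and treating counts as non-negative integers and every metric as required are assumptions that the example document does not state.

```python
from typing import Any, List, Optional, Set

# Closed key set for the constrained map, copied from the example document.
AXIOM_TYPES = {
    "AnnotationAssertion", "EquivalentClasses", "TransitiveObjectProperty",
    "SubObjectPropertyOf", "SymmetricObjectProperty", "SubPropertyChainOf",
    "Declaration", "SubClassOf", "InverseObjectProperties",
}


def is_count(value: Any) -> bool:
    # Assumption: a count is a non-negative integer.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def check_count_map(name: str, value: Any,
                    allowed_keys: Optional[Set[str]] = None) -> List[str]:
    """Check a map of counts; allowed_keys=None means the key set is open."""
    if not isinstance(value, dict):
        return [f"{name} must be a map"]
    problems = []
    if allowed_keys is not None:
        unknown = sorted(set(value) - allowed_keys)
        if unknown:
            problems.append(f"{name} has unexpected keys: {unknown}")
    if not all(is_count(v) for v in value.values()):
        problems.append(f"{name} values must be non-negative integers")
    return problems


def check_metrics(metrics: dict) -> List[str]:
    """Return a list of problems; an empty list means the document passed."""
    problems = []
    # 1. Simple metric
    if not is_count(metrics.get("axiom_count")):
        problems.append("axiom_count must be a non-negative integer")
    # 2. Simple list
    axiom_types = metrics.get("axiom_types")
    if not (isinstance(axiom_types, list)
            and all(isinstance(t, str) for t in axiom_types)):
        problems.append("axiom_types must be a list of strings")
    # 3. Constrained map: keys restricted to the known axiom types
    problems += check_count_map(
        "axiom_type_count", metrics.get("axiom_type_count"), AXIOM_TYPES)
    # 4. Open map: any string key is accepted
    problems += check_count_map(
        "namespace_axiom_count", metrics.get("namespace_axiom_count"))
    return problems
```

Calling check_metrics on the "metrics" object of the example document should return an empty list. The only difference between cases 3 and 4 is whether a key set is passed in, which is exactly the part that is hard to express in the modelling framework.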

The first attempt at dealing with this looks something like this:

id: http://www.obofoundry.org/registry/metrics.yml
name: metrics

types:
  mean:
    base: float
    uri: xsd:float
  count:
    base: int
    uri: xsd:int
  string:
    base: str
    uri: xsd:string
  boolean:
    base: boolean
    uri: xsd:boolean

classes:

  metrics:
    slots:
        - axiom_count
        - axiom_types
        
  axiom_type_count:
    description: Counting the various axiom types used in the ontology.
    slots:
        - AnnotationAssertion
        - EquivalentClasses
        - TransitiveObjectProperty
        - SubObjectPropertyOf
        - SymmetricObjectProperty
        - SubPropertyChainOf
        - Declaration
        - SubClassOf
        - InverseObjectProperties
  
  namespace_axiom_count:
    description: The number of axioms used by this ontology, broken down by which namespaces they reference (according to the OBO curiemap). For example, 19 axioms reference at least 1 entity in the BFO namespace.

slots:
  axiom_count:
    description: The number of axioms in the ontology.
    range: count
  axiom_types:
    description: A list of axiom types used in the ontology.
    multivalued: true
    
  AnnotationAssertion:
    range: count
  EquivalentClasses:
    range: count
  TransitiveObjectProperty:
    range: count
  SubObjectPropertyOf:
    range: count
  SymmetricObjectProperty:
    range: count
  SubPropertyChainOf:
    range: count
  Declaration:
    range: count
  SubClassOf:
    range: count
  InverseObjectProperties:
    range: count

@cmungall
@deepakunni3 has already given me some advice on how to go about this use case, which is obviously a bit non-standard. First, I find it unsatisfying to have some metrics modelled as slots and others as classes. Second, I don't know exactly how to model the namespace_axiom_count case, because its set of keys is open. Deepak recommended key/value modelling, but it seems unsatisfactory to bend a perfectly fine JSON structure just to fit a modelling framework. What are your thoughts on this?


cmungall · 2020-12-12T01:06:48Z

Pinging @hsolbrig. I think we need an equivalent to json-schema open maps.

Orthogonal point: to replicate the JSON, namespace_axiom_count and axiom_type_count should be slots. You will still need classes to hold the slots nested under them.
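
For readers unfamiliar with the JSON Schema side, the construct being referred to is presumably `additionalProperties`, which is an assumption about what "open maps" means here. The sketch below writes the target schema fragment by hand as a Python dict, to show the difference between the constrained and the open map. It is not generated by biolink ML, and the `minimum: 0` constraint is an assumption.

```python
import json

# Assumption: counts are non-negative integers.
COUNT = {"type": "integer", "minimum": 0}

# Key set of the constrained map, copied from the example in this issue.
AXIOM_TYPES = [
    "AnnotationAssertion", "EquivalentClasses", "TransitiveObjectProperty",
    "SubObjectPropertyOf", "SymmetricObjectProperty", "SubPropertyChainOf",
    "Declaration", "SubClassOf", "InverseObjectProperties",
]

schema = {
    "type": "object",
    "properties": {
        # Constrained map: every key is declared, anything else is rejected.
        "axiom_type_count": {
            "type": "object",
            "properties": {name: COUNT for name in AXIOM_TYPES},
            "additionalProperties": False,
        },
        # Open map: no keys are declared; any key is allowed as long as
        # its value matches the COUNT schema.
        "namespace_axiom_count": {
            "type": "object",
            "additionalProperties": COUNT,
        },
    },
}

print(json.dumps(schema, indent=2))
```

In JSON Schema, `additionalProperties` accepts either a boolean or a schema, so the open map needs no enumeration of keys at all. The modelling question in this issue is what the equivalent declaration should look like on the biolink ML side.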
