Best way to model four different kinds of metrics in biolink ML #1

Open
matentzn opened this issue Dec 10, 2020 · 1 comment
@matentzn
Collaborator

There are four different kinds of metrics we need to represent here:

  1. Simple (metric: "value"); see axiom_count in the example below.
  2. Simple list (metric: ["val1", "val2"]); see axiom_types in the example below.
  3. Constrained map with a fixed set of keys (metric: {"feature1": "value", "feature2": "value"}); see axiom_type_count in the example below.
  4. Open map with arbitrary string keys (metric: {"string1": "value", "string2": "value"}); see namespace_axiom_count in the example below.

The goal of this biolink modelling exercise is twofold. First, generate a JSON schema against which a metrics document can be checked for schema constraints (datatypes etc.). Second, have nicely readable documentation of what the metrics mean, potentially using the JSON-LD context more widely to communicate metrics between groups.

{
  "metrics": {
    "axiom_count": 5504,
    "axiom_types": [
      "AnnotationAssertion",
      "EquivalentClasses",
      "TransitiveObjectProperty",
      "SubObjectPropertyOf",
      "SymmetricObjectProperty",
      "SubPropertyChainOf",
      "Declaration",
      "SubClassOf",
      "InverseObjectProperties"
    ],
    "axiom_type_count": {
      "AnnotationAssertion": 4356,
      "EquivalentClasses": 106,
      "TransitiveObjectProperty": 12,
      "SubObjectPropertyOf": 25,
      "SymmetricObjectProperty": 1,
      "SubPropertyChainOf": 11,
      "Declaration": 320,
      "SubClassOf": 666,
      "InverseObjectProperties": 7
    },
    "namespace_axiom_count": {
      "oboInOwl": 4819,
      "IAO": 306,
      "UBERON": 1903,
      "rdfs": 308,
      "BFO": 283,
      "obo": 235,
      "RO": 307,
      "foaf": 56,
      "BSPO": 29
    }
  }
}
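
To make the target concrete, here is a rough sketch of how the four kinds could look in JSON Schema terms, written as a Python dict with a tiny stdlib-only checker. This is an assumption about the desired output, not something produced by any generator: the names COUNT, METRICS_SCHEMA and find_errors are made up for illustration, and the checker only understands the handful of keywords used here. The schema describes the object under the "metrics" key.

```python
# Hypothetical target schema for the "metrics" object (hand-written sketch).
COUNT = {"type": "integer", "minimum": 0}

AXIOM_TYPES = [
    "AnnotationAssertion", "EquivalentClasses", "TransitiveObjectProperty",
    "SubObjectPropertyOf", "SymmetricObjectProperty", "SubPropertyChainOf",
    "Declaration", "SubClassOf", "InverseObjectProperties",
]

METRICS_SCHEMA = {
    "type": "object",
    "properties": {
        # 1. Simple: a single value.
        "axiom_count": COUNT,
        # 2. Simple list: an array of strings.
        "axiom_types": {"type": "array", "items": {"type": "string"}},
        # 3. Constrained map: keys fixed in advance, nothing else allowed.
        "axiom_type_count": {
            "type": "object",
            "properties": {name: COUNT for name in AXIOM_TYPES},
            "additionalProperties": False,
        },
        # 4. Open map: any string key, every value must be a count.
        "namespace_axiom_count": {
            "type": "object",
            "additionalProperties": COUNT,
        },
    },
    "additionalProperties": False,
}


def find_errors(value, schema, path="metrics"):
    """Check value against the few JSON Schema keywords used above."""
    kind = schema.get("type")
    if kind == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            return [f"{path}: expected integer"]
        if value < schema.get("minimum", value):
            return [f"{path}: below minimum"]
        return []
    if kind == "string":
        return [] if isinstance(value, str) else [f"{path}: expected string"]
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        found = []
        for i, item in enumerate(value):
            found += find_errors(item, schema["items"], f"{path}[{i}]")
        return found
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        found = []
        known = schema.get("properties", {})
        extra = schema.get("additionalProperties", True)
        for key, item in value.items():
            if key in known:
                found += find_errors(item, known[key], f"{path}.{key}")
            elif extra is False:
                found.append(f"{path}.{key}: unexpected key")
            elif isinstance(extra, dict):
                found += find_errors(item, extra, f"{path}.{key}")
        return found
    return []
```

The only difference between kinds 3 and 4 is whether the keys are enumerated under "properties" or left open via a schema-valued "additionalProperties".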

My first attempt at modelling this in biolink ML looks something like this:

id: http://www.obofoundry.org/registry/metrics.yml
name: metrics

types:
  mean:
    base: float
    uri: xsd:float
  count:
    base: int
    uri: xsd:int
  string:
    base: str
    uri: xsd:string
  boolean:
    base: Bool
    uri: xsd:boolean

classes:

  metrics:
    slots:
        - axiom_count
        - axiom_types
        
  axiom_type_count:
    description: Counting the various axiom types used in the ontology.
    slots:
        - AnnotationAssertion
        - EquivalentClasses
        - TransitiveObjectProperty
        - SubObjectPropertyOf
        - SymmetricObjectProperty
        - SubPropertyChainOf
        - Declaration
        - SubClassOf
        - InverseObjectProperties
  
  namespace_axiom_count:
    description: The number of axioms used by this ontology, broken down by which namespaces they reference (according to the OBO curiemap). For example, 19 axioms reference at least 1 entity in the BFO namespace.

slots:
  axiom_count:
    description: The number of axioms in the ontology.
    range: count
  axiom_types:
    description: A list of axiom types used in the ontology.
    multivalued: true
    
  AnnotationAssertion:
    range: count
  EquivalentClasses:
    range: count
  TransitiveObjectProperty:
    range: count
  SubObjectPropertyOf:
    range: count
  SymmetricObjectProperty:
    range: count
  SubPropertyChainOf:
    range: count
  Declaration:
    range: count
  SubClassOf:
    range: count
  InverseObjectProperties:
    range: count

@cmungall
@deepakunni3 has already given me some advice on how to go about this use case, which is obviously a bit non-standard. First, I find it unsatisfying to have some metrics modelled as slots and others as classes. Second, I don't know exactly how to model the namespace_axiom_count case, because its set of keys is open. Deepak recommended using key/value modelling, but it seems unsatisfactory to bend a perfectly fine JSON structure just to fit a modelling framework. What are your thoughts on this?
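
For reference, a minimal sketch of what key/value modelling would probably do to the open map, assuming each entry becomes an object with two fixed fields. The field names "namespace" and "axiom_count" and the helpers to_pairs and from_pairs are hypothetical, chosen only for illustration.

```python
def to_pairs(open_map, key_name="namespace", value_name="axiom_count"):
    """Turn {"BFO": 283} into [{"namespace": "BFO", "axiom_count": 283}]."""
    return [{key_name: k, value_name: v} for k, v in open_map.items()]


def from_pairs(pairs, key_name="namespace", value_name="axiom_count"):
    """Inverse of to_pairs; rejects duplicate keys, which a list cannot rule out."""
    result = {}
    for pair in pairs:
        key = pair[key_name]
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = pair[value_name]
    return result
```

The list form has a closed set of field names, so it fits a class with two slots, but consumers have to convert back to a map to look up a namespace, and key uniqueness is no longer guaranteed by the structure itself.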

@cmungall

Pinging @hsolbrig. I think we need an equivalent to JSON Schema open maps.

Orthogonal point: to replicate the JSON, namespace_axiom_count and axiom_type_count should be slots. You will still need classes to hold their slots.
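
As a loose analogy in plain Python (hand-written dataclasses, not output of any biolink ML generator, and the helper metrics_from_json is hypothetical), the layout described here would look roughly like this: all four metrics are fields on Metrics, and a separate holder class serves as the range of axiom_type_count.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AxiomTypeCount:
    """Holder class: one optional count per known axiom type."""
    AnnotationAssertion: Optional[int] = None
    EquivalentClasses: Optional[int] = None
    TransitiveObjectProperty: Optional[int] = None
    SubObjectPropertyOf: Optional[int] = None
    SymmetricObjectProperty: Optional[int] = None
    SubPropertyChainOf: Optional[int] = None
    Declaration: Optional[int] = None
    SubClassOf: Optional[int] = None
    InverseObjectProperties: Optional[int] = None


@dataclass
class Metrics:
    """All four metrics are fields (slots) on the same class."""
    axiom_count: Optional[int] = None
    axiom_types: List[str] = field(default_factory=list)
    axiom_type_count: Optional[AxiomTypeCount] = None
    # Open map: no class can enumerate the keys, so a plain dict stands in
    # for the missing open-map construct.
    namespace_axiom_count: Dict[str, int] = field(default_factory=dict)


def metrics_from_json(doc: dict) -> Metrics:
    """Build Metrics from a document shaped like the JSON example."""
    data = dict(doc["metrics"])
    if "axiom_type_count" in data:
        # Unknown axiom types raise TypeError, mirroring a constrained map.
        data["axiom_type_count"] = AxiomTypeCount(**data["axiom_type_count"])
    return Metrics(**data)
```

The constrained map gets its key checking from the holder class, while the open map still has no class-based counterpart, which is the gap raised above.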
