List of issues with the current specifications #230

marcverhagen · 2024-06-28T18:36:54Z

While going through the specifications to clarify issue #228 I was making many small edits, but wanted to open an issue to deal with minor issues and questions that I did not immediately fix.

There is a section that reads

Note that when a property is set to some value in the contains in the view metadata then all annotations of that type should adhere to that value, in this case the document and timeUnit are set to “m1” and “seconds” respectively. In other words, the contains dictionary not only functions as an overview of the annotation types in this view, but also as a place for common metadata shared among annotations of a type. This is useful especially for document property, as in a single view, an app is likely to process only a limited number of source documents and resulting annotation objects will be anchored on those documents. It is technically possible for TimeFrame type to add document properties to individual annotation objects and overrule the metadata property, but this is not to be done without really good reasons. We get back to this later.

We should mull this over since it comes over as a bit muddled.

Smaller notes:

In the section on view metadata there is an example from barsdetection, which uses a parameter named not-defined-parameter, it is not clear why it is there.
The first mention of the id property is just under the top-level JSON for a view, it is not yet clear to me wether that mention needs to be edited.

(These notes are updated up to the section "The view’s annotations property". )

The text was updated successfully, but these errors were encountered:

keighrim · 2024-07-02T16:54:59Z

We should mull this over since it comes over as a bit muddled.

I'm not sure what you meant by mulling over. But I do agree the way we used the term "annotation type metadata" in the quoted passage (as well as current HTML rendering of clams vocabulary) is quite muddy. Specifically,

there are "metadata" and "properties" sections in a vocab page (https://mmif.clams.ai/vocabulary/TimeFrame/v5/), but no description to be found what those sections mean.
we use any key either from "metadata" or "properties" sections of vocab page in MMIF's contains metadata or properties of annotation objs, despite this passage in the very bottom of the spec document

The vocabulary also defines metadata properties. For example, the optional property timeUnit can be used for a TimeFrame to specify what unit is used for the start and end time points in instances of TimeFrame. This property is not expressed in the annotation but in the metadata of the view with the annotation type in the contains dictionary

This results in contains obj in the view metadata just being used to "factor out" common props, in practice.

By mulling-over,

Should we consider getting rid of the (currently meaningless) distinction between "metadata" and "properties" of annotation types?
Should we put more restriction on what can be laid in contains and what can be annotation.properties, to give that distinction actual use?
Or something else?

marcverhagen · 2024-07-02T19:33:31Z

With "mulling-over" I meant exactly what we are doing now: rethinking that passage.

I don't think that the distinction between metadata and properties is meaningless, anything that holds for all instances of an annotation type should not be expressed on every single instance. We do muddle the water a bit by having what could be considered as two kinds of metadata: things like "labelset" versus things like "document", where I do get a very different feel of what they really are. But the description/explanation of those two should probably indeed be sharpened.

keighrim · 2024-07-03T11:54:22Z

Where I think things are getting muddy is that we have two "types" of annotation attributes (metadata and property) and there are two "places" we put annotation attributes (view.metadata.contains and annotation.properties). Then, we vaguely talk about those two different dualities as if they are corresponding to each other by saying things like:

When a property is set to some value in the contains dict of the view metadata, then all annotations of that type should adhere to that value.

The vocabulary also defines metadata properties. ... This property is not expressed in the annotation but in the metadata of the view.

, when, technically, there's no such correspondence. For example:

Developers can put the timeUnit metadata key in the annotation.properties dict, and that's perfectly valid MMIF (of course, this is not a recommended way of doing it, but also not a forbidden way).
A text summarization app can generate many new TextDocuments in a single view based on previously existing different TextDocuments. Then, it is impossible to put the document metadata key at the view level (values are not "common" to factor them out).

So, if there are no real differences between the two "places" in practice, I think we should either:

Just use one single place and merge them - This is a no-go since we need that contains dict for quick navigation of MMIF.
Explicitly state that the dual "places" have nothing to do with the dual "types," and leave the selection of a place to put an attribute to the liberty of the developers (which is roughly how things are done currently, see below).
Enforce real differences by explicitly stating that annotation.properties is ONLY FOR property keys (no timeUnit or document in the annotation object) and view.metadata.contains is ONLY FOR metadata keys. Hence, when a group of annotations doesn't share the same metadata value, they should split into different views with different contains dicts.

By the way, I spent many hours in the past (clamsproject/mmif-python#226) to write "magic" code in the SDK to automatically distribute-when-deserialize and factor-when-serialize annotation attributes between contains and properties dicts, in the hope of minimizing human errors. So, what's currently done is:

Regardless of the key, anything common across all annotations (of a single type) is factored out to the contains dict when serialized into JSON - meaning if there's only one TimePoint in a SWT output, all the stuff including classification scores will be thrown into view.metadata.contains.
Regardless of the key, any attributes can be accessed by the .get_property() method when dealing with Annotation instances from the mmif-python SDK.

clams-bot added this to infra Jun 28, 2024

github-project-automation bot moved this to Todo in infra Jun 28, 2024

keighrim mentioned this issue Aug 14, 2024

promote annotations' id field #233

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

List of issues with the current specifications #230

List of issues with the current specifications #230

marcverhagen commented Jun 28, 2024 •

edited

Loading

keighrim commented Jul 2, 2024

marcverhagen commented Jul 2, 2024

keighrim commented Jul 3, 2024 •

edited

Loading

List of issues with the current specifications #230

List of issues with the current specifications #230

Comments

marcverhagen commented Jun 28, 2024 • edited Loading

keighrim commented Jul 2, 2024

marcverhagen commented Jul 2, 2024

keighrim commented Jul 3, 2024 • edited Loading

marcverhagen commented Jun 28, 2024 •

edited

Loading

keighrim commented Jul 3, 2024 •

edited

Loading