Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List of issues with the current specifications #230

Open
2 tasks
marcverhagen opened this issue Jun 28, 2024 · 3 comments
Open
2 tasks

List of issues with the current specifications #230

marcverhagen opened this issue Jun 28, 2024 · 3 comments

Comments

@marcverhagen
Copy link
Contributor

marcverhagen commented Jun 28, 2024

While going through the specifications to clarify issue #228 I was making many small edits, but wanted to open an issue to deal with minor issues and questions that I did not immediately fix.

There is a section that reads

Note that when a property is set to some value in the contains in the view metadata then all annotations of that type should adhere to that value, in this case the document and timeUnit are set to “m1” and “seconds” respectively. In other words, the contains dictionary not only functions as an overview of the annotation types in this view, but also as a place for common metadata shared among annotations of a type. This is useful especially for document property, as in a single view, an app is likely to process only a limited number of source documents and resulting annotation objects will be anchored on those documents. It is technically possible for TimeFrame type to add document properties to individual annotation objects and overrule the metadata property, but this is not to be done without really good reasons. We get back to this later.

We should mull this over since it comes over as a bit muddled.

Smaller notes:

  • In the section on view metadata there is an example from barsdetection, which uses a parameter named not-defined-parameter, it is not clear why it is there.
  • The first mention of the id property is just under the top-level JSON for a view, it is not yet clear to me wether that mention needs to be edited.

(These notes are updated up to the section "The view’s annotations property". )

@clams-bot clams-bot added this to infra Jun 28, 2024
@github-project-automation github-project-automation bot moved this to Todo in infra Jun 28, 2024
@keighrim
Copy link
Member

keighrim commented Jul 2, 2024

We should mull this over since it comes over as a bit muddled.

I'm not sure what you meant by mulling over. But I do agree the way we used the term "annotation type metadata" in the quoted passage (as well as current HTML rendering of clams vocabulary) is quite muddy. Specifically,

  1. there are "metadata" and "properties" sections in a vocab page (https://mmif.clams.ai/vocabulary/TimeFrame/v5/), but no description to be found what those sections mean.

  2. we use any key either from "metadata" or "properties" sections of vocab page in MMIF's contains metadata or properties of annotation objs, despite this passage in the very bottom of the spec document

    The vocabulary also defines metadata properties. For example, the optional property timeUnit can be used for a TimeFrame to specify what unit is used for the start and end time points in instances of TimeFrame. This property is not expressed in the annotation but in the metadata of the view with the annotation type in the contains dictionary

    This results in contains obj in the view metadata just being used to "factor out" common props, in practice.

By mulling-over,

  • Should we consider getting rid of the (currently meaningless) distinction between "metadata" and "properties" of annotation types?
  • Should we put more restriction on what can be laid in contains and what can be annotation.properties, to give that distinction actual use?
  • Or something else?

@marcverhagen
Copy link
Contributor Author

With "mulling-over" I meant exactly what we are doing now: rethinking that passage.

I don't think that the distinction between metadata and properties is meaningless, anything that holds for all instances of an annotation type should not be expressed on every single instance. We do muddle the water a bit by having what could be considered as two kinds of metadata: things like "labelset" versus things like "document", where I do get a very different feel of what they really are. But the description/explanation of those two should probably indeed be sharpened.

@keighrim
Copy link
Member

keighrim commented Jul 3, 2024

Where I think things are getting muddy is that we have two "types" of annotation attributes (metadata and property) and there are two "places" we put annotation attributes (view.metadata.contains and annotation.properties). Then, we vaguely talk about those two different dualities as if they are corresponding to each other by saying things like:

  • When a property is set to some value in the contains dict of the view metadata, then all annotations of that type should adhere to that value.
  • The vocabulary also defines metadata properties. ... This property is not expressed in the annotation but in the metadata of the view.

, when, technically, there's no such correspondence. For example:

  • Developers can put the timeUnit metadata key in the annotation.properties dict, and that's perfectly valid MMIF (of course, this is not a recommended way of doing it, but also not a forbidden way).
  • A text summarization app can generate many new TextDocuments in a single view based on previously existing different TextDocuments. Then, it is impossible to put the document metadata key at the view level (values are not "common" to factor them out).

So, if there are no real differences between the two "places" in practice, I think we should either:

  1. Just use one single place and merge them - This is a no-go since we need that contains dict for quick navigation of MMIF.
  2. Explicitly state that the dual "places" have nothing to do with the dual "types," and leave the selection of a place to put an attribute to the liberty of the developers (which is roughly how things are done currently, see below).
  3. Enforce real differences by explicitly stating that annotation.properties is ONLY FOR property keys (no timeUnit or document in the annotation object) and view.metadata.contains is ONLY FOR metadata keys. Hence, when a group of annotations doesn't share the same metadata value, they should split into different views with different contains dicts.

By the way, I spent many hours in the past (clamsproject/mmif-python#226) to write "magic" code in the SDK to automatically distribute-when-deserialize and factor-when-serialize annotation attributes between contains and properties dicts, in the hope of minimizing human errors. So, what's currently done is:

  1. Regardless of the key, anything common across all annotations (of a single type) is factored out to the contains dict when serialized into JSON - meaning if there's only one TimePoint in a SWT output, all the stuff including classification scores will be thrown into view.metadata.contains.
  2. Regardless of the key, any attributes can be accessed by the .get_property() method when dealing with Annotation instances from the mmif-python SDK.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants