prevent using of "short" annotation ID and release MMIF 2.0 #228

keighrim · 2024-06-18T18:56:50Z

Because

The generosity of allowing short form for annotation id values provides almost no benefit (besides saving some bytes) while adding lots of technical debt and bugs. For example, we have this code in visualizer

https://github.com/clamsproject/mmif-visualizer/blob/9019c154fccde11a605b77d412ddaca3e7be566c/ocr.py#L65-L68

but source and target values in Alignment annotations almost always use the "long" form (prefixed with view ID) of the annotation ID since lots of alignments are done across views, and hence these checks are destined to fail and introduce silent errors.

I suggest we allow only using long form of the ID in everywhere in MMIF, so that ann.id and ann.long_id always return the same value. This is major change in MMIF spec, hence leads to MMIF 2.0.

Done when

Either

we update all existing code that uses == checks with ann.id value to a fixed standard and educate all existing and future developers to use the same implementation,
we simply disallow "short" form of annotation ID in MMIF world and change ann.id to return ann.long_id.

Additional context

No response

The text was updated successfully, but these errors were encountered:

marcverhagen · 2024-06-18T20:24:07Z

I can see some value in always having view_id:annotation_id, but I do also see value in not stating the obvious inside a view.

However, I really like the idea of having ann.id be a property that returns the full value. As long as that is reasonably efficient then no other changes would be needed I think. If that is true then option 2 seems superior over option 1.

marcverhagen · 2024-06-18T20:35:05Z

Oh, I just realized that what I thought was option 2 is really another option.

keighrim · 2024-06-18T23:05:56Z

If ann.id returns the long form, I believe "id" field of the annotation object in the JSON serialization must have the long form, so that SDK-based ann.id call and general JSON operation ann["id"] behave equivalently.

keighrim · 2024-06-25T15:43:34Z

While working on clamsproject/mmif-python#285, I realized that allowing the short form for annotation id values within the view "scope" actually introduces un-recoverable ambiguity, and hence we MUST stop doing this. Here's an example of output MMIF from an imaginary spell correction app.

{
  ...
  "documents": [
    {
      "@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
      "properties": {
        "id": "td1",
        "location": "file:///var/archive/spell-errors.txt" }
    }
  ],
  "views": [

    {
      "id": "v1",
      "metadata": {
        "app": "http://apps.clams.ai/awesome-spell-corrector/v7", 
        ...
      },
      "annotations": [
        {
          "@type": "https://mmif.clams.ai/vocabulary/TextDocument/v1/",
          "properties": {
            "id": "td1",
            "text": { "@value": "very correctly spelled text" }
          }
        },
        {
          "@type": "https://mmif.clams.ai/vocabulary/Alignment/v1/",
          "properties": {
            "id": "a1",
            "source": "td1", 
            "target": "td1"   # ???
          }
        },
        ...
      ]
    }
  ]
}

marcverhagen · 2024-06-26T01:54:29Z

Yes, that does appear to be an oversight since we nowhere stated that annotation identifiers should not be the same as identifiers in the documents list.

…list

keighrim · 2024-07-12T21:57:14Z

Last time I and @marcverhagen talked about this in a in-person meeting, we temporarily decided to update the SDK code to use "long_id" everywhere, but keep the MMIF version 1.x. That seemed to work with MMIF spec version. But now that I think of mmif-python version, and because the mmif-python SDK version is "bound" to the spec version down to minor level (first two ints) , I don't think keeping the MMIF version to 1.x will work at all.

If we update the SDK to use long_id everywhere, the new SDK won't be able to read existing mmif files. Hence, to mark the boundary between that compatibility I firmly believe that we MUST release mmif-python 2.x. But that can't be done without releasing MMIF 2.x, because of the version coupling. This problem already raised three years ago (#14 (comment)) but never properly addressed back then.

keighrim mentioned this issue Jun 24, 2024

output MMIF format clamsproject/app-role-filler-binder#1

Open

keighrim mentioned this issue Jun 26, 2024

initial implementation with clams-python 1.2.4 clamsproject/app-tfidf-keywordextractor#3

Merged

marcverhagen mentioned this issue Jun 28, 2024

List of issues with the current specifications #230

Open

2 tasks

keighrim added a commit to clamsproject/mmif-python that referenced this issue Jul 12, 2024

small, temporary workaround of clamsproject/mmif#228 for annotations …

74e5400

…list

keighrim added a commit to clamsproject/aapb-evaluations that referenced this issue Jul 12, 2024

soft-pinning mmif SDK version (see clamsproject/mmif#228 (comment))

7c89ae4

This was referenced Jul 13, 2024

reconsider version binding with MMIF spec clamsproject/mmif-python#293

Open

Update mmif.serialize.annotation.Annotation:id to always use "long" form clamsproject/mmif-python#296

Open

keighrim mentioned this issue Aug 14, 2024

promote annotations' id field #233

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prevent using of "short" annotation ID and release MMIF 2.0 #228

prevent using of "short" annotation ID and release MMIF 2.0 #228

keighrim commented Jun 18, 2024 •

edited

Loading

marcverhagen commented Jun 18, 2024

marcverhagen commented Jun 18, 2024

keighrim commented Jun 18, 2024

keighrim commented Jun 25, 2024 •

edited

Loading

marcverhagen commented Jun 26, 2024

keighrim commented Jul 12, 2024

prevent using of "short" annotation ID and release MMIF 2.0 #228

prevent using of "short" annotation ID and release MMIF 2.0 #228

Comments

keighrim commented Jun 18, 2024 • edited Loading

Because

Done when

Additional context

marcverhagen commented Jun 18, 2024

marcverhagen commented Jun 18, 2024

keighrim commented Jun 18, 2024

keighrim commented Jun 25, 2024 • edited Loading

marcverhagen commented Jun 26, 2024

keighrim commented Jul 12, 2024

keighrim commented Jun 18, 2024 •

edited

Loading

keighrim commented Jun 25, 2024 •

edited

Loading