improved alignment "caching" #285

Closed
keighrim opened this issue Jun 11, 2024 · 0 comments · Fixed by #287
Labels
✨N New feature or request

New Feature Summary

As an app developer, I would like to have the superpower to navigate Alignment annotations and their arguments (target, source) more easily, instead of writing complicated graph-building code every time I have to deal with alignments.

implementation proposal

Add a hidden (ignored when serialized) attribute to each mmif.serialize.annotation.Annotation instance that holds a dict mapping the long_id of each Alignment object to the long_id of its counterpart object. For example, suppose that we have three annotation objects (a Token, a TimeFrame, and an Alignment linking them) in a MMIF file:

{
  "metadata": {
    "mmif": "http://mmif.clams.ai/1.0.0"
  },
  "views": [
    {
      "id": "v_0",
      "metadata": {
        "timestamp": "2023-07-25T19:55:56.693173",
        "app": "http://apps.clams.ai/whisper-wrapper/v3",
        "contains": {
          "http://vocab.lappsgrid.org/Token": {},
          "http://mmif.clams.ai/vocabulary/TimeFrame/v1": {}, 
          "http://mmif.clams.ai/vocabulary/Alignment/v1": {},
          ...
        }
      },
      "annotations": [
        {
          "@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
          ...
        },
        ...
        {
          "@type": "http://vocab.lappsgrid.org/Token",
          "properties": {
            "word": " Good",
            "start": 0,
            "end": 5,
            "document": "v_0:td_1",
            "id": "to_1"
          }
        },
        {
          "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
          "properties": {
            "frameType": "speech",
            "start": 98.52,
            "end": 99.12,
            "id": "tf_1"
          }
        },
        {
          "@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
          "properties": {
            "source": "tf_1",
            "target": "to_1",
            "id": "al_2"
          }
        }, 
        ...
      ]
    }
  ]
}

When de-serialized, the annotation object will have hidden attributes to "cache" aligned annotation ids.

>>> m = Mmif(mmif_str)
>>> m.get('v_0:to_1')._alignments
{"v_0:al_2": "v_0:tf_1"}
>>> m.get('v_0:tf_1')._alignments
{"v_0:al_2": "v_0:to_1"}
>>> m.get('v_0:al_2')._alignments
{}  # unless alignment annotation itself is aligned to other annotations
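The mapping shown above could be derived in a single pass over the deserialized JSON. The following is a minimal sketch on plain dicts, not the mmif-python implementation: the names `build_alignment_cache`, `to_long_id`, and `ALIGNMENT_MARKER` are hypothetical, and it assumes that every short ID in an Alignment refers to an annotation in the same view.

```python
from collections import defaultdict

# Assumption: Alignment types are recognized by this URI fragment.
ALIGNMENT_MARKER = "/vocabulary/Alignment/"


def to_long_id(view_id, annotation_id):
    """Return a view-prefixed ID; IDs that already contain ':' are kept as-is.

    Simplification: every short ID is assumed to live in the same view.
    """
    return annotation_id if ":" in annotation_id else f"{view_id}:{annotation_id}"


def build_alignment_cache(mmif_dict):
    """Map each aligned annotation's long ID to {alignment long ID: counterpart long ID}."""
    cache = defaultdict(dict)
    for view in mmif_dict.get("views", []):
        for annotation in view.get("annotations", []):
            if ALIGNMENT_MARKER not in annotation["@type"]:
                continue
            props = annotation["properties"]
            alignment_id = to_long_id(view["id"], props["id"])
            source_id = to_long_id(view["id"], props["source"])
            target_id = to_long_id(view["id"], props["target"])
            # Record the link in both directions.
            cache[source_id][alignment_id] = target_id
            cache[target_id][alignment_id] = source_id
    return dict(cache)


# A trimmed-down version of the three annotations from the example.
example = {
    "views": [{
        "id": "v_0",
        "annotations": [
            {"@type": "http://vocab.lappsgrid.org/Token",
             "properties": {"id": "to_1", "document": "v_0:td_1"}},
            {"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
             "properties": {"id": "tf_1", "start": 98.52, "end": 99.12}},
            {"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
             "properties": {"id": "al_2", "source": "tf_1", "target": "to_1"}},
        ],
    }]
}
cache = build_alignment_cache(example)
print(cache["v_0:to_1"])
print(cache["v_0:tf_1"])
```

In the actual proposal each entry would live on the annotation object itself as `_alignments` rather than in one global dict; the global dict is used here only to keep the sketch independent of the library classes.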

If an annotation has multiple alignments, the dict simply gains one entry per alignment. Code that updates the _alignments attribute (tentative name) should be added to

  1. MMIF de-serialization (
    def _deserialize(self, input_dict: dict) -> None:
    )
  2. View::add_annotation method (
    def add_annotation(self, annotation: 'Annotation', overwrite=False) -> 'Annotation':
    )
  3. View::new_annotation method (
    def new_annotation(self, at_type: Union[str, ThingTypesBase], aid: Optional[str] = None,
    )
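For the two View methods, the update could happen at the moment an Alignment is added. The sketch below uses stand-in `Annotation` and `View` classes, not the real mmif-python ones; the `_cache_alignment` helper and the simplified `add_annotation` signature are assumptions made for illustration. It only resolves endpoints that already exist in the same view.

```python
class Annotation:
    """Stand-in for an annotation object (hypothetical, simplified)."""

    def __init__(self, at_type, properties):
        self.at_type = at_type
        self.properties = properties
        # Hidden cache: {alignment long_id: counterpart long_id}; never serialized.
        self._alignments = {}


class View:
    """Stand-in for a view that keeps the alignment caches up to date."""

    def __init__(self, view_id):
        self.id = view_id
        self.annotations = {}  # long_id -> Annotation

    def _long_id(self, annotation_id):
        # Simplification: short IDs are assumed to belong to this view.
        return annotation_id if ":" in annotation_id else f"{self.id}:{annotation_id}"

    def add_annotation(self, annotation):
        long_id = self._long_id(annotation.properties["id"])
        self.annotations[long_id] = annotation
        if "/vocabulary/Alignment/" in annotation.at_type:
            self._cache_alignment(long_id, annotation)
        return annotation

    def _cache_alignment(self, alignment_id, alignment):
        source_id = self._long_id(alignment.properties["source"])
        target_id = self._long_id(alignment.properties["target"])
        # Update both endpoints, skipping any that this view does not hold.
        for here, there in ((source_id, target_id), (target_id, source_id)):
            endpoint = self.annotations.get(here)
            if endpoint is not None:
                endpoint._alignments[alignment_id] = there


view = View("v_0")
token = view.add_annotation(
    Annotation("http://vocab.lappsgrid.org/Token", {"id": "to_1"}))
frame = view.add_annotation(
    Annotation("http://mmif.clams.ai/vocabulary/TimeFrame/v1", {"id": "tf_1"}))
alignment = view.add_annotation(
    Annotation("http://mmif.clams.ai/vocabulary/Alignment/v1",
               {"id": "al_2", "source": "tf_1", "target": "to_1"}))
print(token._alignments, frame._alignments, alignment._alignments)
```

Endpoints that live in another view, or that are added after the Alignment, are silently skipped in this sketch; handling them would require access to the whole MMIF object.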

limitation

Currently, a View object has no access to its "parent" Mmif object, so getting the long_id of an annotation object might not be trivial unless we add a pointer to the parent in the View class.
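One way the parent pointer could look is sketched below. The classes are stand-ins rather than the real mmif-python `Mmif` and `View`, and the `_parent` attribute, `new_view`, `long_id`, `annotation_ids`, and `document_ids` names are all hypothetical. The sketch assumes that a short ID refers either to an annotation in the same view or to a top-level document.

```python
from typing import Optional


class View:
    """Stand-in view that remembers the Mmif object it belongs to."""

    def __init__(self, view_id: str, parent: Optional["Mmif"] = None):
        self.id = view_id
        self._parent = parent  # hypothetical back-pointer to the Mmif object
        self.annotation_ids = set()  # short IDs of annotations in this view

    def long_id(self, annotation_id: str) -> str:
        if ":" in annotation_id:
            return annotation_id  # already view-prefixed
        if annotation_id in self.annotation_ids:
            return f"{self.id}:{annotation_id}"
        # Without the parent, a top-level document ID cannot be told apart
        # from an unknown annotation ID.
        if self._parent is not None and annotation_id in self._parent.document_ids:
            return annotation_id
        raise KeyError(f"cannot resolve {annotation_id!r} from view {self.id!r}")


class Mmif:
    """Stand-in container that hands itself to every view it creates."""

    def __init__(self, document_ids=()):
        self.document_ids = set(document_ids)
        self.views = {}

    def new_view(self, view_id: str) -> View:
        view = View(view_id, parent=self)
        self.views[view_id] = view
        return view


mmif = Mmif(document_ids=["d1"])
view = mmif.new_view("v_0")
view.annotation_ids.add("tf_1")
frame_id = view.long_id("tf_1")
document_id = view.long_id("d1")
print(frame_id, document_id)
```

A plain reference is used for `_parent` to keep the sketch short; whether the back-pointer should be a regular or a weak reference is a design choice left open here.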

Related

No response

Alternatives

No response

Additional context

No response
