generated 1.0.1 files
keighrim committed Feb 7, 2024
1 parent b0ce38c commit 7ec7f2a
Showing 46 changed files with 5,633 additions and 34 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,10 @@ The format is loosely based on [Keep a Changelog](http://keepachangelog.com/). L

This file documents changes made to the MMIF specification. Version names used to start with `spec-` because the Python MMIF SDK was also maintained in this repository. Starting with version 0.2.2 the repository was split and the prefix was discarded.

## Version 1.0.1 - 2024-02-07
- vocabulary types now have `similarTo` field to link similar type definitions as URI (https://github.com/clamsproject/mmif/issues/203).
- updated `TimeFrame` definition to ease `frameType` value restrictions (https://github.com/clamsproject/mmif/issues/207).

## Version 1.0.0 - 2023-05-26

- Re-release of 0.5.0 (our last release candidate) as 1.0.0 stable version.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.0.0
1.0.1
551 changes: 551 additions & 0 deletions docs/1.0.1/index.md

Large diffs are not rendered by default.

Binary file added docs/1.0.1/pi78oGjdT-annotated.jpg
Binary file added docs/1.0.1/pi78oGjdT.jpg
33 changes: 33 additions & 0 deletions docs/1.0.1/samples/bars-tones-slates/index.md
@@ -0,0 +1,33 @@
---
layout: page
title: MMIF Specification
subtitle: Version 1.0.1
---

# Example: Bars and Tones and Slates

To see the full example scroll down to the end or open the [raw json file](raw.json).

This is a minimal example that contains two media documents, one pointing at a video and the other at a transcript. For the first document there are two views, one with bars-and-tone annotations and one with slate annotations. For the second document there is one view with the results of a tokenizer. This example file, while minimal, has everything required by MMIF.

Some notes:

- The metadata just specify the MMIF version.
- Both media documents in the *documents* list refer to a location on a local disk or a mounted disk. If a document is not on a local or mounted disk, a URL should be used instead.
- Each view has some metadata spelling out several kinds of things:
  - The application that created the view.
  - A timestamp of when the view was created.
  - What kind of annotations are in the view and what metadata there are on those annotations. For example, in the view with id=v2, the *contains* field has a property "http://mmif.clams.ai/vocabulary/TimeFrame/v2" with a dictionary as its value, and that dictionary contains the metadata. Here the metadata specify what document the annotations are over and what unit is used for annotation offsets.

Only one annotation is shown for each view to keep the file as small as possible. The bars-and-tones and slate views often have only one annotation anyway, so it is likely that annotations were left out only in the tokens view.
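The view metadata described in the notes above can be read with a few lines of Python. This is a minimal sketch using only the standard `json` module, not the MMIF SDK; the function name `summarize_views` and the trimmed-down input are assumptions made for illustration.

```python
import json

# A trimmed-down fragment modeled on this example; not the full file.
MMIF_TEXT = """
{
  "metadata": {"mmif": "http://mmif.clams.ai/1.0.1"},
  "documents": [],
  "views": [
    {
      "id": "v1",
      "metadata": {
        "app": "http://apps.clams.ai/bars-and-tones/1.0.5",
        "timestamp": "2020-05-27T12:23:45",
        "contains": {
          "http://mmif.clams.ai/vocabulary/TimeFrame/v2": {
            "document": "m1",
            "timeUnit": "seconds"
          }
        }
      },
      "annotations": []
    }
  ]
}
"""


def summarize_views(mmif):
    """Return one (view id, app, annotation type, type metadata) tuple
    for every annotation type listed under a view's `contains` field."""
    rows = []
    for view in mmif["views"]:
        meta = view["metadata"]
        for at_type, type_meta in meta.get("contains", {}).items():
            rows.append((view["id"], meta["app"], at_type, type_meta))
    return rows


summary = summarize_views(json.loads(MMIF_TEXT))
for view_id, app, at_type, type_meta in summary:
    print(view_id, app, at_type, type_meta)
```

Each row pairs an annotation type with the document it is anchored to and the unit used for its offsets, which is the information a downstream consumer typically needs before reading the annotations themselves.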



## Full MMIF File

```json
{% include_relative raw.json %}
```



96 changes: 96 additions & 0 deletions docs/1.0.1/samples/bars-tones-slates/raw.json
@@ -0,0 +1,96 @@
{
"metadata": {
"mmif": "http://mmif.clams.ai/1.0.1"
},
"documents": [
{
"@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
"properties": {
"id": "m1",
"mime": "video/mp4",
"location": "file:///var/archive/video-0012.mp4"
}
},
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
"properties": {
"id": "m2",
"mime": "text/plain",
"location": "file:///var/archive/video-0012-transcript.txt"
}
}
],
"views": [
{
"id": "v1",
"metadata": {
"app": "http://apps.clams.ai/bars-and-tones/1.0.5",
"timestamp": "2020-05-27T12:23:45",
"contains": {
"http://mmif.clams.ai/vocabulary/TimeFrame/v2": {
"document": "m1",
"timeUnit": "seconds"
}
}
},
"annotations": [
{
"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v2",
"properties": {
"id": "s1",
"start": 0,
"end": 5,
"frameType": "bars-and-tones"
}
}
]
},
{
"id": "v2",
"metadata": {
"app": "http://apps.clams.ai/slates/1.0.3",
"timestamp": "2020-05-27T12:23:45",
"contains": {
"http://mmif.clams.ai/vocabulary/TimeFrame/v2": {
"document": "m1",
"timeUnit": "seconds"
}
}
},
"annotations": [
{
"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v2",
"properties": {
"id": "s1",
"start": 25,
"end": 38,
"frameType": "slate"
}
}
]
},
{
"id": "v3",
"metadata": {
"app": "http://apps.clams.ai/spacy/1.3.0",
"timestamp": "2020-05-27T12:25:15",
"contains": {
"http://vocab.lappsgrid.org/Token": {
"document": "m2"
}
}
},
"annotations": [
{
"@type": "http://vocab.lappsgrid.org/Token",
"properties": {
"id": "s1",
"start": 0,
"end": 3,
"word": "The"
}
}
]
}
]
}
177 changes: 177 additions & 0 deletions docs/1.0.1/samples/east-tesseract-typing/index.md
@@ -0,0 +1,177 @@
---
layout: page
title: MMIF Specification
subtitle: Version 1.0.1
---



# Example: EAST, Tesseract and Typing

This example contains one image document which points to this image:

<img src="../../pi78oGjdT.jpg" border="1" height="200"/>

In addition, there are three views, one created by EAST, one by Tesseract and one by a semantic typing component. We now give fragments of the four relevant parts of the MMIF file, each with some comments.

To see the full example scroll down to the end or open the [raw json file](raw.json).

### Fragment 1: the documents list

```json
{
"documents": [
{
"@type": "http://mmif.clams.ai/vocabulary/ImageDocument/v1",
"properties": {
"id": "m1",
"mime": "image/jpeg",
"location": "/var/archive/image-fido-barks.jpg" }
}
]
}
```
This is simply a list with one *ImageDocument* which points at the file with the barking dog image.

### Fragment 2: the EAST view

Here are the metadata in this view:

```json
{
"app": "http://mmif.clams.ai/apps/east/0.2.1",
"contains": {
"http://mmif.clams.ai/vocabulary/BoundingBox/v1": {
"timeUnit": "pixels",
"document": "m1" } }
}
```

It simply says that EAST created the view and that all bounding box annotations are over document "m1" using pixels as the unit.

And here is the annotations list:


```json
[
{
"@type": "http://mmif.clams.ai/vocabulary/BoundingBox/v1",
"properties": {
"id": "bb1",
"coordinates": [[10,20], [40,20], [10,30], [40,30]],
"boxType": "text" }
},
{
"@type": "http://mmif.clams.ai/vocabulary/BoundingBox/v1",
"properties": {
"id": "bb2",
"coordinates": [[210,220], [240,220], [210,230], [240,230]],
"boxType": "text" }
}
]
```

EAST has found two text boxes: one for "Arf" and one for "yelp" (although after EAST runs we do not know yet what the actual text is). Text boxes are encoded simply by specifying what the type of the bounding box is. For the sake of a somewhat smaller example file we are assuming here that EAST does not find text boxes when the text slants down. Note also that the coordinates are made up and bear little relation to what the real coordinates are.
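To make the coordinate lists concrete, here is a small sketch that turns the four corner points of a box into an origin plus width and height. The helper name `box_extent` is hypothetical, and the sketch assumes the points are `[x, y]` pairs in pixels as in the fragment above.

```python
def box_extent(coordinates):
    """Return (x_min, y_min, width, height) for a list of [x, y] corner points."""
    xs = [x for x, _ in coordinates]
    ys = [y for _, y in coordinates]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)


# Coordinates of bounding box "bb1" from the annotations list above.
bb1 = [[10, 20], [40, 20], [10, 30], [40, 30]]
extent = box_extent(bb1)
print(extent)  # (10, 20, 30, 10)
```

Taking the minimum and maximum of each axis means the result does not depend on the order in which the corners are listed.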

### Fragment 3: the Tesseract view

Metadata:

```json
{
"app": "http://mmif.clams.ai/apps/tesseract/0.2.1",
"contains": {
"http://mmif.clams.ai/vocabulary/TextDocument/v1": {},
"http://mmif.clams.ai/vocabulary/Alignment/v1": {} }
}
```

Tesseract creates text documents from bounding boxes with type equal to "text" and creates alignment relations between the documents and the boxes. The interesting thing here, compared to the metadata for the view created by EAST, is that no *document* metadata property is defined. This is because neither *TextDocument* nor *Alignment* needs to be directly anchored to a document.

Annotations list:

```json
[
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
"properties": {
"id": "td1",
"text": {
"@value": "Arf" } }
},
{
"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
"properties": {
"id": "a1",
"source": "v1:bb1",
"target": "td1" }
},
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
"properties": {
"id": "td2",
"text": {
"@value": "yelp" } }
},
{
"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
"properties": {
"id": "a2",
"source": "v1:bb2",
"target": "td2" }
}
]
```

The text documents just have identifiers and store the text; they themselves are not aware of where they came from. The alignments link the text documents to bounding boxes in the view created by EAST.
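The following sketch shows one way to follow an alignment from a bounding box to the recognized text. It assumes that an identifier of the form `view:id` points into another view, that a bare identifier refers to the current view, and that the Tesseract view has the id `v2`; the `resolve` helper and the dictionary layout are hypothetical and not part of any MMIF library.

```python
def resolve(views, current_view_id, ref):
    """Look up an annotation by id; a 'view:id' reference names another view,
    a bare id is looked up in the current view."""
    view_id, sep, ann_id = ref.partition(":")
    if not sep:  # no view prefix, so the id is local to the current view
        view_id, ann_id = current_view_id, ref
    for ann in views[view_id]:
        if ann["properties"]["id"] == ann_id:
            return ann
    raise KeyError(ref)


# Annotation lists keyed by view id, reduced to what the lookup needs.
views = {
    "v1": [
        {"@type": "http://mmif.clams.ai/vocabulary/BoundingBox/v1",
         "properties": {"id": "bb1", "boxType": "text"}},
    ],
    "v2": [
        {"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1",
         "properties": {"id": "td1", "text": {"@value": "Arf"}}},
        {"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1",
         "properties": {"id": "a1", "source": "v1:bb1", "target": "td1"}},
    ],
}

pairs = []
for ann in views["v2"]:
    if "/Alignment/" in ann["@type"]:
        props = ann["properties"]
        box = resolve(views, "v2", props["source"])
        doc = resolve(views, "v2", props["target"])
        pairs.append((box["properties"]["id"], doc["properties"]["text"]["@value"]))

print(pairs)  # [('bb1', 'Arf')]
```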

### Fragment 4: the Semantic Typer view

Metadata:

```json
{
"app": "http://mmif.clams.ai/apps/semantic-typer/0.2.4",
"contains": {
"http://mmif.clams.ai/vocabulary/SemanticTag/v1": {} }
}
```

Nothing spectacular here. Like the previous view, no *document* property is used, but in this case it is because the semantic tags in the annotation list each refer to a different document.

Annotations list:

```json
[
{
"@type": "http://mmif.clams.ai/vocabulary/SemanticTag/v1",
"properties": {
"id": "st1",
"category": "dog-sound",
"document": "v2:td1",
"start": 0,
"end": 3 }
},
{
"@type": "http://mmif.clams.ai/vocabulary/SemanticTag/v1",
"properties": {
"id": "st2",
"category": "dog-sound",
"document": "v2:td2",
"start": 0,
"end": 4 }
}
]
```

Now each annotation needs to have its own *document* property so we know which document each semantic tag is anchored to.
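The two ways of anchoring annotations seen in this example can be combined in a single lookup. The sketch below assumes that a *document* property on the annotation itself takes precedence and that the view-level *contains* metadata is the fallback; the function name `anchor_document` is hypothetical and the precedence rule is an assumption drawn from the fragments shown here.

```python
BOX = "http://mmif.clams.ai/vocabulary/BoundingBox/v1"
TAG = "http://mmif.clams.ai/vocabulary/SemanticTag/v1"


def anchor_document(annotation, view_metadata):
    """Return the id of the document an annotation is anchored to, or None."""
    props = annotation["properties"]
    if "document" in props:  # annotation-level anchor, as in the typing view
        return props["document"]
    # Otherwise fall back to the per-type metadata in the view, as in the EAST view.
    type_meta = view_metadata.get("contains", {}).get(annotation["@type"], {})
    return type_meta.get("document")


east_meta = {"contains": {BOX: {"timeUnit": "pixels", "document": "m1"}}}
typer_meta = {"contains": {TAG: {}}}

box = {"@type": BOX, "properties": {"id": "bb1", "boxType": "text"}}
tag = {"@type": TAG, "properties": {"id": "st2", "category": "dog-sound",
                                    "document": "v2:td2", "start": 0, "end": 4}}

print(anchor_document(box, east_meta))   # m1
print(anchor_document(tag, typer_meta))  # v2:td2
```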



## Full MMIF File

```json
{% include_relative raw.json %}
```
