Adding image classifications #41

Closed
marcverhagen opened this issue Dec 13, 2023 · 9 comments
Labels
✨N New feature or request

Comments

@marcverhagen
Contributor

marcverhagen commented Dec 13, 2023

New Feature Summary

Not important, but it has popped up in my mind several times.

The question is whether it is worthwhile to also save the frame classifications in the MMIF file. It would take some space, but it might be worth it if there is downstream use.

We would need to think about whether this requires updates in the vocabulary.

@marcverhagen marcverhagen added the ✨N New feature or request label Dec 13, 2023
@clams-bot clams-bot added this to apps Dec 13, 2023
@github-project-automation github-project-automation bot moved this to Todo in apps Dec 13, 2023
@marcverhagen
Contributor Author

One reason it could be useful to keep at least the predictions for the still frames included in the TimeFrames is that it may help us pick the best frames from a TimeFrame.

@owencking
Collaborator

I agree that this is probably not high priority, especially if it requires updating the MMIF vocabulary.

However, as Marc said, I think that one benefit is that it increases the chances of being able to grab one or two of the most representative still frames from a labeled duration.

Another approach that could provide this benefit would be to add an additional attribute to the time-based annotation: some indication of a particular instant that is representative of the annotated period. For example, if we have a time period of 46s to 73s labeled as "slate", we might have an additional piece of info in that annotation telling us that a highly representative frame occurred at 54000ms. The choice of that frame could be based on the CV-based frame classifications.

But, again, I don't think this is high priority. Much more in the category of "potentially nice to have".

@marcverhagen
Contributor Author

In the course of doing the non-MMIF approach to the SWT still-frame evaluation, it may actually be very useful to have this, since we would be evaluating the still-frame predictions, not the timeframe predictions.

As for MMIF, since we allow extra properties it would not be illegal to add what we need to any annotation type. So we could have representative_frames on a TimeFrame annotation:

{
    "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v1/",
    "properties": {
        "id": "tf1",
        "start": 5000,
        "end": 12000,
        "frameType": "slate",
        "representative_frames": [6000, 7000, 10000]
    }
}

Or a time point with labels

{
    "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v1/",
    "properties": {
        "id": "tp1",
        "timePoint": 6000,
        "swt-label": "slate",
        "score": 0.9856
    }
}

Or even label scores:

{
    "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v1/",
    "properties": {
        "id": "tp1",
        "timePoint": 6000,
        "slate-score": 0.9856,
        "chyron-score": 0.0144,
        "credits-score": 0.0000
    }
}

I am not suggesting any of these; the point is that we can do this if we want. And I do intend to experiment with that for the SWT output.

@keighrim
Member

keighrim commented Feb 2, 2024

This is also related to #52.

From @marcverhagen's email on 1/30/24 regarding the MMIF representation as of v3 (working version):


Just to illustrate, here is a TimeFrame from SWT:

{
  "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
  "properties": {
    "start": 2000,
    "end": 6000,
    "frameType": "chyron",
    "score": 0.9963429927825928,
    "scores": [
      0.9952167272567749,
      0.99429851770401,
      0.9968003034591675,
      0.9958693385124207,
      0.9995300769805908
    ],
    "points": [
      2000,
      3000,
      4000,
      5000,
      6000
    ],
    "representatives": [
      6000
    ],
    "id": "tf_1"
  }
}

It is somewhat ad hoc and I cannot say I like it, but it does have the information you want, I think.


Then, after clamsproject/app-easyocr-wrapper#2 was raised, we discussed using TimePoints as "raw" anchors for image classification labels, and then adding TimeFrames on top of the "stitched" time points, with a targets property to hold the ids of the time points within the interval. That would be something like:

"annotations": [
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp1", "point": 2000, "labels": {???}   # internal structure to store probabilities by labels 
  },
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp2", "point": 3000, "labels": {???}
  },
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp3", "point": 4000, "labels": {???}
  },
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp4", "point": 5000, "labels": {???}
  },
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp5", "point": 6000, "labels": {???}
  },
  { 
    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
    "properties": { 
      "id": "tf_1",
      "targets": [ "tp1",  "tp2",  "tp3",  "tp4",  "tp5" ],
      "representatives": [ "tp5" ],
      ... # and possibly more props
    }
  } 
 ... # and more points and frames
]

For the labels property, I believe more discussion is needed, but from yesterday's discussion the following were suggested:

# two-array representation
{
  "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
  "properties": {
    "id": "tp1",
    "point": 2000,
    "labels": ["bar", "slate", "chyron", "credit", "NEG"],
    "scores": [0.1, 0.2, 0.3, 0.4, 0.5]
  }
}
# pros: the "labels" field can be _factored_ into `view.metadata.contains` to save some bytes
# cons: sacrifices readability

# object (dict) representation
{
  "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
  "properties": {
    "id": "tp1",
    "point": 2000,
    "labels": {"bar": 0.1, "slate": 0.2, "chyron": 0.3, "credit": 0.4, "NEG": 0.5}
  }
}
# pros: much more readable
# cons: we have never used properties with such "nested" objects, and the current MMIF specification is written vaguely enough that we could not decide whether this is actually allowed or not.
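
For what it's worth, the two forms carry the same information and are trivially interconvertible. A minimal sketch in plain Python (no mmif-python helpers assumed; the label order is illustrative):

# two-array form
labels = ["bar", "slate", "chyron", "credit", "NEG"]
scores = [0.1, 0.2, 0.3, 0.4, 0.5]

# two-array -> dict (object) form
classification = dict(zip(labels, scores))
# {'bar': 0.1, 'slate': 0.2, 'chyron': 0.3, 'credit': 0.4, 'NEG': 0.5}

# dict -> two-array form; a fixed, documented label order is what would make
# factoring "labels" into view.metadata.contains possible
labels_again = list(classification)
scores_again = [classification[l] for l in labels_again]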

@keighrim
Member

keighrim commented Feb 5, 2024

moving #62 (comment)

Did much of this in eaa6522. But due to limitations imposed by mmif-python (see clamsproject/mmif-python#252), the classification for timepoints looks like this:

"classification": [
    "slate:4.1978837543865666e-05",
    "chyron:0.9895544052124023",
    "credit:0.0007274810341186821",
    "NEG:0.009676150046288967"
]

Not sure which is worse, the above or this:

"labels": ["slate", "chyron", "credit", "NEG"],
"scores": [4.1978837543865666e-05, 0.9895544052124023, 0.0007274810341186821,  0.009676150046288967]
"classification": [
    "slate:4.1978837543865666e-05",
    "chyron:0.9895544052124023",
    "credit:0.0007274810341186821",
    "NEG:0.009676150046288967"
]

I don't think this is a reasonable or responsible implementation, and it will cause lots of problems for downstream apps: it dumps complex data types into strings without any clear specification or instructions for parsing them, eventually offloading an outrageous amount of responsibility onto other developers.
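
To illustrate that burden, here is a sketch of the parsing code every consumer would have to reinvent; note that nothing specifies ":" as the delimiter, so even the rpartition below is an assumption:

# parse the string-packed "classification" values shown above
classification = [
    "slate:4.1978837543865666e-05",
    "chyron:0.9895544052124023",
    "credit:0.0007274810341186821",
    "NEG:0.009676150046288967",
]
scores = {}
for item in classification:
    # split on the last ':' in case a label itself ever contains one
    label, _, score = item.rpartition(":")
    scores[label] = float(score)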

I actually have no issue with the two-lists representation, but if the dictionary representation is really crucial for this, I think we should either

  1. wait for updates in mmif-python (and clams-python) to support it, or
  2. add such data structures by working around the helper functions (e.g., direct JSON manipulation).

@marcverhagen
Contributor Author

Agreed, two lists is better than one list of strings that each pack a more complex object. There is a bit of a precedent with identifiers that concatenate a view id and an annotation id, but this takes the ad hoc-ness to a higher level.

@marcverhagen
Contributor Author

We do still have two lists that depend on each other, which also saddles the downstream developer with extra work, perhaps as bad as unpacking the serialized label-score pairs. This is too idiosyncratic for helper functions to deal with, so I think we may be waiting for an updated mmif-python and/or clams-python.
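
For reference, the reconstruction each downstream app would have to repeat is short but unspecified; a sketch assuming the annotation has been deserialized into a plain dict (variable names hypothetical):

props = annotation["properties"]
label_scores = dict(zip(props["labels"], props["scores"]))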

@keighrim
Member

keighrim commented Feb 6, 2024

With the dict-based representation of the classification scores, I wonder what would be the best way to specify that dict structure in the app metadata (related to clamsproject/clams-python#194).

The metadata spec was updated in #63 to specify the possible frameType values as follows:

metadata.add_input(DocumentTypes.VideoDocument, required=True)
metadata.add_output(AnnotationTypes.TimeFrame, frameType='bars')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='slate')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='chyron')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='credits')

And the same can be done with the addition of the TimePoint at_type:

metadata.add_input(DocumentTypes.VideoDocument, required=True) 
metadata.add_output(AnnotationTypes.TimePoint, label='bars', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimePoint, label='slate', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimePoint, label='chyron', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimePoint, label='credits', timeUnit='milliseconds') 
# not sure what the value of `label` prop will be when the score for NEG is the top
metadata.add_output(AnnotationTypes.TimeFrame, frameType='bars', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimeFrame, frameType='slate', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimeFrame, frameType='chyron', timeUnit='milliseconds') 
metadata.add_output(AnnotationTypes.TimeFrame, frameType='credits', timeUnit='milliseconds') 

But for a classification property that is a dict with a fixed set of keys, I'm a little lost as to how we would add it to the output specification in AppMetadata.

Related to that, there was previously some discussion about specifying data types, instead of data values, for output specs, but that was never implemented in the SDK. And even with type-level specification, I don't think there is an easy representation for such complex data types in at_type properties.

Note that these I/O specs are currently under-implemented, but we hope that in the future they will be used for type coercion in workflow engines, so having a clear design is a critical piece of work for that future.

In addition, I've planned for a while to use these I/O specs for searching in the AppDir, to improve the AppDir user experience (clamsproject/apps#59).

@keighrim
Member

keighrim commented Mar 7, 2024

Fixed via #83.

@keighrim keighrim closed this as completed Mar 7, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in apps Mar 7, 2024