Adding image classifications #41
One reason why it could be useful to keep at least the predictions on those still frames included in the TimeFrames is that it may help us pick the best frames from a TimeFrame.
I agree that this is probably not high priority, especially if it requires updating the MMIF vocabulary. However, as Marc said, I think that one benefit is that it increases the chances of being able to grab one or two of the most representative still frames from a labeled duration. Another approach that might provide this benefit would be to add an additional attribute to the time-based annotation -- some indication of a particular instant that was representative of the annotated period. For example, if we have a time period of 46s to 73s labeled as "slate", we might have an additional piece of info in that annotation telling us that a highly representative frame occurred at 54000ms. The choice of that frame could be based on the CV-based frame classifications. But, again, I don't think this is high priority. It's much more in the category of "potentially nice to have".
In the course of doing the non-MMIF approach to the SWT still frame evaluation it may actually be very useful to have this, since we would be evaluating the still frame predictions, not the timeframe predictions. As for MMIF, since we allow extra properties it would not be illegal to add what we need to any annotation type. So we could have

```json
{
  "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v1/",
  "properties": {
    "id": "tf1",
    "start": 5000,
    "end": 12000,
    "frameType": "slate",
    "representative_frames": [6000, 7000, 10000]
  }
}
```

Or a time point with labels:

```json
{
  "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v1/",
  "properties": {
    "id": "tp1",
    "timePoint": 6000,
    "swt-label": "slate",
    "score": 0.9856
  }
}
```

Or even label scores:

```json
{
  "@type": "https://mmif.clams.ai/vocabulary/TimePoint/v1/",
  "properties": {
    "id": "tp1",
    "timePoint": 6000,
    "slate-score": 0.9856,
    "chyron-score": 0.0144,
    "credits-score": 0.0000
  }
}
```

I am not suggesting any of these; the point is that we can do that if we want. And I do intend to experiment with that for the SWT output.
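A minimal sketch of what a downstream consumer of such an extra property could look like, using plain JSON parsing rather than any particular mmif-python API; the `representative_frames` property here is the hypothetical one from the example above, not part of the vocabulary:

```python
import json

# A minimal MMIF-like view carrying the hypothetical extra property from the example above.
view_json = """
{
  "annotations": [
    {
      "@type": "https://mmif.clams.ai/vocabulary/TimeFrame/v1/",
      "properties": {
        "id": "tf1",
        "start": 5000,
        "end": 12000,
        "frameType": "slate",
        "representative_frames": [6000, 7000, 10000]
      }
    }
  ]
}
"""

view = json.loads(view_json)
for ann in view["annotations"]:
    if ann["@type"].startswith("https://mmif.clams.ai/vocabulary/TimeFrame"):
        props = ann["properties"]
        # Extra properties are legal in MMIF, so a missing one just needs a default.
        for frame_ms in props.get("representative_frames", []):
            print(f"{props['frameType']} frame at {frame_ms} ms in {props['id']}")
```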
This is also related to #52, and to @marcverhagen's email on 1/30/24 regarding the MMIF representation as of v3 (working version).
Then, after clamsproject/app-easyocr-wrapper#2 was raised, we discussed using

```
"annotations": [
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp1", "point": 2000, "labels": {???} }  # internal structure to store probabilities by labels
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp2", "point": 3000, "labels": {???} }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp3", "point": 4000, "labels": {???} }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp4", "point": 5000, "labels": {???} }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
    "properties": { "id": "tp5", "point": 6000, "labels": {???} }
  },
  {
    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
    "properties": {
      "id": "tf_1",
      "targets": [ "tp1", "tp2", "tp3", "tp4", "tp5" ],
      "representatives": [ "tp5" ],
      ... # and possibly more props
    }
  }
  ... # and more points and frames
]
```
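As a rough illustration of how an app might assemble that structure, here is a sketch that picks the highest-scoring point as the representative of a frame. It uses plain dicts standing in for MMIF annotations; the scores and the 0.5 threshold are made up for illustration:

```python
# Toy per-frame scores for one label ("slate"); values and the 0.5 threshold are made up.
point_scores = {"tp1": (2000, 0.81), "tp2": (3000, 0.92), "tp3": (4000, 0.97),
                "tp4": (5000, 0.88), "tp5": (6000, 0.99)}

# Keep the points that look like a slate and pick the best-scoring one as representative.
slate_ids = [pid for pid, (_, score) in point_scores.items() if score > 0.5]
best_id = max(slate_ids, key=lambda pid: point_scores[pid][1])

time_frame = {
    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
    "properties": {
        "id": "tf_1",
        "frameType": "slate",
        "targets": slate_ids,
        "representatives": [best_id],
    },
}
print(time_frame["properties"]["representatives"])  # ['tp5']
```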
For the internal structure of `labels`, two representations were considered:

```
# two-array representation
{
  "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
  "properties": {
    "id": "tp1",
    "point": 2000,
    "labels": ["bar", "slate", "chyron", "credit", "NEG"],
    "scores": [0.1, 0.2, 0.3, 0.4, 0.5]
  }
}
# pros: "labels" field can be _factored_ into `view.metadata.contains` to save some bytes
# cons: sacrificing readability
```
```
# object (dict) representation
{
  "@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
  "properties": {
    "id": "tp1",
    "point": 2000,
    "labels": {"bar": 0.1, "slate": 0.2, "chyron": 0.3, "credit": 0.4, "NEG": 0.5}
  }
}
# pros: much more readable
# cons: we've never used properties with nested objects like this, and the current MMIF specification is written vaguely enough that we couldn't decide whether this is actually allowed or not.
```
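A sketch of what the factoring mentioned in the first option's pros could look like; the exact `contains` layout shown here is an assumption for illustration, not the current spec. The label list lives once in the view metadata and each TimePoint only carries its scores:

```python
# Hypothetical factored layout: labels declared once in view metadata,
# each TimePoint carries only the parallel "scores" list.
view = {
    "metadata": {
        "contains": {
            "http://mmif.clams.ai/vocabulary/TimePoint/v1": {
                "labels": ["bar", "slate", "chyron", "credit", "NEG"]
            }
        }
    },
    "annotations": [
        {"@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
         "properties": {"id": "tp1", "point": 2000, "scores": [0.1, 0.2, 0.3, 0.4, 0.5]}},
    ],
}

labels = view["metadata"]["contains"]["http://mmif.clams.ai/vocabulary/TimePoint/v1"]["labels"]
for ann in view["annotations"]:
    # Re-pair the factored labels with each point's scores.
    print(ann["properties"]["id"], dict(zip(labels, ann["properties"]["scores"])))
```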
Moving #62 (comment) here:

```json
"classification": [
  "slate:4.1978837543865666e-05",
  "chyron:0.9895544052124023",
  "credit:0.0007274810341186821",
  "NEG:0.009676150046288967"
]
```

I think this is neither a reasonable nor a responsible implementation, and it will cause lots of problems for downstream apps, since it dumps complex data types into strings without any clear specification or instructions for parsing those strings, eventually dumping an outrageous amount of responsibility onto other developers. I actually have no issue with the two-lists representation, but if the dictionary representation is really crucial for this, I think we should either
Agreed, two lists are better than one list of strings that pack more complex objects. There is a bit of a precedent there with identifiers that are a concatenation of a view id plus an annotation id, but this takes that to a whole new level of ad hoc.
We do still have two lists that are dependent on each other, which also saddles the downstream developer with extra work that is perhaps as bad as unpacking the serialized label-score pairs. This would be too random for helper functions to deal with, so I think we may be waiting for an updated mmif-python and/or clams-python.
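If such a helper did eventually land in mmif-python, it might look something like the following; this is purely hypothetical, and `get_label_scores` and the property names are not part of any released API:

```python
from typing import Dict

def get_label_scores(props: dict) -> Dict[str, float]:
    """Normalize either TimePoint representation (parallel 'labels'/'scores' lists,
    or a label->score dict) into a single dict. Hypothetical helper, not in mmif-python."""
    labels = props.get("labels")
    if isinstance(labels, dict):
        return dict(labels)
    scores = props.get("scores", [])
    if labels is None or len(labels) != len(scores):
        raise ValueError("'labels' and 'scores' must be parallel lists of equal length")
    return dict(zip(labels, scores))

# Works for either representation:
print(get_label_scores({"labels": ["bar", "slate"], "scores": [0.1, 0.9]}))
print(get_label_scores({"labels": {"bar": 0.1, "slate": 0.9}}))
```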
With the dict-based representation, the Metadata spec updated in #63 specifies the possible labels (see lines 29 to 35 in 61aba22).
And a similar practice can be done with the addition of

```python
metadata.add_input(DocumentTypes.VideoDocument, required=True)
metadata.add_output(AnnotationTypes.TimePoint, label='bars', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimePoint, label='slate', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimePoint, label='chyron', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimePoint, label='credits', timeUnit='milliseconds')
# not sure what the value of `label` prop will be when the score for NEG is the top
metadata.add_output(AnnotationTypes.TimeFrame, frameType='bars', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='slate', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='chyron', timeUnit='milliseconds')
metadata.add_output(AnnotationTypes.TimeFrame, frameType='credits', timeUnit='milliseconds')
```

But for the dict-valued `labels` property, it is not obvious how to specify it this way. Related to that, there were previously discussions about specifying data types, instead of data values, for output specs, but that was never implemented in the SDK. Even with a type-level specification, I don't think there is an easy representation for such complex data types in at_type properties. Note that these I/O specs are currently under-implemented, but we hope that in the future they will be used for type coercion in workflow engines, so having a clear design should be a critical piece of work for that future. In addition to that, I've planned for a while to use these I/O specs for searching in the AppDir to improve the AppDir user experience (clamsproject/apps#59).
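To make the workflow-engine idea concrete, here is a toy matcher over serialized I/O specs of roughly this shape; the metadata dicts below are simplified assumptions, not the exact clams-python serialization, and `satisfies`/`can_chain` are made-up names:

```python
# Simplified serialized output specs of an upstream app (e.g. a frame classifier).
producer_output = [
    {"@type": "http://mmif.clams.ai/vocabulary/TimePoint/v1",
     "properties": {"timeUnit": "milliseconds", "label": "slate"}},
    {"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
     "properties": {"timeUnit": "milliseconds", "frameType": "slate"}},
]
# Simplified serialized input specs of a downstream app (e.g. a slate OCR app).
consumer_input = [
    {"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
     "properties": {"frameType": "slate"}, "required": True},
]

def satisfies(output_spec, input_spec):
    """True if an output spec provides the type and property values an input spec asks for."""
    if output_spec["@type"] != input_spec["@type"]:
        return False
    wanted = input_spec.get("properties", {})
    provided = output_spec.get("properties", {})
    return all(provided.get(k) == v for k, v in wanted.items())

def can_chain(producer_outputs, consumer_inputs):
    """True if every required input of the consumer is covered by some producer output."""
    return all(
        any(satisfies(out, inp) for out in producer_outputs)
        for inp in consumer_inputs if inp.get("required", False)
    )

print(can_chain(producer_output, consumer_input))  # True
```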
Fixed via #83.
New Feature Summary
Not important, but it has popped up in my mind several times.
The question is whether it is worthwhile to also save the frame classifications in the MMIF file. It would take some space, but it might be worth it if there is downstream use.
We would need to think about whether this requires updates in the vocabulary.