# SR point-wise evaluation with measuring "stitcher" performance #60
## Comments
Some of this was done in 22bee5c (work in progress that I just pushed up to make it visible), but not quite in the way described above. One difference is that it uses the output of the updated process.py script from the annotations repository.

Is there overlap here with #43?
The evaluation scheme in this issue is based on point-wise evaluation, hence it is not compatible with the "old" interval-level gold data (from around 2020). I don't think this is a duplicate of #43. Eventually, I believe the proposed method will evaluate stitcher components largely independently of image classification model performance. I think the most "overlapping" effort to this issue was …, so I thought it's not easy to verify whether the old code works or not (plus, this repo is the repo for evaluation code). That led me to start this new issue.

Since the existing timepoint evaluation and the proposed timeframe (stitcher) evaluation can be applied to any point-wise classification task, I think it'd be more representative if we renamed the subdirectory to ….
To analyze the stitcher's performance, I compared the evaluation scores from ….

For …, across all labels (besides …), …. Aside from higher …, ….

When …, ….
From all of my observations, I have concluded that the best scores result from the following configuration: ….
For future reference, these are the "result" files used for this grid search/evaluation: ….

A few follow-up questions: ….

And future directions: ….
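To make the shape of that grid search concrete, here is a minimal sketch; all parameter names, value ranges, and the `evaluate` stub are hypothetical, not the configuration actually used.

```python
# Hypothetical sketch of a stitcher parameter grid search; names, ranges,
# and the evaluate() stub are illustrative, not the actual setup.
import itertools
import random

param_grid = {
    "minTFDuration": [1000, 2000, 5000],  # ms; minimum timeframe length
    "minTPScore": [0.01, 0.1, 0.5],       # point-wise softmax threshold
    "minTFScore": [0.5, 0.9],             # aggregated timeframe score threshold
}

def evaluate(config: dict) -> float:
    """Stand-in for re-running the stitcher + eval.py with `config`;
    here it just returns a random score so the sketch is runnable."""
    return random.random()

best_config, best_score = None, float("-inf")
for values in itertools.product(*param_grid.values()):
    config = dict(zip(param_grid, values))
    score = evaluate(config)
    if score > best_score:
        best_config, best_score = config, score

print(f"best config: {best_config} (score={best_score:.3f})")
```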
## 241101 experiment with raw labels

### Experiment setup

The goal of this experiment is ….

#### Input data

- TimePoint annotations: …
- TimeFrame annotations: …
- "binning": … (a hypothetical example follows this list)
### Results

#### Evaluation method

…

#### Result files

…

#### Analysis
##### When looking at all labels

**Stitcher contribution by all labels**

…

**Labels that showed positive contribution**

…

**Point-wise softmax threshold vs. stitcher contribution**

…

##### Conclusion

In general, stitching works as intended (82.6% of cases). Using …, ….

##### When looking at labels in the "relaxed" binning scheme
**Stitcher contribution by all labels**

…

**Average of all "interested" labels**

…

**By bins**

…

**Minimum TF duration threshold vs. stitcher contribution**

…
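The "contribution" figures above contrast scores with and without the stitcher. Here is a minimal sketch of how such a per-label contribution could be computed, assuming it is simply the stitched F1 minus the point-wise F1 (the helper name and this definition are my assumptions):

```python
# Hypothetical sketch: per-label "stitcher contribution", assumed here to be
# the stitched F1 minus the unstitched (point-wise) F1 for each label.
from sklearn.metrics import f1_score

def stitcher_contribution(gold, pointwise, stitched, labels):
    """gold/pointwise/stitched are parallel per-timepoint label lists."""
    contribution = {}
    for label in labels:
        pw_f1 = f1_score(gold, pointwise, labels=[label], average="micro")
        st_f1 = f1_score(gold, stitched, labels=[label], average="micro")
        contribution[label] = st_f1 - pw_f1
    return contribution

# e.g. stitcher_contribution(gold, raw, stitched, labels=["chyron", "credits"])
```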
##### Conclusion

As shown in the previous pilot report by @kla7, …; the 5-second threshold wasn't too long for the chyron types.

### Conclusion and next steps

I will fix ….
## 241102 experiment with "relaxed" postbin

### Experiment setup

This is a follow-up experiment on the findings from ….

#### Input data

- TimeFrame annotations: …
- "binning": …

### Results

#### Evaluation method

…

#### Result files

…
#### Analysis

**Stitcher contribution by all bins**

…

**Softmax aggregation method vs. stitcher contribution**

…

**Frame-wise score (average) threshold**

…

**Smoothing negative "noise"**

…

**Lastly, allowing frame overlap**

… (a sketch of these mechanisms follows)
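A minimal sketch of the mechanisms named above, under assumed semantics (all function names, defaults, and the exact smoothing rule are my assumptions, not the actual stitcher code):

```python
# Hypothetical sketches of the mechanisms named above; names, defaults, and
# exact semantics are illustrative, not the actual stitcher implementation.
from statistics import mean

def aggregate(frame_scores, method="avg"):
    """Collapse per-frame softmax scores for one candidate timeframe."""
    return {"avg": mean, "min": min, "max": max}[method](frame_scores)

def passes_threshold(frame_scores, min_tf_score=0.5, method="avg"):
    """Keep a candidate timeframe only if its aggregated score is high enough."""
    return aggregate(frame_scores, method) >= min_tf_score

def smooth(labels, target, max_gap=2):
    """Flip short runs of non-`target` labels sandwiched between `target` runs."""
    smoothed = list(labels)
    i = 0
    while i < len(smoothed):
        if smoothed[i] == target:
            i += 1
            continue
        j = i
        while j < len(smoothed) and smoothed[j] != target:
            j += 1
        if 0 < i and j < len(smoothed) and (j - i) <= max_gap:
            smoothed[i:j] = [target] * (j - i)
        i = j
    return smoothed

# e.g. smooth(list("cc-cc"), "c") -> list("ccccc")
```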
#### Conclusion

…
## New Feature Summary
#55 added evaluation software for apps like SWT, using only `TimePoint` annotations, but that evaluation can easily be expanded to an evaluation of the "stitcher" component that turns `TP` annotations into `TimeFrame` annotations. The idea is based on the fact that every existing stitcher implementation uses "label remapping", exposed as a runtime parameter and recorded in the view metadata, which enables us to re-construct a point-wise but remapped label value list.

So the idea is to update the `eval.py` file so that it (a rough sketch follows the list):

1. reads `TP` annotations as usual, constructing lists of "raw" classification results; let's call them `raw` and `gold`
2. reads `TimeFrame` annotations along with the remapper config (`map` for the SWT built-in stitcher, `labelMap` for simple-stitcher)
3. remaps `raw` and `gold` into secondary remapped lists (these new lists should be shorter than the original ones, since not all of the raw/gold labels are remapped into the secondary (`TF`) labels); let's call them `raw-remap` and `gold-remap` respectively
4. from the `TF` annotations, constructs a third list of stitched, remapped labels by following the pointers in the `targets` prop (which must point to `TP` annotations so that the timepoints can be traced); let's call this list `stitched`
5. evaluates
   - `raw` vs. `gold` (this should already be there in the current `eval.py`)
   - `raw-remap` vs. `gold-remap`
   - `stitched` vs. `gold-remap`
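A rough sketch of this three-way comparison, working over plain label lists rather than the real MMIF API; all helper names, the data shapes (`tf_spans`, the `"-"` negative label), and the alignment choices are assumptions for illustration.

```python
# Hypothetical sketch of the proposed three-way evaluation for eval.py.
# Assumes `raw` and `gold` are parallel per-timepoint label lists already
# extracted from the MMIF, `label_map` is the remapper config read from the
# view metadata ("map" for the SWT built-in stitcher, "labelMap" for
# simple-stitcher), and `tf_spans` pairs each TimeFrame's stitched label
# with the timepoint indices reachable through its `targets` prop.
from sklearn.metrics import classification_report

def remap(labels, label_map):
    """Keep only timepoints whose label participates in the remapping."""
    return {i: label_map[l] for i, l in enumerate(labels) if l in label_map}

def evaluate(raw, gold, label_map, tf_spans):
    raw_remap = remap(raw, label_map)
    gold_remap = remap(gold, label_map)

    # third list: stitched labels projected back onto timepoints via `targets`
    stitched = {}
    for tf_label, timepoint_indices in tf_spans:
        for i in timepoint_indices:
            stitched[i] = tf_label

    # 1. raw vs. gold (already in the current eval.py)
    print(classification_report(gold, raw))

    # 2. raw-remap vs. gold-remap; aligning on the shared timepoints is one
    #    possible choice, since the two remapped lists can differ in length
    common = sorted(raw_remap.keys() & gold_remap.keys())
    print(classification_report([gold_remap[i] for i in common],
                                [raw_remap[i] for i in common]))

    # 3. stitched vs. gold-remap; timepoints the stitcher dropped get a
    #    negative placeholder label
    print(classification_report([gold_remap[i] for i in sorted(gold_remap)],
                                [stitched.get(i, "-") for i in sorted(gold_remap)]))
```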
## Related

Resolving this issue will also properly address clamsproject/app-swt-detection#61.