fix hardcoded 94 min limit on positional encoding #100
Comments
So after recent discussion, we decided to first try to implement the first "hybrid" approach.
Wow, this is a helpful way of analyzing the results. From our domain knowledge, I suspect the labels most impacted by the positional information would be slates (
Here are the F1-score results for labels S and C, compiled in spreadsheets after the earlier gridsearch.

Label S results

From the image above, it seems that some of the highest F1-scores for label S result when

Here are the plots for the above-mentioned configurations, retrieved from running see_results.py on label S:

Label C results

From the image above, it seems that some of the highest F1-scores for label C result when
Using fixed values found from the previous observations for the three hyperparameters
I once again used

Label B results

The results seem about equal when using

Label C results

While these numbers are quite low on their own, it is interesting to note that the recall for

Label I results

For label I, it seems that using

Label S results

Again, these numbers are quite low to begin with and there is no significant difference, but it is interesting to note that
So it looks like our hypothesis holds mostly true here, except that positional encoding "hurts" prediction performance for
Maybe for the upcoming rounds of experiments, we can also try to see the impact of
The F1-scores worry me a bit, since some of them seem to be very close to the lower of precision and recall. For example, for label S we have P=0.7083 and R=0.5910, but F=0.5964, whereas my back-of-the-napkin calculation has it at about 0.64, which intuitively makes more sense to me. In one case, label C, it is even below the lower of P and R.
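To make the back-of-the-napkin check concrete, here is the standard F1 (the harmonic mean of precision and recall) computed from the P and R reported above for label S:

```python
# F1 is the harmonic mean of precision (P) and recall (R).
p, r = 0.7083, 0.5910        # label S numbers from the comment above
f1 = 2 * p * r / (p + r)
print(round(f1, 4))          # ~0.644, close to the 0.64 estimate and above the reported 0.5964
```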
The line of code to retrieve the relative position was incorrect, so I have altered it and re-ran gridsearch using the same hyperparameters as in #100 (comment). The following line (app-swt-detection/modeling/data_loader.py, line 141 in 43cc4d5) was changed to be `pos_lookup_col = cur_time * self.pos_vec_lookup.shape[0] // tot_time`.
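For context, here is a minimal sketch of what the corrected expression computes. The function and parameter names are hypothetical; `num_bins` stands in for `self.pos_vec_lookup.shape[0]`, the number of rows in the positional lookup table:

```python
def relative_pos_index(cur_time, tot_time, num_bins):
    """Map a timestamp in [0, tot_time) to a row index in a positional lookup table.

    Mirrors the corrected line
        pos_lookup_col = cur_time * self.pos_vec_lookup.shape[0] // tot_time
    """
    # Integer arithmetic keeps the result a valid index. Note that
    # cur_time == tot_time would fall one past the last row, so callers
    # must keep cur_time strictly less than tot_time.
    return cur_time * num_bins // tot_time
```

For example, with a 100-minute video and a 256-row table, minute 50 maps to row 128 and minute 99 to row 253.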
I have also opted to recreate the visualizations with the correct F1-scores using Google Sheets, following the F1 calculation issue found by @marcverhagen (I wasn't able to determine the source of the issue).

Label B Results
Label C Results
Label I Results
Label S Results
My next plan is to perform gridsearch again with the configuration from #100 (comment), to see if there is any improvement following the change in the script. The F1-scores will be more accurate in the next gridsearch report.
Regarding the unexpected range of F1 scores: this is because the result aggregation/plotting script calculates arithmetic means of the P, R, and F numbers from all k-fold rounds independently of each other (see app-swt-detection/scripts/see_results.py, lines 63 to 82 in 23a8576).
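A small illustration, with made-up fold numbers (not the actual experiment values), of why averaging P, R, and F independently across folds can produce an F below the F1 of the averaged P and R, and even below both mean P and mean R:

```python
def f1(p, r):
    # F1 is the harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Two hypothetical folds with opposite precision/recall balance:
folds = [(0.95, 0.20), (0.45, 0.98)]

mean_p = sum(p for p, _ in folds) / len(folds)         # 0.70
mean_r = sum(r for _, r in folds) / len(folds)         # 0.59
mean_f = sum(f1(p, r) for p, r in folds) / len(folds)  # mean of per-fold F1 scores

print(round(f1(mean_p, mean_r), 4))  # 0.6403 -- F1 of averaged P and R
print(round(mean_f, 4))              # 0.4736 -- below both mean P and mean R
```

The script reports the second number, which explains the surprisingly low F values in the spreadsheets.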
Gridsearch Results

The following are results from running gridsearch using the same hyperparameters as in #100 (comment), following the change in the script. The format is the heatmap created in spreadsheets as before, and the values shown are the average F1-scores, all retrieved from visualization outputs from

Label B
Label C
Label I
Label S
Conclusion

With these findings, I believe that an ideal configuration for the three hyperparameters is as follows:
Comparing
New Feature Summary
In the first rounds of training, we used a 94-minute hard cap (the length of the longest video in the training data in those rounds) on the sinusoidal positional vectors. However, we have now realized that
So before moving on to the next rounds of training (with the "hard" examples Owen is currently annotating), we'd like to tweak the positional encoding and make sure the experiment results we saw in the first rounds (absolute encoding performed best) are reproducible.
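As background, here is a minimal sketch (hypothetical names, not the project's actual code) of a capped sinusoidal positional-vector table like the one described above. The hardcoded cap means any video longer than the cap has no valid row to look up:

```python
import math

def build_pos_vec_lookup(max_minutes, dim):
    """One sinusoidal positional vector per minute, up to a hard cap."""
    table = []
    for pos in range(max_minutes):
        vec = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            vec.append(math.sin(angle))  # even dimensions use sine
            vec.append(math.cos(angle))  # odd dimensions use cosine
        table.append(vec[:dim])
    return table

# With the cap at 94 minutes, a frame from minute 100 of a longer
# video has no row to look up -- the limitation this issue addresses.
lookup = build_pos_vec_lookup(94, 8)
```

A relative scheme (as in the corrected lookup discussed in the comments) sidesteps this by scaling each timestamp into the table's index range instead of indexing by absolute minute.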
A few ideas for other hybrid positional encodings