Add the data augmentation method mixup #47
Conversation
Judyxujj commented Apr 18, 2024 (edited)
- This implementation of mixup only mixes up the audio inputs x, not the targets y.
- The implementation is partly based on Albert's implementation in the RETURNN frontend (cf. [https://github.com/rwth-i6/i6_experiments/blob/main/users/zeyer/returnn/models/rf_mixup.py]) and Zoltan's implementation at Apptek.
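For illustration, a minimal NumPy sketch of input-only mixup (all names and the `scale` parameter here are hypothetical; the actual module is `i6_models/primitives/mixup.py` and operates on PyTorch tensors):

```python
import numpy as np

def mixup_inputs(x: np.ndarray, buffered: np.ndarray, scale: float) -> np.ndarray:
    """Mix a buffered audio feature sequence into the input.

    Only the inputs x are perturbed; the targets y are left untouched,
    unlike classic mixup, which also interpolates the labels.

    x:        input features, shape (time, feature_dim)
    buffered: a previously seen sequence of the same shape, used as "noise"
    scale:    mixing weight for the buffered sequence (hypothetical name)
    """
    return x + scale * buffered

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 80))      # e.g. 100 frames of 80-dim log-mel features
noise = rng.standard_normal((100, 80))  # sequence taken from the mixup buffer
x_mixed = mixup_inputs(x, noise, scale=0.3)
```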
In this implementation:
- for each sequence a random number `n` between 1 and `max_num_mix` (e.g. 4) is drawn
- 4 different audio tracks are selected from the buffered data
- each sequence is perturbed with track 1
- the sequences with `n >= 2` are perturbed with tracks 1 and 2
- the sequences with `n >= 3` are perturbed with tracks 1, 2, and 3 (and so on)

Is there any reason to make it so complicated?
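As I read the description above, the scheme can be sketched like this (a sketch under my reading, not the PR's code; note how the first tracks are shared across the whole batch):

```python
import numpy as np

def incremental_mixup(batch, tracks, max_num_mix=4, rng=None):
    """Sketch of the scheme described above: every sequence b draws
    n_b in [1, max_num_mix]; track 1 perturbs all sequences, track 2
    additionally perturbs those with n_b >= 2, and so on, so the same
    tracks are shared across the whole batch."""
    rng = rng or np.random.default_rng()
    n = rng.integers(1, max_num_mix + 1, size=batch.shape[0])  # n_b per sequence
    out = batch.copy()
    for k in range(max_num_mix):
        out[n >= k + 1] += tracks[k]  # track k+1 hits every sequence with n_b > k
    return out, n
```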
What I would have done:
- for each sequence a random number `n` between 1 and `max_num_mix` (e.g. 4) is drawn
- `\sum_b n_b` different audio tracks are selected from the buffered data (this is M' from the inline comments)
- each sequence is perturbed with `n` separate tracks

Bonus implementation: with probability `1 - apply_prob`, `n_b` is set to 0, i.e. no mixup for a specific sequence.
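The suggested variant could be sketched as follows (my sketch, not an implementation from the PR; it assumes the buffer holds enough distinct sequences, and the names follow the comment above):

```python
import numpy as np

def suggested_mixup(batch, buffer, max_num_mix=4, apply_prob=0.8, rng=None):
    """Sketch of the suggestion: each sequence b draws its own n_b in
    [1, max_num_mix], set to 0 with probability 1 - apply_prob; then
    sum_b n_b distinct tracks (M' in the inline comments) are drawn
    from the buffer, so no two sequences share a noise track."""
    rng = rng or np.random.default_rng()
    n = rng.integers(1, max_num_mix + 1, size=batch.shape[0])
    n[rng.random(batch.shape[0]) >= apply_prob] = 0  # no mixup for these sequences
    total = int(n.sum())  # M' = sum_b n_b distinct tracks in total
    idx = rng.choice(len(buffer), size=total, replace=False)
    out = batch.copy()
    offset = 0
    for b, n_b in enumerate(n):
        out[b] += buffer[idx[offset:offset + n_b]].sum(axis=0)
        offset += n_b
    return out, n
```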
i6_models/primitives/mixup.py (Outdated)

    lambda_min: minimum lambda value
    lambda_max: maximum lambda value
    max_num_mix: maximum number of mixups (random int in [1, max_num_mix])
I know that `lambda_min` is the minimum "lambda"... the docstring should tell me what exactly "lambda" is and what it does (e.g. an SNR? the length of the perturbed audio in seconds? ...).

What is "one mixup"? If I set it to 10, will there be 10 different audio tracks mixed into each target, or will only 10 out of the million sentences ever be affected by mixup?
Co-authored-by: Albert Zeyer <[email protected]>
I think there is some misunderstanding here. In the current implementation, for each sequence, …

My suggestion was to sample … E.g. if there is a batch of 3 sequences and the … So am I wrong in my understanding of the current implementation, or is my suggestion unclear?
We can/should keep correct normalization of the mixup scales; I agree with that.
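One plausible normalization (an assumption about what is meant here, not the PR's exact code): draw a raw scale per noise track in [lambda_min, lambda_max] and renormalize so the clean signal and all noise tracks together keep unit weight:

```python
import numpy as np

def normalized_scales(num_mix, lambda_min=0.1, lambda_max=0.4, rng=None):
    """Draw one scale per noise track in [lambda_min, lambda_max] and
    renormalize so the weight of the clean signal plus all noise
    weights sums to 1, keeping the overall level roughly constant."""
    rng = rng or np.random.default_rng()
    lambdas = rng.uniform(lambda_min, lambda_max, size=num_mix)
    total = 1.0 + lambdas.sum()
    return 1.0 / total, lambdas / total  # weight for x, weights for the tracks
```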
Okay, now I understand what you meant. But why is it simpler than the current one? We still need to do scale normalisation (if I understand it correctly, your "perturbed" is actually scale normalisation, right?).
Haha, it is not simpler; more complicated, actually. But I thought it might be helpful for the training to have different noises for different sequences.
Using a linear combination of the same random sequences with different mixup scales should also give different noises. But I agree that it might be more robust to use different noise sequences. Albert and Zoltan both did mixup in this way. @albertz do you think it would help?
But we already have different noises for different sequences?
Then I must have misread the code and am happy now.
From a quick glance, this looks OK now. Just to verify: is this now following exactly the same algorithm as in my RF implementation, or are there some differences? (I would prefer if we do exactly the same as a starting point, and then we can discuss, maybe after some experiments, whether we want to change/extend something.)