Hosted on Kaggle LB 0.537
- TensorFlow
- MobileNetV1 type model
- Online mel-spectrogram conversion (using GPU)
- Label smoothing
- 96 mels, 0.5 sec windows
- pretrain on noisy set, warmstart and train on curated set
- Fast, 2 minute inference on GPU
- ! no data augmentation
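
A minimal sketch of the label smoothing item above, in plain NumPy rather than the TensorFlow code the solution used. The uniform-smoothing variant and the `eps=0.1` value are assumptions; the notes do not say which form or strength was used, and `smooth_labels` is a hypothetical helper name.

```python
import numpy as np

def smooth_labels(y, eps=0.1):
    """Uniform label smoothing: shrink targets towards 1/n_classes.

    eps is an assumed value, not taken from the notes.
    """
    y = np.asarray(y, dtype=np.float32)
    n_classes = y.shape[-1]
    return y * (1.0 - eps) + eps / n_classes

# One clip, four classes, class 1 is the true label.
targets = np.array([[0, 1, 0, 0]], dtype=np.float32)
smoothed = smooth_labels(targets, eps=0.1)
```

With a one-hot target the smoothed row still sums to 1; the true class gets slightly less than 1 and the others a small positive value.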
LB 0.611
- PyTorch
- Basic 2D CNN model.
- LeakyReLU/PReLU in output
- Preprocessed log-melspec
- 128 mels, 2 sec windows, 128 frames (hop of 2*347 = 694 samples at 44.1 kHz)
- ! Test Time Augmentation (TTA). 5x
- ?? Horizontal flip data augmentation
- Mean-subtract. Minmax scaling
- time-stretch
- frequency-shift
- Mixup/between-class
- Cutout/random-erase
- Noise addition
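
A rough sketch of two of the augmentations listed above, mixup and cutout/random-erase, applied to a log-mel spectrogram of shape (mels, frames). This is NumPy only, not the PyTorch code from the solution. The `alpha`, `max_h` and `max_w` values, the choice to fill the erased patch with the spectrogram mean, and the function names are all assumptions for illustration.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Blend two examples and their labels with a Beta(alpha, alpha) weight."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def random_erase(spec, max_h=32, max_w=32, rng=None):
    """Overwrite one random rectangle with the spectrogram mean (assumed fill)."""
    if rng is None:
        rng = np.random.default_rng()
    out = spec.copy()  # leave the input untouched
    n_mels, n_frames = out.shape
    h = int(rng.integers(1, min(max_h, n_mels) + 1))
    w = int(rng.integers(1, min(max_w, n_frames) + 1))
    top = int(rng.integers(0, n_mels - h + 1))
    left = int(rng.integers(0, n_frames - w + 1))
    out[top:top + h, left:left + w] = out.mean()
    return out

rng = np.random.default_rng(0)
spec_a = rng.normal(size=(128, 128)).astype(np.float32)
spec_b = rng.normal(size=(128, 128)).astype(np.float32)
label_a = np.eye(4)[0]
label_b = np.eye(4)[2]

mixed_x, mixed_y = mixup(spec_a, label_a, spec_b, label_b, rng=rng)
erased = random_erase(spec_a, rng=rng)
```

Both functions work on the precomputed spectrogram, so they can run inside the data loader without touching the audio waveform.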
Showing that data augmentation on log mel-spectrograms (not audio waveform) performs well
Used TensorFlow sparse_image_warp
- Test-time augmentation (TTA)
- GBT over frame-wise embeddings? GlobalMean might not be the best
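
One possible reading of 5x TTA, sketched under assumptions: take five evenly spaced time crops of the clip's spectrogram, predict on each, and average. The notes do not say which transforms were used at test time, so the cropping scheme, the `predict_with_tta` name, and the stand-in `dummy_model` are hypothetical.

```python
import numpy as np

def predict_with_tta(predict_fn, spec, window=128, n_crops=5):
    """Average predict_fn over n_crops evenly spaced crops along the time axis."""
    n_frames = spec.shape[1]
    if n_frames < window:
        # Zero-pad short clips on the right so one full window fits.
        spec = np.pad(spec, ((0, 0), (0, window - n_frames)))
        n_frames = window
    starts = np.linspace(0, n_frames - window, n_crops).astype(int)
    preds = [predict_fn(spec[:, s:s + window]) for s in starts]
    return np.mean(preds, axis=0)

def dummy_model(crop):
    """Stand-in for a trained model: returns two fake class scores."""
    return np.array([crop.mean(), crop.max()])

clip = np.ones((128, 300), dtype=np.float32)
scores = predict_with_tta(dummy_model, clip)
```

Swapping `np.mean` for another pooling (max, or a learned combiner such as GBT over the per-crop outputs) is one way to test the GlobalMean question above.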