DeepRhythm is a convolutional neural network designed for rapid, precise tempo prediction for modern music. It runs on anything that supports PyTorch (I've tested Ubuntu, macOS, Windows, and Raspbian).
Audio is batch-processed using a vectorized Harmonic Constant-Q Modulation (HCQM) [1], drastically reducing computation time by avoiding the usual bottlenecks encountered in feature extraction. The pipeline, with tensor shapes at each step (a minimal sketch follows the list):
- Split input audio into 8-second clips `[len_batch, len_audio]`
- Compute the HCQM of each clip:
  - Compute the STFT `[len_batch, stft_bands, len_audio/hop]`
  - Sum STFT bins into 8 log-spaced bands using a filter matrix `[len_batch, 8, len_audio/hop]`
  - Flatten bands for parallel CQT processing `[len_batch*8, len_audio/hop]`
  - For each of the six harmonics, compute the CQT `[6, len_batch*8, num_cqt_bins]`
  - Reshape `[len_batch, num_cqt_bins, 8, 6]`
- Feed the HCQM through the CNN `[len_batch, num_classes (256)]`
- Softmax the outputs to get class probabilities
- Choose the class with the highest probability and convert it to BPM (`bpms = [len_batch]`)
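To make the shape bookkeeping concrete, here is a minimal PyTorch sketch of the same flow. The sizes, the band filter matrix, and the "CQT" (a random projection) are placeholders I've assumed for illustration, not DeepRhythm's actual constants or kernels (the real CQT runs on-device; nnAudio [2] provides one):

```python
import torch

# Hypothetical sizes for illustration only; not DeepRhythm's actual constants.
sr = 22050                          # sample rate
len_batch = 4                       # number of 8-second clips in the batch
clip_len = 8 * sr                   # samples per clip
n_fft, hop = 2048, 512              # STFT parameters (assumed)
num_cqt_bins = 240                  # assumed; depends on the CQT configuration
harmonics = [0.5, 1, 2, 3, 4, 5]    # the six harmonics of the HCQM [1]

audio = torch.randn(len_batch, clip_len)          # [len_batch, len_audio]

# STFT -> [len_batch, stft_bands, len_audio/hop]
window = torch.hann_window(n_fft)
stft = torch.stft(audio, n_fft, hop_length=hop, window=window,
                  return_complex=True).abs()      # [4, 1025, ~345]

# Sum STFT bins into 8 log-spaced bands with a fixed filter matrix
# (random here as a stand-in for real log-spaced band filters).
filters = torch.rand(8, stft.shape[1])            # [8, stft_bands]
bands = torch.matmul(filters, stft)               # [len_batch, 8, len_audio/hop]

# Flatten so every (clip, band) pair runs through the CQT in parallel.
flat = bands.reshape(-1, bands.shape[-1])         # [len_batch*8, len_audio/hop]

# One CQT per harmonic; a random projection stands in for the real transform.
cqt_per_harmonic = []
for h in harmonics:
    proj = torch.rand(flat.shape[-1], num_cqt_bins)   # stand-in CQT basis
    cqt_per_harmonic.append(flat @ proj)              # [len_batch*8, num_cqt_bins]
hcqm = torch.stack(cqt_per_harmonic)                  # [6, len_batch*8, num_cqt_bins]

# Reshape into the CNN's input layout.
hcqm = hcqm.reshape(len(harmonics), len_batch, 8, num_cqt_bins)
hcqm = hcqm.permute(1, 3, 2, 0)                   # [len_batch, num_cqt_bins, 8, 6]

# CNN -> logits over 256 tempo classes; softmax, argmax, map class -> BPM.
logits = torch.randn(len_batch, 256)              # stand-in for the CNN forward pass
bpm_class = logits.softmax(dim=-1).argmax(dim=-1) # [len_batch]
print(hcqm.shape, bpm_class.shape)
```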
| Method | Acc1 (%) | Acc2 (%) | Avg. Time (s) | Total Time (s) |
|---|---|---|---|---|
| DeepRhythm (cuda) | 95.91 | 96.54 | 0.021 | 20.11 |
| DeepRhythm (cpu) | 95.91 | 96.54 | 0.12 | 115.02 |
| TempoCNN (cnn) | 84.78 | 97.69 | 1.21 | 1150.43 |
| TempoCNN (fcn) | 83.53 | 96.54 | 1.19 | 1131.51 |
| Essentia (multifeature) | 87.93 | 97.48 | 2.72 | 2595.64 |
| Essentia (percival) | 85.83 | 95.07 | 1.35 | 1289.62 |
| Essentia (degara) | 86.46 | 97.17 | 1.38 | 1310.69 |
| Librosa | 66.84 | 75.13 | 0.48 | 460.52 |
- Test done on 953 songs, mostly Electronic, Hip Hop, Pop, and Rock
- Acc1 = prediction within ±2% of the actual BPM
- Acc2 = prediction within ±2% of the actual BPM or a multiple (e.g. 120 ≈ 60); see the sketch after these notes
- Timed from file path in to BPM out (audio loading, feature extraction, model inference)
- I could only get TempoCNN to run on CPU (it requires CUDA 10)
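The accuracy metrics are easy to state in code. A minimal sketch; the set of forgiven multiples (halves/doubles and thirds/triples, common in the tempo-estimation literature) is my assumption, not necessarily the exact factors used in this test:

```python
def acc1(pred_bpm, true_bpm, tol=0.02):
    """Acc1: prediction within +/- 2% of the annotated tempo."""
    return abs(pred_bpm - true_bpm) <= tol * true_bpm

def acc2(pred_bpm, true_bpm, tol=0.02, factors=(1, 2, 3, 1/2, 1/3)):
    """Acc2: correct up to a tempo multiple (octave-style errors forgiven)."""
    return any(acc1(pred_bpm, true_bpm * f, tol) for f in factors)

print(acc1(60, 120), acc2(60, 120))  # False True: 60 is half of 120
```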
To install DeepRhythm, ensure you have Python and pip installed. Then run:
```bash
pip install deeprhythm
```
To predict the tempo of a single song from the command line:

```bash
python -m deeprhythm.infer /path/to/song.wav -cq
> ([bpm], [confidence])
```
Flags:

- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - prints only bpm/conf
To predict the tempo of all songs in a directory, run:

```bash
python -m deeprhythm.batch_infer /path/to/dir
```

This will create a jsonl file mapping file path to predicted BPM.
Flags:

- `-o output_path.jsonl` - provide a custom output path (default `batch_results.jsonl`)
- `-c`, `--conf` - include confidence scores
- `-d`, `--device [cuda/cpu/mps]` - specify model device
- `-q`, `--quiet` - doesn't print status / logs
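Once a batch run finishes, the results file can be consumed line by line. A minimal sketch, assuming the default output path; the exact keys in each record depend on the flags used:

```python
import json

# Read the batch results; each jsonl line is one record for one file.
with open('batch_results.jsonl') as f:
    for line in f:
        record = json.loads(line)
        print(record)  # a filepath -> BPM mapping (exact keys may vary)
```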
To predict the tempo of a song:
```python
from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

tempo = model.predict('path/to/song.mp3')

# to include confidence
tempo, confidence = model.predict('path/to/song.mp3', include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")
```
Audio is loaded with librosa, which supports most audio formats.
If you have already loaded your audio with librosa, for example to carry out pre-processing steps, you can predict the tempo in the following way:
```python
import librosa
from deeprhythm import DeepRhythmPredictor

model = DeepRhythmPredictor()

audio, sr = librosa.load('path/to/song.mp3')
# ... other steps for processing the audio ...

tempo = model.predict_from_audio(audio, sr)

# to include confidence
tempo, confidence = model.predict_from_audio(audio, sr, include_confidence=True)

print(f"Predicted Tempo: {tempo} BPM")
```
[1] Hadrien Foroughmand and Geoffroy Peeters, “Deep-Rhythm for Global Tempo Estimation in Music”, in Proceedings of the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, Nov. 2019, pp. 636–643. doi: 10.5281/zenodo.3527890.
[2] K. W. Cheuk, H. Anderson, K. Agres and D. Herremans, "nnAudio: An on-the-Fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks," in IEEE Access, vol. 8, pp. 161981-162003, 2020, doi: 10.1109/ACCESS.2020.3019084.