Audio model for Speech/Music/SFX discrimination based on VGGish, and model of experts' decisions on assigning narrative importance to objects in a radio drama mix
This model is provided as supplementary material for the following paper:
@inproceedings{choudakisward2019dafx,
title={MODELLING EXPERTS' DECISIONS ON ASSIGNING NARRATIVE IMPORTANCES OF
OBJECTS IN A RADIO DRAMA MIX},
author={Chourdakis, E. T. and Ward, L. and Paradis, M. and Reiss, J.D. },
year={2019},
booktitle={Int. Conf. Digital Audio Effects},
pages={[Under Review]},
}
The model has been tested and found to work with the following versions:
Module | Version |
---|---|
python3 | 3.6.7 |
tensorflow | 1.12.0 |
keras | 2.2.4 |
Several other requirements are listed in requirements.txt.
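As a quick sanity check, the following minimal sketch (not part of the repository) compares the running interpreter and the installed tensorflow and keras versions against the table above. The helper names `installed_versions` and `version_mismatches` are hypothetical. A mismatch does not necessarily mean the model will fail, only that the combination has not been tested.

```python
import platform
from typing import Dict, Optional, Tuple

# Versions from the table above that the model is known to work with.
TESTED_VERSIONS = {"python3": "3.6.7", "tensorflow": "1.12.0", "keras": "2.2.4"}


def installed_versions() -> Dict[str, Optional[str]]:
    """Return the installed version of each module, or None if it is missing."""
    found: Dict[str, Optional[str]] = {"python3": platform.python_version()}
    for module_name in ("tensorflow", "keras"):
        try:
            module = __import__(module_name)
        except ImportError:
            found[module_name] = None
        else:
            found[module_name] = getattr(module, "__version__", None)
    return found


def version_mismatches(
    installed: Dict[str, Optional[str]],
    tested: Dict[str, str] = TESTED_VERSIONS,
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each module whose version differs from the tested one to (installed, tested)."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in tested.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    for name, (have, want) in sorted(version_mismatches(installed_versions()).items()):
        print("{}: installed {}, tested with {}".format(name, have, want))
```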
Please run the following:
git clone git@github.com:bbc/audio-dafx2019-automatic.git
cd audio-dafx2019-automatic
Download the .h5 and .json files from v004 to the same directory.
# (optional)
virtualenv venv
source venv/bin/activate
# (/optional)
pip3 install -r requirements.txt
You also need to download vggish_input.py, mel_features.py, and vggish_params.py and put them into the same directory:
curl -O https://raw.githubusercontent.com/tensorflow/models/master/research/audioset/vggish_input.py
curl -O https://raw.githubusercontent.com/tensorflow/models/master/research/audioset/mel_features.py
curl -O https://raw.githubusercontent.com/tensorflow/models/master/research/audioset/vggish_params.py
Make sure you read and agree with their licenses, which you can view at the top of each file:
head -n 14 vggish_input.py
head -n 14 mel_features.py
head -n 14 vggish_params.py
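Before running either script, it can help to confirm that everything from the steps above ended up in the same directory. The sketch below is not part of the repository and `missing_setup_files` is a hypothetical helper. Since the exact model filenames from v004 are not listed here, it only checks (as an assumption) that at least one `.h5` and one `.json` file are present.

```python
import glob
import os
from typing import List

# The three helper files fetched with curl from tensorflow/models.
VGGISH_HELPERS = ("vggish_input.py", "mel_features.py", "vggish_params.py")


def missing_setup_files(directory: str = ".") -> List[str]:
    """Return the setup items that seem to be missing from `directory`."""
    missing = [
        name
        for name in VGGISH_HELPERS
        if not os.path.isfile(os.path.join(directory, name))
    ]
    # Model filenames are not fixed here (assumption), so only require
    # at least one file matching each extension.
    for pattern in ("*.h5", "*.json"):
        if not glob.glob(os.path.join(directory, pattern)):
            missing.append(pattern)
    return missing


if __name__ == "__main__":
    problems = missing_setup_files(".")
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All setup files found.")
```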
Suppose you have a number of tracks that you need to classify as speech, music, or sound effects. First, make sure that they are mono, 16-bit tracks sampled at 22050 Hz (we recommend using SoX):
sox input.wav -b 16 output.wav rate 22050 remix 1-2
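With many tracks, the same SoX command can be applied to a whole folder. The following minimal sketch wraps it with Python's `subprocess`; the helper names `build_sox_command` and `convert_folder` are hypothetical, and it assumes `sox` is on your PATH. Like the command above, `remix 1-2` mixes channels 1 and 2 down to one, so it assumes two-channel input files.

```python
import os
import subprocess
from typing import List


def build_sox_command(src: str, dst: str) -> List[str]:
    """Arguments for the SoX call above: 16-bit, 22050 Hz, channels 1-2 mixed to mono."""
    return ["sox", src, "-b", "16", dst, "rate", "22050", "remix", "1-2"]


def convert_folder(src_dir: str, dst_dir: str) -> List[str]:
    """Convert every .wav in src_dir into dst_dir and return the written paths."""
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".wav"):
            continue
        dst = os.path.join(dst_dir, name)
        # check=True raises CalledProcessError if SoX exits with an error.
        subprocess.run(build_sox_command(os.path.join(src_dir, name), dst), check=True)
        written.append(dst)
    return written
```

For example, `convert_folder("raw", "audio")` (where `raw` is a hypothetical folder of original stereo tracks) would fill `audio/` with converted copies ready for classification.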
Suppose your files are in the folder audio/. You can classify the files in that folder by running:
python3 classify.py --model music_speech_sfx_discriminator audio/
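To spot files in `audio/` that are not yet in the mono, 22050 Hz, 16-bit format described above, Python's standard `wave` module is enough, since only the WAV header needs reading. This sketch is not part of the repository, and `needs_conversion` is a hypothetical name; files that `wave` cannot parse are also flagged.

```python
import os
import wave
from typing import List

EXPECTED_CHANNELS = 1       # mono
EXPECTED_RATE = 22050       # Hz
EXPECTED_SAMPLE_WIDTH = 2   # bytes, i.e. 16-bit


def needs_conversion(folder: str) -> List[str]:
    """Return the .wav files in `folder` that are not mono/22050 Hz/16-bit."""
    bad = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue
        try:
            with wave.open(os.path.join(folder, name), "rb") as wav:
                fmt = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
        except (wave.Error, EOFError):
            # Non-PCM encodings or truncated/non-WAV data that `wave` cannot parse.
            bad.append(name)
            continue
        if fmt != (EXPECTED_CHANNELS, EXPECTED_RATE, EXPECTED_SAMPLE_WIDTH):
            bad.append(name)
    return bad


if __name__ == "__main__":
    if os.path.isdir("audio"):
        for name in needs_conversion("audio"):
            print("needs conversion: " + name)
```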
You will then have a file output.csv with the results:
cat output.csv
Happy classifying~~!
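The column layout of output.csv is not documented here, so check its header first (for example with the `cat` command above). Once you know which column holds the predicted class, a tally such as the following sketch can summarise a large batch. The helper `count_labels` is hypothetical, and the column name is whatever your header actually uses.

```python
import csv
from collections import Counter


def count_labels(csv_path: str, column: str) -> Counter:
    """Count how often each value appears in `column` of a CSV with a header row."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError("column {!r} not found in {}".format(column, csv_path))
        return Counter(row[column] for row in reader)
```

For example, `count_labels("output.csv", "<class column>")`, substituting the real column name from the header.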
Suppose you have a folder stems/ containing stereo stems as .wav files (the sampling rate does not matter). To assign each stem an importance level from 0 (low importance) to 3 (essential importance), download the models from the previous section (Installation) to the same folder as assign.py and then run:
python3 assign.py stems
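Because assign.py expects stereo stems, it may be worth listing any mono (or unreadable) files before running it. This is a minimal sketch using only the standard `wave` module; it is not part of the repository, and `non_stereo_stems` is a hypothetical name.

```python
import os
import wave
from typing import List


def non_stereo_stems(folder: str) -> List[str]:
    """Return the .wav files in `folder` that do not have exactly two channels."""
    flagged = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".wav"):
            continue
        try:
            with wave.open(os.path.join(folder, name), "rb") as wav:
                channels = wav.getnchannels()
        except (wave.Error, EOFError):
            # Files `wave` cannot parse are flagged for a manual check.
            flagged.append(name)
            continue
        if channels != 2:
            flagged.append(name)
    return flagged
```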
This will generate a file output.csv listing the filename of each stem together with its features and assigned importance. For more options, see:
python3 assign.py --help
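To act on the assigned levels, for example to keep only the stems rated essential (3), output.csv can be filtered with the standard `csv` module. This is a sketch only: the column names are not documented here, so `name_column` and `importance_column` must be taken from the file's header, and `stems_at_least` is a hypothetical helper.

```python
import csv
from typing import List


def stems_at_least(csv_path: str, name_column: str, importance_column: str,
                   threshold: int) -> List[str]:
    """Return `name_column` values for rows whose importance is >= threshold."""
    selected = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Levels run from 0 (low) to 3 (essential); parse via float in
            # case they are written as e.g. "3.0".
            if int(float(row[importance_column])) >= threshold:
                selected.append(row[name_column])
    return selected
```

For example, `stems_at_least("output.csv", "<filename column>", "<importance column>", 3)` would list the stems marked as essential.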
For further details, please see the notebook MusicSpeechSFxDiscrimination.ipynb.
Questions or issues about the model should either be raised here or addressed to Emmanouil Theofanis Chourdakis <e.t.chourdakis__Aaa.t!__qmul.ac.uk>