These directories contain metadata and scripts for processing emotional speech datasets into the format required by the scripts in this repository.
Please see also https://github.com/SuperKogito/SER-datasets for another list of emotional speech datasets.
Currently the following datasets have processing scripts:
- AESDD
- ASED
- BAVED
- CaFE
- CMU-MOSEI
- CREMA-D
- DEMoS
- EESC
- EMO-DB
- EmoFilm
- EmoryNLP
- EmoV-DB
- EMOVO
- ESD
- eNTERFACE
- IEMOCAP
- JL-corpus
- MELD
- MESD
- MESS
- MLEndSND
- MSP-IMPROV
- MSP-PODCAST
- Oréau
- Portuguese
- RAVDESS
- SAVEE
- SEMAINE
- ShEMO
- SmartKom
- SUBESCO
- TESS
- URDU
- VENEC
- VIVAE
Each dataset has a `process.py` script that can be used to process the data into a usable format:

```
python process.py /path/to/data
```
For most datasets, this involves resampling audio to 16 kHz 16-bit WAV and writing out annotations. The scripts for some datasets (e.g. IEMOCAP, MSP-IMPROV) also print statistics about perceptual ratings.
All scripts output annotations in CSV format, along with any inter-rater agreement metrics that can be calculated from ratings data (e.g. Krippendorff's alpha, Fleiss' kappa).
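As a rough illustration of the kind of agreement metric the scripts can report, here is a minimal sketch of Fleiss' kappa. The ratings and category names below are made up for this example; the repository's own implementation may differ.

```python
# Minimal Fleiss' kappa sketch (hypothetical data, not the repo's code).
from collections import Counter

def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items each rated by the same number of raters.
    `ratings[i]` is the list of category labels assigned to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for item in ratings for c in item})
    counts = [Counter(item) for item in ratings]
    # Overall proportion of ratings falling in each category.
    p_cat = {c: sum(cnt[c] for cnt in counts) / (n_items * n_raters)
             for c in categories}
    # Mean per-item observed agreement.
    p_bar = sum(
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Expected agreement by chance.
    p_e = sum(p ** 2 for p in p_cat.values())
    return (p_bar - p_e) / (1 - p_e)

# Perfect agreement between two raters on two clips gives kappa = 1.
print(fleiss_kappa([["happy", "happy"], ["sad", "sad"]]))  # → 1.0
```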
A list of file paths to the resampled audio clips will be created for later preprocessing scripts that extract/generate features. Usually all audio clips are used to generate a dataset file. If you only want to process a subset of audio clips, you can create this list yourself. For example:

```
find /path/to/audio/subset/ -name "*.wav" | sort > files.txt
```
Multiple file lists can be created if different subsets are to be tested independently (e.g. testing IEMOCAP's scripted and improvised sessions separately). Some datasets have pre-defined file lists (e.g. the standard 4-class subset of IEMOCAP).
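For example, separate file lists for two subsets might be built like this. The directory layout below is invented so the sketch runs anywhere; with a real dataset you would point `find` at the actual audio directories.

```shell
# Hypothetical subset directories; a small fake tree is created
# first so this example is self-contained.
mkdir -p demo/scripted demo/improvised
touch demo/scripted/ses1.wav demo/scripted/ses2.wav demo/improvised/ses1.wav

# One sorted file list per subset, for independent evaluation.
find demo/scripted -name "*.wav" | sort > scripted.txt
find demo/improvised -name "*.wav" | sort > improvised.txt
```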
Each dataset has a `corpus.yaml` file which lists the annotations, partitions and file lists, along with a description of the dataset and a "default" subset to use if one isn't specified. It also has a key called `features_dir` which is used when combining multiple datasets with the same features.
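A `corpus.yaml` following this description might look roughly like the sketch below. The key names mirror the concepts above, but the exact schema and file names are assumptions for illustration, not the repository's definitive format.

```yaml
# Hypothetical corpus.yaml sketch; exact schema may differ per dataset.
description: Example emotional speech corpus
features_dir: features/example        # shared when combining datasets
annotations:
  labels: labels.csv                  # instance name -> emotion label
  speaker: speaker.csv                # instance name -> speaker ID
partitions:
  - speaker
subsets:
  default:
    clips: files_all.txt              # file list for this subset
```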
An annotation is a mapping from instance name to value, stored as a CSV file with two columns. Labels are the most important annotation, and some datasets can have multiple different labels (e.g. acted labels vs. perceptual labels).
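Such a two-column annotation CSV can be loaded into a simple name-to-value mapping. The file contents and header names below are made up for illustration:

```python
# Load a two-column annotation CSV (instance name -> value) into a dict.
# The CSV text here stands in for a real annotation file.
import csv
import io

csv_text = "name,label\nclip_0001,happy\nclip_0002,sad\n"
with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    next(reader)  # skip the (assumed) header row
    annotation = {name: value for name, value in reader}

print(annotation)  # → {'clip_0001': 'happy', 'clip_0002': 'sad'}
```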
A partition is a categorical annotation which partitions instances into groups (e.g. label, speaker).
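Since a partition maps each instance to exactly one group, inverting the mapping recovers the groups themselves. A minimal sketch with invented instance names and speakers:

```python
# Invert a partition (instance -> group) to get group -> instances.
from collections import defaultdict

partition = {
    "clip_0001": "speaker_01",
    "clip_0002": "speaker_02",
    "clip_0003": "speaker_01",
}
groups = defaultdict(list)
for name, group in partition.items():
    groups[group].append(name)

print(dict(groups))
# → {'speaker_01': ['clip_0001', 'clip_0003'], 'speaker_02': ['clip_0002']}
```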