These directories contain metadata and scripts for processing emotional speech datasets into the format required by the scripts in this repository.
Please see also https://github.com/SuperKogito/SER-datasets for another list of emotional speech datasets.
Currently the following datasets have processing scripts:
- AESDD
- ASED
- BAVED
- CaFE
- CMU-MOSEI
- CREMA-D
- DEMoS
- EESC
- EMO-DB
- EmoFilm
- EmoryNLP
- EmoV-DB
- EMOVO
- ESD
- eNTERFACE
- IEMOCAP
- JL-corpus
- MELD
- MESD
- MESS
- MLEndSND
- MSP-IMPROV
- MSP-PODCAST
- Oréau
- Portuguese
- RAVDESS
- SAVEE
- SEMAINE
- ShEMO
- SmartKom
- SUBESCO
- TESS
- URDU
- VENEC
- VIVAE
Each dataset has a `process.py` script that can be used to process the data into a usable format:

```
python process.py /path/to/data
```
For most datasets, this involves resampling audio to 16 kHz 16-bit WAV and writing out annotations. The scripts for some datasets (e.g. IEMOCAP, MSP-IMPROV) also print statistics about perceptual ratings.
All scripts output annotations in CSV format, along with any inter-rater agreement metrics that can be calculated from ratings data (e.g. Krippendorff's alpha, Fleiss' kappa).
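As a rough illustration of the kind of agreement metric the scripts can report, here is a minimal sketch of Fleiss' kappa. The ratings and category names below are made up for this example; the repository's own implementation may differ.

```python
# Minimal Fleiss' kappa sketch (hypothetical data, not the repo's code).
from collections import Counter

def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items each rated by the same number of raters.
    `ratings[i]` is the list of category labels assigned to item i."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for item in ratings for c in item})
    counts = [Counter(item) for item in ratings]
    # Overall proportion of ratings falling in each category.
    p_cat = {c: sum(cnt[c] for cnt in counts) / (n_items * n_raters)
             for c in categories}
    # Mean per-item observed agreement.
    p_bar = sum(
        (sum(cnt[c] ** 2 for c in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_items
    # Expected agreement by chance.
    p_e = sum(p ** 2 for p in p_cat.values())
    return (p_bar - p_e) / (1 - p_e)

# Perfect agreement between two raters on two clips gives kappa = 1.
print(fleiss_kappa([["happy", "happy"], ["sad", "sad"]]))  # → 1.0
```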
A list of file paths to the resampled audio clips will be created for later preprocessing scripts that extract/generate features. Usually all audio clips are used to generate a dataset file. If you only want to process a subset of audio clips, you can create this list yourself. For example:

```
find /path/to/audio/subset/ -name "*.wav" | sort > files.txt
```
Multiple file lists can be created if different subsets are to be tested independently (e.g. testing IEMOCAP's scripted and improvised sessions separately). Some datasets have pre-defined file lists (e.g. the standard 4-class subset of IEMOCAP).
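For example, separate file lists for two subsets might be built like this. The directory layout below is invented so the sketch runs anywhere; with a real dataset you would point `find` at the actual audio directories.

```shell
# Hypothetical subset directories; a small fake tree is created
# first so this example is self-contained.
mkdir -p demo/scripted demo/improvised
touch demo/scripted/ses1.wav demo/scripted/ses2.wav demo/improvised/ses1.wav

# One sorted file list per subset, for independent evaluation.
find demo/scripted -name "*.wav" | sort > scripted.txt
find demo/improvised -name "*.wav" | sort > improvised.txt
```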
Each dataset has a `corpus.yaml` file which lists the annotations, partitions and file lists, along with a description of the dataset and a "default" subset to use if one isn't specified. It also has a key called `features_dir` which is used when combining multiple datasets with the same features.
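A `corpus.yaml` following this description might look roughly like the sketch below. The key names mirror the concepts above, but the exact schema and file names are assumptions for illustration, not the repository's definitive format.

```yaml
# Hypothetical corpus.yaml sketch; exact schema may differ per dataset.
description: Example emotional speech corpus
features_dir: features/example        # shared when combining datasets
annotations:
  labels: labels.csv                  # instance name -> emotion label
  speaker: speaker.csv                # instance name -> speaker ID
partitions:
  - speaker
subsets:
  default:
    clips: files_all.txt              # file list for this subset
```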
An annotation is a mapping from instance name to value, stored as a CSV file with two columns. Labels are the most important annotation, and some datasets can have multiple different labels (e.g. acted labels vs. perceptual labels).
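Such a two-column annotation CSV can be loaded into a simple name-to-value mapping. The file contents and header names below are made up for illustration:

```python
# Load a two-column annotation CSV (instance name -> value) into a dict.
# The CSV text here stands in for a real annotation file.
import csv
import io

csv_text = "name,label\nclip_0001,happy\nclip_0002,sad\n"
with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    next(reader)  # skip the (assumed) header row
    annotation = {name: value for name, value in reader}

print(annotation)  # → {'clip_0001': 'happy', 'clip_0002': 'sad'}
```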
A partition is a categorical annotation which partitions instances into groups (e.g. label, speaker).
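Since a partition maps each instance to exactly one group, inverting the mapping recovers the groups themselves. A minimal sketch with invented instance names and speakers:

```python
# Invert a partition (instance -> group) to get group -> instances.
from collections import defaultdict

partition = {
    "clip_0001": "speaker_01",
    "clip_0002": "speaker_02",
    "clip_0003": "speaker_01",
}
groups = defaultdict(list)
for name, group in partition.items():
    groups[group].append(name)

print(dict(groups))
# → {'speaker_01': ['clip_0001', 'clip_0003'], 'speaker_02': ['clip_0002']}
```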