Emotional speech datasets

These directories contain metadata and scripts for processing emotional speech datasets into the format required by the scripts in this repository.

Please see also https://github.com/SuperKogito/SER-datasets for another list of emotional speech datasets.

Currently the following datasets have processing scripts:

Processing

Each dataset has a process.py script that can be used to process the data into a usable format.

python process.py /path/to/data

For most datasets this involves resampling to 16 kHz 16-bit WAV, and outputting annotations. The scripts for some datasets (e.g. IEMOCAP, MSP-IMPROV) print additional statistics about perceptual ratings.

All scripts output annotations in CSV format, along with any inter-rater agreement metrics (e.g. Krippendorff's alpha, Fleiss' kappa) that can be calculated from the ratings data.
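As an illustration of one such agreement metric, here is a minimal sketch of Fleiss' kappa in plain Python. It is not the code used by the process.py scripts; the function name `fleiss_kappa` and the input layout (one row of per-category rating counts per clip) are assumptions made for this example.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must have the same total number of ratings (at least 2).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: mean proportion of agreeing rater pairs per item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall share of ratings in each category.
    n_cats = len(counts[0])
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)

    # Undefined (division by zero) if all ratings fall in a single category.
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: 2 clips, 3 raters, 2 emotion categories.
kappa = fleiss_kappa([[3, 0], [0, 3]])
```

Note that this form of the statistic assumes every clip has the same number of ratings, which may not hold for all datasets.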

File lists and subsets

A list of file paths to the resampled audio clips will be created for the later preprocessing scripts that extract or generate features. Usually all audio clips are used to generate a dataset file. If you only want to use a subset of the audio clips, you can create this list yourself. For example:

find /path/to/audio/subset/ -name "*.wav" | sort > files.txt
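If you would rather build the list from Python, the following is a rough equivalent of the command above. The helper name `write_file_list` is hypothetical and not part of this repository; it searches recursively, as `find` does, and writes one path per line.

```python
from pathlib import Path


def write_file_list(audio_dir, list_path, pattern="*.wav"):
    """Write a sorted list of matching audio files, one path per line."""
    # rglob searches subdirectories too, like `find` without -maxdepth.
    paths = sorted(
        str(p) for p in Path(audio_dir).rglob(pattern) if p.is_file()
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths


# Example call (paths are placeholders):
# write_file_list("/path/to/audio/subset/", "files.txt")
```

Python sorts strings by code point, whereas `sort` follows the current locale, so the order of the two lists can differ slightly for unusual file names.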

You can create multiple file lists if you want to test different subsets independently (e.g. testing IEMOCAP's scripted and improvised sessions separately). Some datasets have predefined file lists (e.g. the standard 4-class subset of IEMOCAP).

corpus.yaml

Each dataset has a corpus.yaml file which lists the annotations, partitions and file lists, along with a description of the dataset and a "default" subset to use if one isn't specified. It also has a key called features_dir, which is used when combining multiple datasets with the same features.

Annotations

An annotation is a mapping from instance name to value, stored as a CSV file with two columns. Labels are the most important annotation, and some datasets can have multiple different labels (e.g. acted labels vs. perceptual labels).
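A two-column annotation file can be read into a dictionary with the standard `csv` module. This is only a sketch: the function name `parse_annotation` is hypothetical, and the presence of a header row is an assumption, so check the actual CSV files before relying on it.

```python
import csv


def parse_annotation(lines, has_header=True):
    """Parse two-column CSV rows into a {instance name: value} mapping.

    `lines` can be an open file or any iterable of CSV-formatted strings.
    The header row is an assumption; pass has_header=False if there is none.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the column names
    annotation = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, value = row[0], row[1]
        annotation[name] = value
    return annotation


# Reading from disk (the file name is a placeholder):
# with open("label.csv", newline="") as f:
#     labels = parse_annotation(f)
```

Values are returned as strings; numeric annotations would need to be converted by the caller.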

Partitions

A partition is a categorical annotation which partitions instances into groups (e.g. label, speaker).
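To make the idea concrete, the sketch below groups instances by the value of a categorical annotation such as speaker. The function name `partition_instances` and the instance names are made up for illustration and do not come from this repository.

```python
from collections import defaultdict


def partition_instances(annotation):
    """Group instance names by their categorical annotation value."""
    groups = defaultdict(list)
    for name, value in annotation.items():
        groups[value].append(name)
    return dict(groups)


# Hypothetical speaker annotation for three clips.
speaker = {"clip_a": "spk1", "clip_b": "spk2", "clip_c": "spk1"}
groups = partition_instances(speaker)
# groups == {"spk1": ["clip_a", "clip_c"], "spk2": ["clip_b"]}
```

Each instance belongs to exactly one group, which is what makes a partition usable for splitting data, for example by speaker.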