diff --git a/README.md b/README.md
index 2d4806b..6286441 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-***Spoken Emotion Recognition Datasets:*** *A collection of datasets (count=44) for the purpose of emotion recognition/detection in speech.
+***Spoken Emotion Recognition Datasets:*** *A collection of datasets (count=49) for the purpose of emotion recognition/detection in speech.
 The table is chronologically ordered and includes a description of the content of each dataset along with the emotions included.
 The table can be browsed, sorted and searched under https://superkogito.github.io/SER-datasets/*
 | Dataset | Year | Content | Emotions | Format | Size | Language | Paper | Access | License |
@@ -6,11 +6,16 @@ The table can be browsed, sorted and searched under https://superkogito.github.i
 | [Quechua-SER](https://figshare.com/articles/media/Quechua_Collao_for_Speech_Emotion_Recognition/20292516) | 2022 | 12420 audio recordings (~15 hours) and their transcriptions by 7 native speakers. | Emotional labels using dimensions: valence, arousal, and dominance. | Audio | 3.53 GB | Quechua Collao | [A speech corpus of Quechua Collao for automatic dimensional emotion recognition](https://www.nature.com/articles/s41597-022-01855-9) | Open | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) |
 | [MESD](https://data.mendeley.com/datasets/cy34mh68j9/5) | 2022 | 864 audio files of single-word emotional utterances with Mexican cultural shaping. | 6 emotions provides single-word utterances for anger, disgust, fear, happiness, neutral, and sadness. | Audio | 0,097 GB | Spanish (Mexican) | [The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning](https://pubmed.ncbi.nlm.nih.gov/34891601/) | Open | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) |
 | [SyntAct](https://zenodo.org/record/6573016#.ZAjy_9LMJpj) | 2022 | Synthesized database of three basic emotions and neutral expression based on rule-based manipulation for a diphone synthesizer which we release to the public | 997 utterances including 6 emotions: angry, bored, happy, neutral, sad and scared | Audio | 941 MB | German | [SyntAct: A Synthesized Database of Basic Emotions](http://felix.syntheticspeech.de/publications/synthetic_database.pdf) | Open | [CC BY-SA 4.0](https://creativecommons.org/licenses/by/4.0) |
+| [LSSED](https://github.com/tobefans/LSSED) | 2021 | Large Scale Spanish Emotional Speech Database | 8 emotions provides Spanish spoken utterances for anger, boredom, disgust, fear, happiness, neutral, sadness, and surprise. | Audio | 90 GB | Spanish (Castilian) | [LSSED: A Large-Scale Spanish Emotional Speech Database for Speech Processing and Machine Learning](https://www.mdpi.com/1424-8220/21/23/6985) | Open | [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) |
 | [MLEnd](https://www.kaggle.com/datasets/jesusrequena/mlend-spoken-numerals) | 2021 | ~32700 audio recordings files produced by 154 speakers. Each audio recording corresponds to one English numeral (from "zero" to "billion") | Intonations: neutral, bored, excited and question | Audio | 2.27 GB | -- | -- | Open | Unknown |
 | [ASVP-ESD](https://www.kaggle.com/datasets/dejolilandry/asvpesdspeech-nonspeech-emotional-utterances) | 2021 | ~13285 audio files collected from movies, tv shows and youtube containing speech and non-speech. | 12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity. | Audio | 2 GB | Chinese, English, French, Russian and others | -- | Open | Unknown |
 | [ESD](https://hltsingapore.github.io/ESD/) | 2021 | 29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers. | 5 emotions: angry, happy, neutral, sad, and surprise. | Audio, Text | 2.4 GB (zip) | Chinese, English | [Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset](https://arxiv.org/pdf/2010.14794.pdf) | Open | Academic License |
 | [MuSe-CAR](https://zenodo.org/record/4134758) | 2021 | 40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details). | continuous emotion dimensions characterized using valence, arousal, and trustworthiness. | Audio, Video, Text | 15 GB | English | [The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements](https://arxiv.org/pdf/2101.06053.pdf) | Restricted | Academic License & Commercial License |
+| [THAI SER](https://github.com/vistec-AI/dataset-releases/releases/tag/v1) | 2021 | The recordings are 41 hours, 36 minutes long (27,854 utterances), and were performed by 200 professional actors (112 female, 88 male). | 5 main emotions assigned to actors: Neutral, Anger, Happiness, Sadness, and Frustration. | Audio | 12 GB | Thai | -- | Open | [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0) |
+| [French Emotional Speech Database - Oréau](https://zenodo.org/records/4405783#.Yqjq_9JBxph) | 2020 | 79 utterances with 10 to 13 utterances pro emotion by 32 non-professional speakers. | 7 emotions: sadness, anger, disgust, fear, surprise, joy, neutral. | Audio | 0.264 GB | French | -- | Open | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) |
+| [Att-HACK ](http://www.openslr.org/88/) | 2020 | 25 speakers interpreting 100 utterances in 4 social attitudes, with 3-5 repetitions each per attitude for a total of around 30 hours of speech. | expressive speech in French, 100 phrases with multiple versions (3 to 5) in four social attitudes (friendly, distant, dominant and seductive). | Audio | 6.6 GB | French | [Att-HACK: An Expressive Speech Database with Social Attitudes](https://arxiv.org/abs/2004.04410) | Open | [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) |
 | [MSP-Podcast corpus](https://ecs.utdallas.edu/research/researchlabs/msp-lab/MSP-Podcast.html) | 2020 | 100 hours by over 100 speakers (see db link for details). | This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other). | Audio | -- | -- | [The MSP-Conversation Corpus](http://www.interspeech2020.org/index.php?m=content&c=index&a=show&catid=290&id=684) | Restricted | Academic License & Commercial License |
+| [BEASC](https://doi.org/10.6084/m9.figshare.12498033) | 2020 | Bangla Emotional Audio-Speech Corpus | 6 emotions provides Bangla spoken utterances for anger, happiness, sadness, fear, surprise, and neutral. | Audio | 9 GB | Bangla | [BEASC: Bangla Emotional Audio-Speech Corpus - A Speech Emotion Recognition Corpus for the Low-Resource Bangla Language](https://www.mdpi.com/2076-3417/10/11/3704) | Open | [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) |
 | [emotiontts open db](https://github.com/emotiontts/emotiontts_open_db) | 2020 | Recordings and their associated transcriptions by a diverse group of speakers. | 4 emotions: general, joy, anger, and sadness. | Audio, Text | -- | Korean | -- | Partially open | [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
 | [URDU-Dataset](https://github.com/siddiquelatif/urdu-dataset) | 2020 | 400 utterances by 38 speakers (27 male and 11 female). | 4 emotions: angry, happy, neutral, and sad. | Audio | 0.072 GB | Urdu | [Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages](https://arxiv.org/pdf/1812.10411.pdf) | Open | -- |
 | [BAVED](https://www.kaggle.com/a13x10/basic-arabic-vocal-emotions-dataset) | 2020 | 1935 recording by 61 speakers (45 male and 16 female). | 3 levels of emotion. | Audio | 0.195 GB | Arabic | -- | Open | -- |
diff --git a/src/ser-datasets.csv b/src/ser-datasets.csv
index d0c0162..9c2d332 100644
--- a/src/ser-datasets.csv
+++ b/src/ser-datasets.csv
@@ -2,11 +2,16 @@ Dataset,Year,Content,Emotions,Format,Size,Language,Paper,Access,License
 `Quechua-SER `_,2022,12420 audio recordings (~15 hours) and their transcriptions by 7 native speakers.,"Emotional labels using dimensions: valence, arousal, and dominance.",Audio,3.53 GB,Quechua Collao,`A speech corpus of Quechua Collao for automatic dimensional emotion recognition `_,Open,`CC BY 4.0 `_
 `MESD `_,2022,864 audio files of single-word emotional utterances with Mexican cultural shaping.,"6 emotions provides single-word utterances for anger, disgust, fear, happiness, neutral, and sadness.",Audio,"0,097 GB",Spanish (Mexican),`The Mexican Emotional Speech Database (MESD): elaboration and assessment based on machine learning `_,Open,`CC BY 4.0 `_
 `SyntAct `_,2022,Synthesized database of three basic emotions and neutral expression based on rule-based manipulation for a diphone synthesizer which we release to the public ,"997 utterances including 6 emotions: angry, bored, happy, neutral, sad and scared",Audio,941 MB,German,`SyntAct: A Synthesized Database of Basic Emotions `_,Open,`CC BY-SA 4.0 `_
+`LSSED `_,2021,Large Scale Spanish Emotional Speech Database,"8 emotions provides Spanish spoken utterances for anger, boredom, disgust, fear, happiness, neutral, sadness, and surprise.",Audio,90 GB,Spanish (Castilian),`LSSED: A Large-Scale Spanish Emotional Speech Database for Speech Processing and Machine Learning `_,Open,`CC BY-SA 4.0 `_
 `MLEnd `_,2021,"~32700 audio recordings files produced by 154 speakers. Each audio recording corresponds to one English numeral (from ""zero"" to ""billion"")","Intonations: neutral, bored, excited and question",Audio,2.27 GB,--,--,Open,Unknown
 `ASVP-ESD `_,2021,"~13285 audio files collected from movies, tv shows and youtube containing speech and non-speech.","12 different natural emotions (boredom, neutral, happiness, sadness, anger, fear, surprise, disgust, excitement, pleasure, pain, disappointment) with 2 levels of intensity.",Audio,2 GB,"Chinese, English, French, Russian and others",--,Open,Unknown
 `ESD `_,2021,"29 hours, 3500 sentences, by 10 native English speakers and 10 native Chinese speakers.","5 emotions: angry, happy, neutral, sad, and surprise.","Audio, Text",2.4 GB (zip),"Chinese, English",`Seen And Unseen Emotional Style Transfer For Voice Conversion With A New Emotional Speech Dataset `_,Open,Academic License
 `MuSe-CAR `_,2021,"40 hours, 6,000+ recordings of 25,000+ sentences by 70+ English speakers (see db link for details).","continuous emotion dimensions characterized using valence, arousal, and trustworthiness.","Audio, Video, Text",15 GB,English,"`The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements `_",Restricted,Academic License & Commercial License
+`THAI SER `_,2021,"The recordings are 41 hours, 36 minutes long (27,854 utterances), and were performed by 200 professional actors (112 female, 88 male).","5 main emotions assigned to actors: Neutral, Anger, Happiness, Sadness, and Frustration.",Audio,12 GB,Thai,--,Open,`CC BY-SA 4.0 `_
+`French Emotional Speech Database - Oréau `_,2020,79 utterances with 10 to 13 utterances pro emotion by 32 non-professional speakers.,"7 emotions: sadness, anger, disgust, fear, surprise, joy, neutral.",Audio,0.264 GB,French,--,Open,`CC BY 4.0 `_
+`Att-HACK `_,2020,"25 speakers interpreting 100 utterances in 4 social attitudes, with 3-5 repetitions each per attitude for a total of around 30 hours of speech.","expressive speech in French, 100 phrases with multiple versions (3 to 5) in four social attitudes (friendly, distant, dominant and seductive).",Audio,6.6 GB,French,`Att-HACK: An Expressive Speech Database with Social Attitudes `_,Open,`CC BY-NC-ND 4.0 `_
 `MSP-Podcast corpus `_,2020,100 hours by over 100 speakers (see db link for details).,"This corpus is annotated with emotional labels using attribute-based descriptors (activation, dominance and valence) and categorical labels (anger, happiness, sadness, disgust, surprised, fear, contempt, neutral and other).",Audio,--,--,`The MSP-Conversation Corpus `_,Restricted,Academic License & Commercial License
+`BEASC `_,2020,Bangla Emotional Audio-Speech Corpus,"6 emotions provides Bangla spoken utterances for anger, happiness, sadness, fear, surprise, and neutral.",Audio,9 GB,Bangla,`BEASC: Bangla Emotional Audio-Speech Corpus - A Speech Emotion Recognition Corpus for the Low-Resource Bangla Language `_,Open,`CC BY 4.0 `_
 `emotiontts open db `_,2020,Recordings and their associated transcriptions by a diverse group of speakers.,"4 emotions: general, joy, anger, and sadness.","Audio, Text",--,Korean,--,Partially open,`CC BY-NC-SA 4.0 `_
 `URDU-Dataset `_,2020,400 utterances by 38 speakers (27 male and 11 female).,"4 emotions: angry, happy, neutral, and sad.",Audio,0.072 GB,Urdu,`Cross Lingual Speech Emotion Recognition: Urdu vs. Western Languages `_,Open,--
 `BAVED `_,2020,1935 recording by 61 speakers (45 male and 16 female).,3 levels of emotion.,Audio,0.195 GB,Arabic,--,Open,--