Speech Emotion Recognition (SER) refers to the recognition of human emotions from natural speech and is vital for building human-centered, context-aware intelligent systems. Domain shift, where models trained on one domain degrade in performance when exposed to an unseen domain with different statistics, is a major limiting factor in the applicability of SER, as models depend strongly on the speaker and language characteristics seen during training. Meta-Learning for Domain Generalization (MLDG) has shown great success in improving models' generalization capacity and alleviating the domain-shift problem in the vision domain; yet its efficacy for SER remains largely unexplored. In this work, we propose a "Domain-shift Aware" MLDG approach (DA-MLDG) to learn generalizable models across multiple domains in SER. Based on our extensive evaluation, we identify a number of pitfalls that contribute to models' poor DG ability, and demonstrate that log-mel spectrogram representations lack the distinct features required for MLDG in SER. We further explore the use of appropriate features to achieve DG in SER, providing insights into future research directions for DG in SER.
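For readers unfamiliar with MLDG, the core idea is a meta-optimization over source domains: each step splits them into meta-train and meta-test sets, takes a virtual gradient step on meta-train, and requires that step to also reduce the meta-test loss. The sketch below illustrates this update rule on a toy linear-regression problem with synthetic "domains"; it is illustrative only and is not this repository's DA-MLDG implementation (all names and hyperparameters here are made up for the example).

```python
# Minimal sketch of the MLDG meta-update on a toy linear model.
# NOT the repo's DA-MLDG code; domains, alpha, beta, lr are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def mse_grad(theta, X, y):
    """Gradient of the mean squared error 0.5*||X @ theta - y||^2 / n."""
    return X.T @ (X @ theta - y) / len(y)

# Three synthetic "domains": same underlying weights, shifted input statistics.
true_theta = np.array([2.0, -1.0])
domains = []
for shift in (0.0, 1.0, -1.0):
    X = rng.normal(shift, 1.0, size=(64, 2))
    y = X @ true_theta + rng.normal(0.0, 0.05, size=64)
    domains.append((X, y))

theta = np.zeros(2)
alpha, beta, lr = 0.1, 1.0, 0.1  # inner step, meta weight, outer step

for step in range(200):
    # Randomly split source domains into meta-train and meta-test.
    idx = rng.permutation(len(domains))
    meta_train = [domains[i] for i in idx[:-1]]
    meta_test = domains[idx[-1]]

    g_train = np.mean([mse_grad(theta, X, y) for X, y in meta_train], axis=0)
    theta_inner = theta - alpha * g_train       # virtual inner update
    g_test = mse_grad(theta_inner, *meta_test)  # meta-test gradient

    theta -= lr * (g_train + beta * g_test)     # MLDG meta-update
```

Because the meta-test domain is held out at every step, the update favors parameters whose gradient direction generalizes across domains rather than overfitting any single one.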
From the root directory of this repo, run:

```shell
foo@bar:~$ python run.py
```
NOTE: You can adjust the executed methods with the `methods` parameter.
If you use this repository, please consider citing:
```bibtex
@misc{Tsou2303:Efficacy,
  author    = {Raeshak {King Gandhi} and Vasileios Tsouvalas and Nirvana Meratnia},
  title     = {On efficacy of {Meta-Learning} for Domain Generalization in Speech Emotion Recognition},
  booktitle = {Second International Workshop on Negative Results in Pervasive Computing 2023 (PerFail 2023)},
  address   = {Atlanta, USA},
  month     = mar,
  year      = {2023},
}
```