
emotion recognition webdata #102

Open
jiachengc opened this issue Sep 12, 2024 · 5 comments

Comments

@jiachengc

Thanks for this solid work. Have you released any preprocessed emotion recognition web datasets, such as RAVDESS or CREMA-D, or any data processing scripts so we can process the data ourselves? @knoriy @YuchenHui22314

@YuchenHui22314
Collaborator

Thanks for your comment, but as far as I know, we did not use emotion recognition datasets in the end. Good luck with your research!

@jiachengc
Author

> Thanks for your comment, but as far as I know, we did not use emotion recognition datasets in the end. Good luck with your research!

Thank you for your quick reply. I would like to ask a quick question: if my dataset is an audio emotion recognition dataset, such as TESS, then when I build the corresponding webdata, should I rewrite the 'text' field to contain the emotion label of the audio instead of an audio caption? For example:
```json
{
  "text": ["happy"],
  "tag": ["happy"],
  "original_data": {
    "title": "TESS - Toronto Emotional Speech Set",
    "description": "Dataset for emotion recognition from audio",
    "license": "TESS dataset license",
    "fname": "OAF_back_happy.flac",
    "category": "happy"
  }
}
```
I would do so in order to allow the model to output the corresponding emotion predictions. Looking forward to your reply. Thanks in advance! @YuchenHui22314
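To make the question concrete, here is a minimal stdlib sketch of packing such samples into a WebDataset-style tar shard (one `<key>.flac` plus one `<key>.json` per sample key). `write_emotion_shard` is a hypothetical helper, not part of the repo; it only illustrates the metadata layout asked about above.

```python
import io
import json
import tarfile


def write_emotion_shard(shard_path, samples):
    """Write (key, audio_bytes, emotion_label) triples as a WebDataset-style
    tar shard: each sample becomes a <key>.flac and a <key>.json member."""
    with tarfile.open(shard_path, "w") as tar:
        for key, audio_bytes, label in samples:
            # Emotion label stands in for the caption in the "text" field.
            meta = {
                "text": [label],
                "tag": [label],
                "original_data": {
                    "title": "TESS - Toronto Emotional Speech Set",
                    "fname": f"{key}.flac",
                    "category": label,
                },
            }
            for name, payload in (
                (f"{key}.flac", audio_bytes),
                (f"{key}.json", json.dumps(meta).encode("utf-8")),
            ):
                info = tarfile.TarInfo(name)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
```

Loading the shard back with the `webdataset` library should then yield the `.json` metadata alongside the audio for each key.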

@YuchenHui22314
Collaborator

Then the "text" should be a complete sentence, e.g. ["this is a happy sound"]. So you may want to come up with a way to construct a sentence from the emotion labels.
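One simple way to construct such sentences is a small templating function; this is a sketch of my own (`label_to_caption` is a hypothetical name, not from the repo):

```python
def label_to_caption(label: str) -> str:
    """Turn an emotion label like "happy" into a short caption sentence,
    choosing "a"/"an" based on the label's first letter."""
    article = "an" if label[:1].lower() in "aeiou" else "a"
    return f"this is {article} {label} sound"
```

For example, `label_to_caption("happy")` gives `"this is a happy sound"` and `label_to_caption("angry")` gives `"this is an angry sound"`.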

@jiachengc
Author

> Then the "text" should be a complete sentence, e.g. ["this is a happy sound"]. So you may want to come up with a way to construct a sentence from the emotion labels.

Thank you for your quick reply, I really appreciate it. I followed your suggestion and changed the text to captions like ['this is a happy sound'], and then used eval_linear_probe.py to fine-tune the last linear layer of the audio encoder. However, the results are quite poor on the IEMOCAP dataset, with an accuracy of around 55%. So far, I've tried a range of learning rates: [1e-2, 1e-3, 1e-4, 1e-5], weight decay values: [0.1, 0.01, 0.001, 0.001], and linear probe losses: [ce, mse]. None of these combinations has achieved accuracy beyond 55%. I'm feeling a bit lost about the next debugging direction and would greatly appreciate any suggestions you might have. Thanks in advance again! @YuchenHui22314
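For reference, a linear probe with a cross-entropy loss over frozen embeddings can be sanity-checked outside the repo's training code. This is a minimal NumPy sketch (not the eval_linear_probe.py implementation; the data here is synthetic) that sweeps the same learning-rate/weight-decay grid mentioned above and picks the best validation accuracy:

```python
import numpy as np


def train_linear_probe(X, y, n_classes, lr, weight_decay, epochs=300, seed=0):
    """Softmax linear probe trained with cross-entropy + L2 weight decay
    by full-batch gradient descent on frozen embeddings X."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / len(X)                  # d(mean CE)/d(logits)
        W -= lr * (X.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def accuracy(W, b, X, y):
    return float(((X @ W + b).argmax(axis=1) == y).mean())


# Synthetic, well-separated 4-class embeddings as a stand-in for real features.
rng = np.random.default_rng(1)
n_classes, dim = 4, 32
centers = rng.normal(scale=4.0, size=(n_classes, dim))
ytr = rng.integers(0, n_classes, size=400)
yva = rng.integers(0, n_classes, size=200)
Xtr = centers[ytr] + rng.normal(size=(400, dim))
Xva = centers[yva] + rng.normal(size=(200, dim))

# Sweep the grid from the comment above and keep the best validation score.
best = max(
    (
        (lr, wd, accuracy(*train_linear_probe(Xtr, ytr, n_classes, lr, wd), Xva, yva))
        for lr in [1e-2, 1e-3, 1e-4, 1e-5]
        for wd in [0.1, 0.01, 0.001]
    ),
    key=lambda t: t[2],
)
print(f"best lr={best[0]}, wd={best[1]}, val acc={best[2]:.3f}")
```

If a probe like this also plateaus on real embeddings, the bottleneck is usually the frozen features rather than the probe hyperparameters, which is consistent with the suggestion below to check whether the pretraining setup matches the task.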

@YuchenHui22314
Collaborator

Oh ok. So you are doing supervised classification instead of contrastive pretraining. I thought that you wanted to add an emotion dataset as part of the pretraining data. "text" should be a sentence only in the pretraining process; when it comes to supervised classification, I am not familiar with the eval_linear_probe.py code. Maybe you could reach out to Ke Chen on this!
