Recommersion

Recommersion is a Context-Aware Recommender System (CARS) that suggests songs from the DEAM, PMEmo, and SpotiGeM datasets. Recommendations are based on two emotional dimensions, valence and arousal, which are either captured from the user's speech or set manually via sliders. The system implements Speech Emotion Recognition (SER) models, including Audeering's Model for Dimensional Speech Emotion Recognition based on Wav2vec 2.0 and a custom-built model.

Custom SER model

The custom SER model integrates two parallel processing approaches trained on a combination of three datasets: IEMOCAP, a widely used dataset for multimodal emotion recognition; MuSe-CAR, a dataset designed for emotion recognition in car driving scenarios, which offers diverse emotional expressions across different contexts; and MSP-Podcast, a corpus of 151,654 speaking turns, which makes it the largest naturalistic emotional speech dataset in the community. Because of its considerable size, only a 30% sample of MSP-Podcast was used. The two approaches are:

A Convolutional Neural Network (CNN) applied to Mel spectrograms.
A pre-trained Wav2vec 2.0 model whose transformer layers are fine-tuned while its lower-level layers are kept frozen.

The combined features from these parallel approaches are processed using a Bidirectional Long Short-Term Memory (BLSTM) architecture for capturing temporal dependencies. A final regression layer predicts emotional dimensions (valence and arousal), enabling song recommendations through either Euclidean or Cosine similarity.
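The similarity step can be illustrated with a minimal sketch. It assumes each song is annotated with a (valence, arousal) pair on a common scale. The catalogue entries, the `recommend` function, and its parameters are hypothetical names chosen for illustration and are not taken from the project code.

```python
import math

# Hypothetical catalogue of (title, valence, arousal) entries.
# The values are illustrative and not taken from DEAM, PMEmo, or SpotiGeM.
SONGS = [
    ("calm_sad", 0.2, 0.2),
    ("calm_happy", 0.8, 0.3),
    ("energetic_happy", 0.9, 0.9),
    ("energetic_angry", 0.1, 0.9),
]


def euclidean_distance(a, b):
    """Straight-line distance between two (valence, arousal) points."""
    return math.dist(a, b)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def recommend(target, songs, k=2, metric="euclidean"):
    """Return the titles of the k songs closest to the target emotion."""
    if metric == "euclidean":
        # Smaller distance means a better match.
        ranked = sorted(songs, key=lambda s: euclidean_distance(target, s[1:]))
    elif metric == "cosine":
        # Larger similarity means a better match.
        ranked = sorted(
            songs, key=lambda s: cosine_similarity(target, s[1:]), reverse=True
        )
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [title for title, _, _ in ranked[:k]]


if __name__ == "__main__":
    predicted = (0.85, 0.8)  # stand-in for the SER model's output
    print(recommend(predicted, SONGS, k=2, metric="euclidean"))
```

Note that cosine similarity compares only the direction of the two vectors, so songs at (0.2, 0.2) and (0.9, 0.9) score identically against any target, whereas Euclidean distance also accounts for magnitude.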

Architecture

Overview

Recommersion is a vocal assistant that contextualizes a situation from user-provided inputs that reflect the user's emotional state. Its features are integrated into a GUI (shown below), where users can read instructions by hovering over the interactive widgets. The Recommender System processes the user's input with a Speech Emotion Recognition (SER) model and a chosen similarity measure (Euclidean or cosine) to suggest a set of tracks suited to the given context. Once the recommendation is generated and a musical fragment begins playing, users can adjust valence, arousal, and song playback in real time through a dedicated interface.

For example, a user might say "This song feels too happy" and lower the valence parameter to make the recommendations sadder. They can also select any track from the generated playlist. After making the desired changes, the user clicks the "Recompute Playlist" button to submit the updated emotional parameters, and the system recalculates and refines the recommendations accordingly.
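The adjust-and-recompute loop can be sketched as follows. This is a simplified illustration and not the project's implementation: the `EmotionState` class, the `recompute_playlist` function, the sample tracks, and the assumption that valence and arousal lie in [0, 1] are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical tracks as (title, valence, arousal); values are illustrative.
TRACKS = [
    ("bright", 0.9, 0.6),
    ("neutral", 0.5, 0.5),
    ("somber", 0.1, 0.4),
]


def _clamp(value, low=0.0, high=1.0):
    """Keep a slider value inside the assumed [0, 1] range."""
    return min(high, max(low, value))


@dataclass(frozen=True)
class EmotionState:
    valence: float
    arousal: float

    def adjusted(self, d_valence=0.0, d_arousal=0.0):
        """Return a new state with the slider offsets applied and clamped."""
        return EmotionState(
            _clamp(self.valence + d_valence),
            _clamp(self.arousal + d_arousal),
        )


def recompute_playlist(state, tracks, k=2):
    """Rank tracks by Euclidean distance to the current emotional state."""
    target = (state.valence, state.arousal)
    ranked = sorted(tracks, key=lambda t: math.dist(target, (t[1], t[2])))
    return [title for title, _, _ in ranked[:k]]


if __name__ == "__main__":
    state = EmotionState(valence=0.9, arousal=0.6)  # e.g. predicted from speech
    print(recompute_playlist(state, TRACKS))
    # "This song feels too happy": the user lowers valence, then recomputes.
    sadder = state.adjusted(d_valence=-0.7)
    print(recompute_playlist(sadder, TRACKS))
```

Returning a new state from `adjusted` instead of mutating the old one keeps the originally predicted emotion available, which could be useful if the interface offers a reset.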

For more information, check the detailed report (APR_and_Sound_in_Interaction_Project-Report.pdf).

GUI

Setup

Clone this repository.
Create a Python 3.10 virtual environment.
Set the Python package manager (pip) to version 23.3.1:
```
$ pip install --upgrade pip==23.3.1
```
Install the required libraries:
```
$ pip install -r requirements.txt
```
Run the application:
```
$ python recommersion.py
```

Datasets

The datasets must be requested for academic use and downloaded from their respective providers:

Training:
- IEMOCAP
- MuSe-CAR
- MSP-Podcast

Songs:
- DEAM
- PMEmo

License

This project is licensed under the MIT License - see the LICENSE file for details.
