This is the code repository for the OMG emotion challenge 2018.
arXiv paper: Multimodal Utterance-level Affect Analysis using Visual, Audio and Text Features
To run this code, you need to install the following libraries first:
- Keras 2.+
- keras-vggface 0.5
- tensorflow 1.5+
- OpenFace 1.0.0
- openSMILE feature extraction tool
- nltk 3.+
- pickle, numpy, scikit-learn, matplotlib, csv, MATLAB (needed for the MPQA script)
In data preparation, all videos are downloaded and split into utterances under /Videos/Train, /Videos/Validation and /Videos/Test (the csv files for the train, validation and test sets can be requested from the OMG emotion challenge organizers).
- data_preparation: run `python create_videoset.py` (a simplified sketch of the splitting step follows below).
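For orientation, here is a minimal, hypothetical sketch of what the splitting step does, assuming ffmpeg is installed and that the challenge CSV carries per-utterance video names and start/end times (the real logic lives in `create_videoset.py`; the column and file names below are assumptions):

```python
# Hypothetical sketch of the splitting step inside create_videoset.py.
# The CSV column names ("video", "utterance", "start", "end") are assumptions
# about the challenge annotation files; adjust them to the files you received.
import csv
import os
import subprocess

def split_into_utterances(csv_path, video_dir, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            src = os.path.join(video_dir, row["video"] + ".mp4")
            dst = os.path.join(out_dir, row["video"] + "_" + row["utterance"])
            # Cut the utterance segment [start, end] out of the full video with ffmpeg.
            subprocess.check_call([
                "ffmpeg", "-y", "-i", src,
                "-ss", row["start"], "-to", row["end"],
                "-c", "copy", dst,
            ])

# e.g. split_into_utterances("omg_TrainVideos.csv", "Videos/full", "Videos/Train")
```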
In feature extraction, the features for the three modalities (visual, audio and text) are extracted:
- feature_extraction:
  - run `python OpenFace_extractor`: OpenFace features are extracted.
  - run `python generate_visual_features.py`: VGG-Face fc6 features are extracted (see the sketches after this list).
  - run `python extract_audio_files.py`: audio files are extracted from the utterance videos.
  - run `python generate_audio_feature_utterance_level.py`: openSMILE features are extracted (see the sketches after this list).
  - run `python generate_word_features.py`: text features from Bing Liu's opinion lexicon are extracted (see the sketches after this list).
  - run `/MPQA/count_pn.m` and `/MPQA/parse_MPQA_feature.py`: text features from the MPQA Subjectivity Lexicon are extracted.
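To make the visual step concrete, here is a sketch of extracting a VGG-Face fc6 descriptor with keras-vggface for a single aligned face crop (the actual `generate_visual_features.py` runs this over the OpenFace-aligned frames; image size and layer name follow the keras-vggface VGG16 model):

```python
# Sketch, not the repository script: one VGG-Face fc6 feature vector per face crop.
import numpy as np
from keras.models import Model
from keras.preprocessing import image
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# Full VGG16 VGG-Face network, truncated at the 4096-d fc6 layer.
full_model = VGGFace(model="vgg16", include_top=True, input_shape=(224, 224, 3))
fc6_model = Model(full_model.input, full_model.get_layer("fc6").output)

def fc6_features(face_img_path):
    img = image.load_img(face_img_path, target_size=(224, 224))
    x = image.img_to_array(img)[np.newaxis]   # shape (1, 224, 224, 3)
    x = preprocess_input(x, version=1)        # version=1 matches the VGG16 weights
    return fc6_model.predict(x)[0]            # 4096-d frame feature
```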
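The audio features come from calling openSMILE's `SMILExtract` binary on each utterance-level wav file; below is a sketch under the assumption that `SMILExtract` is on the PATH and that an emobase-style config is used (the exact config in `generate_audio_feature_utterance_level.py` may differ):

```python
# Sketch: run openSMILE on every utterance wav file; SMILExtract must be on the
# PATH and CONFIG must point to an existing openSMILE config (both assumptions).
import glob
import os
import subprocess

CONFIG = "opensmile/config/emobase2010.conf"   # assumed feature configuration

def extract_opensmile(wav_dir, out_file):
    for wav in sorted(glob.glob(os.path.join(wav_dir, "*.wav"))):
        subprocess.check_call([
            "SMILExtract",
            "-C", CONFIG,     # feature configuration
            "-I", wav,        # input audio
            "-O", out_file,   # one feature row per utterance is appended here
        ])

# e.g. extract_opensmile("Audio/Train", "features/train_audio.arff")
```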
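For the text modality, the opinion-lexicon features amount to counting positive and negative words in each utterance transcript; here is a sketch assuming the `positive-words.txt` / `negative-words.txt` files from Bing Liu's lexicon distribution and nltk's `punkt` tokenizer data are available (the exact feature vector in `generate_word_features.py` may differ):

```python
# Sketch: count-based sentiment features from Bing Liu's opinion lexicon.
# File names match the official lexicon distribution; paths are placeholders.
from nltk.tokenize import word_tokenize   # needs nltk's "punkt" data

def load_lexicon(path):
    with open(path, encoding="latin-1") as f:
        # Skip the header comment lines, which start with ';'.
        return {line.strip() for line in f if line.strip() and not line.startswith(";")}

positive = load_lexicon("opinion-lexicon/positive-words.txt")
negative = load_lexicon("opinion-lexicon/negative-words.txt")

def lexicon_features(transcript):
    tokens = [t.lower() for t in word_tokenize(transcript)]
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return [pos, neg, pos - neg]   # simple count-based feature vector

print(lexicon_features("a truly great and happy ending, nothing bad about it"))
```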
In experiment:
- `data.py` provides normalized features and labels.
- `models.py` contains definitions of the unimodal models and of the trimodal models with late and early fusion (see the sketches below).
- `functions.py` defines custom functions used as loss functions or metrics (see the sketches below).
- `train.py`: training and evaluation.
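As an illustration of what `models.py` covers, here is a late-fusion sketch in which each modality gets its own small regressor and the (arousal, valence) outputs are averaged; the feature dimensions, layer sizes and activations are assumptions, not the repository's exact architecture:

```python
# Sketch of a trimodal late-fusion regressor; dimensions are assumptions
# (4096 = VGG-Face fc6, 1582 = an emobase-style openSMILE set, 6 = lexicon counts).
from keras.layers import Input, Dense, Dropout, Average
from keras.models import Model

def unimodal_branch(inputs, hidden=256):
    x = Dense(hidden, activation="relu")(inputs)
    x = Dropout(0.5)(x)
    return Dense(2)(x)   # per-modality (arousal, valence) prediction

visual = Input(shape=(4096,), name="visual")
audio = Input(shape=(1582,), name="audio")
text = Input(shape=(6,), name="text")

# Late fusion: every modality predicts on its own, the predictions are averaged.
fused = Average()([unimodal_branch(visual),
                   unimodal_branch(audio),
                   unimodal_branch(text)])

model = Model(inputs=[visual, audio, text], outputs=fused)
model.compile(optimizer="adam", loss="mse")   # the repo presumably swaps in its CCC loss
model.summary()
```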
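The OMG challenge evaluates arousal and valence with the concordance correlation coefficient (CCC), so the custom loss/metric in `functions.py` is presumably CCC-based; a minimal Keras backend sketch (the repository's exact formulation may differ):

```python
# Sketch of a CCC metric and loss with the Keras backend.
import keras.backend as K

def ccc(y_true, y_pred):
    # Concordance correlation coefficient between predictions and annotations.
    mean_true, mean_pred = K.mean(y_true), K.mean(y_pred)
    var_true = K.mean(K.square(y_true - mean_true))
    var_pred = K.mean(K.square(y_pred - mean_pred))
    cov = K.mean((y_true - mean_true) * (y_pred - mean_pred))
    return 2.0 * cov / (var_true + var_pred + K.square(mean_true - mean_pred) + K.epsilon())

def ccc_loss(y_true, y_pred):
    # Minimizing 1 - CCC maximizes agreement with the annotations.
    return 1.0 - ccc(y_true, y_pred)
```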