LRS dataset processing

This repository is for espnet/egs/lrs/avsr1 local folder. Detailed information please see espnet lrs/avsr1 repository

data_prepare: prepare lrs2 and lrs3 dataset for audio-visual speech recognition
dump: re-creat dump files for audio-visual speech recognition and split dump files according to their signal-to-noise ratio (SNR)
extract_reliability: do audio and visual data augmentation. Extract the reliability measures based on the paper "Fusing information streams in end-to-end audio-visual speech recognition." (https://ieeexplore.ieee.org/document/9414553) [1]
training: the files for audio-visual speech recognition. ESPnet is designed for audio-only speech recognition. Therefore, we changed some codes to fit the audio-visual task. In the training process, we create a link to the ESPnet environment and train the model. After training, these links will be released. The ESPnet code will not be changed.

Literature

[1] W. Yu, S. Zeiler and D. Kolossa, "Fusing Information Streams in End-to-End Audio-Visual Speech Recognition," ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3430-3434, doi: 10.1109/ICASSP39728.2021.9414553.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data_prepare		data_prepare
dump		dump
extract_reliability		extract_reliability
training		training
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LRS dataset processing

Literature

About

Releases

Packages

Languages

License

rub-ksv/lrs_avsr1_local

Folders and files

Latest commit

History

Repository files navigation

LRS dataset processing

Literature

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages