NLP-speech-to-text

Convert speech to text using HuggingFace, comparing Wav2Vec2 with OpenAI Whisper.
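A minimal sketch of how the two models can be run through the HuggingFace `pipeline` API. The checkpoint names (`facebook/wav2vec2-base-960h`, `openai/whisper-base`) and the helper names `load_asr` and `transcribe_all` are assumptions for illustration; this README does not say which checkpoints were actually used.

```python
from typing import Callable, Dict, Iterable

# Assumed checkpoints; swap in whichever ones you actually want to compare.
MODEL_IDS = {
    "wav2vec2": "facebook/wav2vec2-base-960h",
    "whisper": "openai/whisper-base",
}


def load_asr(model_key: str):
    """Build a HuggingFace speech-recognition pipeline for one assumed checkpoint."""
    # Imported lazily so transcribe_all can be used without transformers installed.
    from transformers import pipeline

    return pipeline("automatic-speech-recognition", model=MODEL_IDS[model_key])


def transcribe_all(asr: Callable[[str], dict], paths: Iterable[str]) -> Dict[str, str]:
    """Run an ASR callable over audio files and collect the transcript per file."""
    # The pipeline returns a dict with a "text" key for each input file.
    return {path: asr(path)["text"].strip() for path in paths}
```

Usage would look something like `transcribe_all(load_asr("whisper"), ["sentence_01.wav"])`, where the file name is a placeholder.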

Data

Speech samples included a subset of sentences recorded for this study:

Reuter, T., Sullivan, M., & Lew-Williams, C. (2021). Look at that: Spatial deixis reveals experience-related changes in prediction. Language Acquisition. https://doi.org/10.1080/10489223.2021.1932905

Audio for lab-based experiments is very clean, so this should be an easy transcription task.

Conclusion

IMO, Whisper beats Wav2Vec2 in at least 3 ways:

More performant.

Transcribed the samples 20% faster than Wav2Vec2.
Future enhancements could increase speed.
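One way to measure a speed difference like this is to time each model on the same files and report the reduction in wall-clock time. This is a rough sketch, not necessarily how the figure above was obtained; `time_call` and `percent_faster` are hypothetical helper names, and "faster" is assumed to mean less elapsed time relative to the baseline.

```python
import time


def time_call(fn, *args, repeats=3):
    """Return the best wall-clock time (seconds) of fn(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def percent_faster(baseline_seconds, candidate_seconds):
    """Percent reduction in elapsed time of the candidate relative to the baseline."""
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds
```

For example, a baseline of 10 seconds and a candidate of 8 seconds gives `percent_faster(10.0, 8.0) == 20.0`.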

More accurate.

Transcribed "apple" versus "apples" correctly.
Spelled "doggies" correctly as "doggies", not as "DOGGIYS".
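Errors like these can be quantified with a word error rate (WER): the word-level edit distance between a reference sentence and the model output, divided by the number of reference words. The sketch below is an illustration, not the scoring used here; the example sentences are made up around the words quoted above, and comparison is case-insensitive with punctuation ignored (an assumption).

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


# Hypothetical sentences: one wrong word out of four.
print(word_error_rate("look at the doggies", "LOOK AT THE DOGGIYS"))  # 0.25
```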

More nuanced.

Transcribed 3 sentences with emphatic punctuation (! instead of .)
Punctuation indicates emphasis and emotion, useful for downstream sentiment analysis.
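A quick way to check how many transcripts carry emphatic punctuation is to count those ending in an exclamation mark. This is a small sketch with made-up transcripts; `count_emphatic` is a hypothetical helper name.

```python
def count_emphatic(transcripts):
    """Count transcripts whose last non-space character is an exclamation mark."""
    return sum(1 for text in transcripts if text.rstrip().endswith("!"))


# Hypothetical model outputs, not the actual study sentences.
samples = ["Look at that!", "Look at the apple.", "Wow! "]
print(count_emphatic(samples))  # 2
```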
