vkix-7/Auto-Speech-Recognizer

A system capable of recognizing speech (speech-to-text) and of the reverse, converting text to speech.
Live at: https://vkix-7.github.io/Auto-Speech-Recognizer/

Auto-Speech-Recognizer (ASR)

The Auto-Speech-Recognizer (ASR) project focuses on transforming raw audio into the corresponding sequence of words. ASR, also known as Speech-to-Text (STT), plays a crucial role alongside several related speech tasks:

  • Speaker Diarization: Determines which speaker spoke when during an audio recording.
  • Speaker Recognition: Identifies and distinguishes different speakers.
  • Spoken Language Understanding: Extracts meaning from spoken language.
  • Sentiment Analysis: Analyzes the emotional tone of the speaker.

Key Components of ASR:

  • Acoustic Variability: Covers differences between speakers (inter-speaker) and variations within the same speaker (intra-speaker), as well as noise, reverberation, and other environmental conditions.
  • Phonetics and Linguistics: Covers articulation, elisions, and word variations, along with the size of the vocabulary.
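
To make the noise factor concrete, here is a minimal sketch of how a clean signal can be mixed with noise at a chosen signal-to-noise ratio (SNR). It is an illustration only and is not taken from this project's code: the helper names `rms` and `add_noise_at_snr` are hypothetical, a 440 Hz sine tone stands in for speech, and the noise is assumed to be Gaussian.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def add_noise_at_snr(clean, snr_db, seed=0):
    """Return clean + Gaussian noise, scaled so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; Gaussian noise is an assumption.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    # SNR_dB = 20 * log10(rms(clean) / rms(noise)), solved for the noise RMS.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / rms(noise)
    return [c + scale * n for c, n in zip(clean, noise)]


# One second of a 440 Hz tone at 16 kHz stands in for a speech recording.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noisy = add_noise_at_snr(clean, snr_db=10)

# Recover the injected noise and confirm the achieved SNR.
residual = [n - c for n, c in zip(noisy, clean)]
measured_snr_db = 20 * math.log10(rms(clean) / rms(residual))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Lower `snr_db` values give a noisier mix, which is one simple way to probe how a recognizer degrades under adverse conditions.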

Challenges in ASR:

  • High-Dimensional Output Space: Mapping audio to text is a sequence-to-sequence problem, and the number of possible word sequences grows exponentially with the length of the utterance.
  • Limited Annotated Training Data: ASR models require substantial training data, which can be scarce.
  • Noise and Variability: Real-world audio is noisy and contains various sources of variability.
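
The size of the output space can be illustrated with a short calculation. The sketch below counts every possible word sequence up to a given length for a given vocabulary size. The function name `count_transcripts` and the example figures (a 10,000-word vocabulary, utterances of up to 10 words) are assumptions chosen for illustration, not values from this project.

```python
def count_transcripts(vocab_size, max_len):
    """Number of distinct word sequences of length 1..max_len over a vocabulary.

    Hypothetical helper for illustration: each position can hold any of
    vocab_size words, so there are vocab_size ** n sequences of length n.
    """
    return sum(vocab_size ** n for n in range(1, max_len + 1))


# Assumed figures: 10,000-word vocabulary, utterances of up to 10 words.
candidates = count_transcripts(vocab_size=10_000, max_len=10)
print(f"candidate transcripts: {candidates:.3e}")
```

Even with these modest figures the count is far too large to enumerate, which is why practical recognizers search the space approximately instead of scoring every candidate.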

Overall, ASR bridges the gap between spoken language and text, enabling applications like voice assistants, transcription services, and more.


Feel free to suggest improvements and contribute to this project.
