System Design of Speaker Recognition Applied to Biometric Authentication and Computer Voice Control

Introduction
Speech is a rich source of information and a natural form of human interaction. Technologies such as Speech-to-Text (STT) and Speaker Identification can strengthen security and control systems. Given the increasing prevalence of voice recognition systems, this study aims to help users with specific disabilities perform secure and user-friendly computer operations.

Research Methods

Signal Preprocessing: Speech signals are non-stationary, which makes them difficult to model mathematically. The signal is therefore split into 25 ms frames, each short enough to be treated as stationary, and each frame is multiplied by a Hamming window to reduce spectral leakage.
Feature Extraction: The Mel-Frequency Cepstrum (MFC) is used to extract a 40-dimensional feature vector of mel-frequency cepstral coefficients (MFCCs), approximating the response of the human auditory system. Each frame is transformed from the time domain to the frequency domain and then passed through a mel-scale filter bank to capture auditory-relevant features.
Model Training: A Gaussian Mixture Model (GMM) is used for speaker recognition, with the initial clusters obtained by K-means clustering. The model parameters are then refined with the Expectation-Maximization (EM) algorithm to improve recognition accuracy.
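
The framing, windowing and mel-scale steps above can be illustrated with a minimal sketch. It uses only the Python standard library so it runs anywhere; it is not the project's own code. The 16 kHz sample rate, the 10 ms hop, the 0 to 8000 Hz range, the use of 40 filters and all function names are assumptions made for illustration. The FFT and cepstral steps are left out.

```python
import math

def frame_signal(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a list of samples into overlapping fixed-length frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)      # 160 samples at 16 kHz
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop_len)]

def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(frame, window):
    """Windowing is an element-wise product in the time domain."""
    return [s * w for s, w in zip(frame, window)]

def hz_to_mel(hz):
    """Common mel-scale formula (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters=40, f_min=0.0, f_max=8000.0):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# One second of a 440 Hz tone as stand-in input (synthetic, not project data).
sr = 16000
signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]

frames = frame_signal(signal, sr)
window = hamming(len(frames[0]))
windowed = [apply_window(f, window) for f in frames]
centers = mel_center_frequencies()
```

The centre frequencies crowd together at low frequencies and spread out at high frequencies, which is how a mel-scale filter bank gives finer resolution where hearing is more sensitive.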

Results and Discussion
Recognition improves as more audio files are used for training, and five files were found to be optimal. Using more files may increase the number of model parameters, which in turn requires additional data. Integrating Deep Neural Networks (DNNs) could improve recognition but would increase computational demand.

Usage flow

Run Login.py to launch the Python GUI.
Users create their profiles by recording audio in real time; MFCCs are extracted from each recording and used as input to the GMM.
Once the speaker's voice matches an enrolled acoustic fingerprint, access to the computer's voice control interface is granted.
The system then provides computer operation for physically challenged people through speech-to-text conversion and instant speaker recognition.
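
The enrollment and matching steps above can be sketched as follows: one GMM is trained per user with K-means initialization followed by EM, and a new recording is assigned to the user whose model scores it highest. This is a simplified illustration, not the project's implementation. It works on scalar features rather than 40-dimensional MFCC vectors, and the speaker names, the synthetic data and the maximum average log-likelihood decision rule are all assumptions.

```python
import math
import random

def kmeans_1d(data, k, iters=20):
    """Plain K-means on scalars; returns k cluster means."""
    srt = sorted(data)
    # Spread the initial centres over the sorted data (deterministic).
    means = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - means[j]) ** 2)
            groups[nearest].append(x)
        means = [sum(g) / len(g) if g else means[j]
                 for j, g in enumerate(groups)]
    return means

def gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def train_gmm(data, k=2, em_iters=30):
    """Fit a 1-D GMM: K-means gives the initial means, EM refines everything."""
    means = kmeans_1d(data, k)
    mu = sum(data) / len(data)
    variances = [sum((x - mu) ** 2 for x in data) / len(data)] * k
    weights = [1.0 / k] * k
    for _ in range(em_iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gauss_pdf(x, means[j], variances[j])
                 for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse
    return weights, means, variances

def avg_log_likelihood(data, model):
    weights, means, variances = model
    total = 0.0
    for x in data:
        p = sum(w * gauss_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total / len(data)

def identify(sample, models):
    """Return the enrolled name whose model best explains the sample."""
    return max(models, key=lambda name: avg_log_likelihood(sample, models[name]))

def synth(rng, centers, n):
    """Synthetic stand-in features: n draws around each centre."""
    return [rng.gauss(c, 0.5) for c in centers for _ in range(n)]

rng = random.Random(0)
# Hypothetical speakers with made-up feature distributions.
models = {
    "alice": train_gmm(synth(rng, [-2.0, 1.0], 100)),
    "bob": train_gmm(synth(rng, [4.0, 7.0], 100)),
}
test_sample = synth(rng, [4.0, 7.0], 20)
print(identify(test_sample, models))
```

A login system would normally also compare the best score against a threshold so that unenrolled voices are rejected; that step is omitted here.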

Credits

BoYong (Leader):
	GUI building  
	main code writing & testing  
	debugging & optimisation  
	posters & slides & demo video

YunWei:
	collecting & sorting data  
	partial code testing  
	assisting with posters & slides

