A pitch determination algorithm based on short-time autocorrelation and shortest-distance search.
```shell
git clone https://github.com/MorrisXu-Driving/Pitch_Determiation_for_Speech_Signal.git
```
- Create a new project in your Python IDE and set `mainvoid.py` as the script path in the run configuration.
- Make sure the test input wav file `tone4_w.wav` is in the same directory as `mainvoid.py`.
In this algorithm we have:

- Parameters for input preprocessing

```python
wlen = int(0.03 * fs)  # frame length: 0.03 s in the time domain, i.e. 30 ms
inc = int(0.01 * fs)   # frame shift: 0.01 s in the time domain, i.e. 10 ms
lf = 60                # Hz, lower passband edge of the bandpass denoising filter
hf = 500               # Hz, upper passband edge of the bandpass denoising filter
```
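The `lf`/`hf` pair defines the 60-500 Hz band kept during denoising. A minimal sketch of such a bandpass step, assuming SciPy is available; the filter order and the use of zero-phase `filtfilt` are my choices for illustration, not necessarily what `mainvoid.py` does:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_denoise(x, fs, lf=60, hf=500, order=4):
    """Keep only the lf-hf band, where speech F0 and its first harmonic live."""
    nyq = fs / 2
    b, a = butter(order, [lf / nyq, hf / nyq], btype="bandpass")
    return filtfilt(b, a, x)  # zero-phase filtering: no group delay

# Demo: a 200 Hz tone (inside the band) plus a 3000 Hz tone (outside it)
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t) + np.sin(2 * np.pi * 3000 * t)
y = bandpass_denoise(x, fs)  # the 3000 Hz component is strongly attenuated
```

After filtering, only frequencies in the plausible pitch range survive, which makes the later autocorrelation peaks much cleaner.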
- Parameters for pitch determination

```python
IS = 0.8         # non-speech time (in seconds) at the start of the input; read it off the
                 # waveform of the input audio in the diagram above
r1 = 0.03        # threshold coefficient for the energy threshold T1 (shown in the diagram)
                 # that judges speech segments, namely T1 = np.mean(H[:NIS]) * r1,
                 # where H[:NIS] is the energy of the signal between 0 and IS
r2 = 0.26        # threshold coefficient for judging main bodies in a speech segment;
                 # each speech segment has a different T2 (shown in the diagram)
ThrC = [10, 15]  # max difference in F0 between adjacent frames during the
                 # shortest-distance search, to avoid unnatural changes in the result
miniL = 10       # minimum length of a speech segment
mnlong = 3       # minimum length of a main body within a speech segment
```
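The threshold formula `T1 = np.mean(H[:NIS]) * r1` quoted above can be sketched end to end: frame the signal, compute short-time energy, and scale the mean energy of the leading non-speech portion. The `enframe` helper name and the synthetic signal are illustrative, not taken from the repository:

```python
import numpy as np

def enframe(x, wlen, inc):
    """Split a signal into overlapping frames of length wlen with shift inc."""
    n = 1 + (len(x) - wlen) // inc
    return np.stack([x[i * inc : i * inc + wlen] for i in range(n)])

fs = 8000
wlen, inc = int(0.03 * fs), int(0.01 * fs)   # 30 ms frames, 10 ms shift
IS, r1 = 0.8, 0.03                           # leading non-speech time, threshold coefficient

# Synthetic input: IS seconds of low-level noise, then a 200 Hz tone
rng = np.random.default_rng(0)
x = np.concatenate([0.01 * rng.standard_normal(int(IS * fs)),
                    np.sin(2 * np.pi * 200 * np.arange(fs) / fs)])

frames = enframe(x, wlen, inc)
H = np.sum(frames ** 2, axis=1)              # short-time energy per frame
NIS = int((IS * fs - wlen) / inc) + 1        # number of leading non-speech frames
T1 = np.mean(H[:NIS]) * r1                   # energy threshold for speech detection
```

Frames whose energy exceeds `T1` are candidate speech; `r2` then plays the analogous role inside each detected segment to locate its main body.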
The above diagram shows the spectrogram of the input audio together with the pitch extracted from the input file. The extracted pitch (white line) is highly correlated with the first harmonic shown in the STFT spectrogram, which indicates that the algorithm is working properly.
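The per-frame pitch behind that white line comes from short-time autocorrelation. A minimal sketch of the idea, searching the autocorrelation peak only within the lag range implied by `lf` and `hf`; this illustrates the technique named in the title, not the exact implementation in `mainvoid.py`:

```python
import numpy as np

def acf_f0(frame, fs, lf=60, hf=500):
    """Estimate F0 of one frame from its autocorrelation peak in the lf-hf lag range."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]  # non-negative lags
    lmin, lmax = int(fs / hf), int(fs / lf)   # lag bounds from the passband edges
    lag = lmin + np.argmax(r[lmin : lmax + 1])  # lag of strongest periodicity
    return fs / lag

# Demo: one 30 ms frame of a 200 Hz tone should yield F0 close to 200 Hz
fs = 8000
t = np.arange(int(0.03 * fs)) / fs
frame = np.sin(2 * np.pi * 200 * t)
f0 = acf_f0(frame, fs)
```

Running this per frame gives a raw F0 track; the shortest-distance search with `ThrC` then smooths it by forbidding large F0 jumps between adjacent frames.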
RMSE (Hz) of the results on the wav files in `speech_signal_for_test/`:
- The algorithm does not adapt to different types of audio signals.
- Inputs with low SNR (i.e. the background energy between 0 and IS is already high) need a lower r1.
- Inputs with low energy in each speech segment need a lower r2, so that the extended parts around each main body are still recognized.
- Adaptive parameter setting is needed for a better user experience, since too many parameters must currently be tuned by hand to achieve good performance across different types of speech audio.
- Future Work
- Extracting the pitch alone is not convenient for downstream research; it should be combined with forced alignment at the character level and the word level.