Error when using high rates #7

Open · SuperKogito opened this issue May 3, 2020 · 7 comments

@SuperKogito (Owner)

error
Hi, I am currently working on the second phase of my experiment using your source code, thank you :) I just have one doubt about the audio and rate arguments in the features vector: I am having trouble with sampling_rate when I try to substitute 44100 for it. Can you please tell me where I am going wrong?

Originally posted by @thxrgxxs in #4 (comment)

@SuperKogito (Owner, Author)

Please provide your code too, not just the error logs.
From your code, this line does not seem correct:

features_vector = extract_features(audio=".wav audio", rate=44100)

The audio variable is supposed to be the audio signal from which to compute the features.
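
As a minimal sketch (not part of this repository, the path is a placeholder), and assuming your extract_features forwards audio straight to python_speech_features.mfcc, you would read the wav file first and pass the resulting samples array:

from scipy.io.wavfile import read

# read() returns the file's sampling rate and the raw samples as a numpy array
rate, signal = read("path/to/your/file.wav")
features_vector = extract_features(audio=signal, rate=rate)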

@thxrgxxs commented May 4, 2020

import numpy as np
from sklearn import preprocessing
from scipy.io.wavfile import read
from python_speech_features import mfcc
from python_speech_features import delta
from GenderIdentifier import GenderIdentifier


def extract_features(audio, rate):
        mfcc_feature = mfcc(# The audio signal from which to compute features.
                            audio,
                            # The samplerate of the signal we are working with.
                            rate,
                            # The length of the analysis window in seconds. 
                            # Default is 0.025s (25 milliseconds)
                            winlen       = 0.05,
                            # The step between successive windows in seconds. 
                            # Default is 0.01s (10 milliseconds)
                            winstep      = 0.01,
                            # The number of cepstrum to return. 
                            # Default 13.
                            numcep       = 13,
                            # The number of filters in the filterbank.
                            # Default is 26.
                            nfilt        = 30,
                            # The FFT size. Default is 512.
                            nfft         = 1024,
                            # If true, the zeroth cepstral coefficient is replaced 
                            # with the log of the total frame energy.
                            appendEnergy = True)

        # scale the MFCCs to zero mean and unit variance, then compute
        # delta and delta-delta coefficients
        mfcc_feature  = preprocessing.scale(mfcc_feature)
        deltas        = delta(mfcc_feature, 2)
        double_deltas = delta(deltas, 2)
        # stack the static, delta, and delta-delta features column-wise
        combined      = np.hstack((mfcc_feature, deltas, double_deltas))
        return combined

# init gender identifier
gender_identifier = GenderIdentifier("TestingData/females", 
                                     "TestingData/males", 
                                     "females.gmm", "males.gmm")
# get audio features vector
features_vector = extract_features(audio=".wav audio", rate = 44100)

# predict/identify speaker's gender
predicted_gender = gender_identifier.identify_gender(features_vector)

The recorded audio files are stored in ".wav audio" in .wav format. So where do I insert the file path? Thank you for your guidance, btw :)

@SuperKogito (Owner, Author)

I don't think this will work. You are training your identifier on data with a 16000 Hz sampling rate and using it to test/recognize gender from a file with a 44100 Hz sampling rate. Please take a look at the graph in the README and this article that I wrote, to get a better grasp of the theory. To have correct results you need to use training and testing data with the same sampling rate.
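
If your recordings are at 44100 Hz, one possible way to bring them down to 16000 Hz is to resample them before feature extraction. A rough sketch with scipy, assuming mono 16-bit wav files and placeholder file names:

import numpy as np
from scipy.io.wavfile import read, write
from scipy.signal import resample_poly

rate, signal = read("recording_44100.wav")      # original file, e.g. 44100 Hz
# 16000 / 44100 reduces to 160 / 441, so resample by that rational factor
resampled = resample_poly(signal, up=160, down=441)
write("recording_16000.wav", 16000, resampled.astype(np.int16))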

The docs are very clear about this:

audio_path (str) : path to wave file without silent moments.

So:
features_vector = extract_features(audio="INSERT-AUDIO-FILE-PATH-HERE", rate = 44100)
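
To be sure that the rate argument matches the file you pass in, you can also check the wav header first; a small sketch with the same placeholder path:

from scipy.io.wavfile import read

audio_path = "INSERT-AUDIO-FILE-PATH-HERE"   # placeholder
file_rate, signal = read(audio_path)
print(file_rate)   # should print 16000 if the file matches the training data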

@thxrgxxs commented May 5, 2020

Alright, I'll work on this and let you know how it turns out.

@thxrgxxs commented May 8, 2020

I converted all my audio files to 16000 Hz and also inserted the file path, but it keeps showing the same error for the "rate" variable.

@SuperKogito (Owner, Author)

If you are using the correct rates and have placed the files in the right folders, then running the feature extraction, training, and testing code sections should be straightforward. Without seeing the code or the error logs, I can only guess what's wrong, so I am afraid I cannot provide any reliable input.

@thxrgxxs commented May 28, 2020

error code.zip

These are the wav files I recorded and tried to run with the FeatureExtractor1.py (real-time gender classification) code. All the audio files are at 16000 Hz. Can you please help me out? Thanks.
