Inference #6

Open
codeghees opened this issue Jul 7, 2021 · 15 comments

Comments

@codeghees

Hi! Great work with this. I think I was able to reproduce your results.
@qute012
Two questions: what is the best way to run inference on the trained model? Do you have any sample?
Secondly, I was getting an error while fine-tuning a model trained on Google Speech Commands on my Urdu dataset:
cfg = convert_namespace_to_omegaconf(state_dict['args'])
The error was a KeyError: 'args' not found. What am I doing wrong?
I was passing the .pth model, and I checked that the model was being loaded.

Any help would be appreciated.
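
For reference, a quick way to check what the checkpoint actually contains (newer fairseq checkpoints store the config under a 'cfg' key instead of 'args', which would explain the KeyError; this is a hypothetical snippet, not from this repo):

import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

state_dict = torch.load('model.pth', map_location='cpu')
print(state_dict.keys())  # older fairseq checkpoints have 'args'; newer ones have 'cfg'

if 'cfg' in state_dict:
    cfg = state_dict['cfg']  # already an omegaconf config, no conversion needed
else:
    cfg = convert_namespace_to_omegaconf(state_dict['args'])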

@codeghees
Author

The test accuracy with 10 samples per keyword is over 94 percent. That sounds too good to be true.

@BeardyMan37

Hello, @codeghees. Could you please provide a requirements.txt file or a conda environment.yml file for the environment you used while reproducing the results? I tried to reproduce the results on the Google Speech Commands v2 dataset and ran into the same errors.

@dobby-seo
Owner

Hi~ @codeghees @BeardyMan37

Thanks for your interest in this project. Honestly, I can't afford to maintain this project right now, and I also can't access the server at the moment ;(
If I have time, I'd like to extend this project to support inference. For now, you can reproduce it by referring to the hyperparameters and model architecture.

Sorry 😐

@codeghees
Author

Can you point me in the right direction for inference?

@codeghees
Author

I can build it myself.

@BeardyMan37 I used Google Colab.

@dobby-seo
Owner

dobby-seo commented Jul 8, 2021

@codeghees

  1. Extract the loudest section
    This is the most important step for accuracy, because the model takes only 1 second of raw audio. So you have to check that the extracted signal actually contains voice.
def extract_loudest_section(self, wav, win_len=30):
    # Slide a window across the waveform and keep the 1-second
    # (16000 samples at 16 kHz) segment with the highest total energy.
    wav_len = len(wav)
    temp = abs(wav)

    st, et = 0, 0
    max_dec = 0

    for ws in range(0, wav_len, win_len):
        cur_dec = temp[ws:ws + 16000].sum()
        if cur_dec >= max_dec:
            max_dec = cur_dec
            st, et = ws, ws + 16000
        if ws + 16000 > wav_len:
            break

    return wav[st:et]
  2. Post-process (in fairseq)
    You don't need to normalize the raw audio. I think it has no effect; I just added it for the Wav2Vec 2.0 pipeline. I'm not sure, but it shouldn't matter if you remove this function.
import torch
import torch.nn.functional as F

def postprocess(self, feats, curr_sample_rate):
    # Collapse multi-channel audio to mono by averaging channels.
    if feats.dim() == 2:
        feats = feats.mean(-1)

    if curr_sample_rate != self.sample_rate:
        raise Exception(f"sample rate: {curr_sample_rate}, need {self.sample_rate}")

    assert feats.dim() == 1, feats.dim()

    # Optional layer norm over the whole waveform (Wav2Vec 2.0 convention).
    if self.normalize:
        with torch.no_grad():
            feats = F.layer_norm(feats, feats.shape)
    return feats
  3. Make a single batch to feed to the model (see the sketch below).

  4. Predict the class from the argmax of the model output.
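
Putting steps 1, 3, and 4 together, here is a minimal inference sketch. The CLASSES values and the assumption that the model takes a raw waveform batch and returns class logits are mine, not confirmed from this repo; extract_loudest_section is used as a free function (drop the self parameter):

import torch
import soundfile as sf

CLASSES = ['yes', 'no', 'up', 'down']  # hypothetical subset; use the repo's CLASSES list

def predict_keyword(model, wav_path):
    # Step 1: load 16 kHz mono audio and keep the loudest 1-second section.
    wav, sr = sf.read(wav_path)
    assert sr == 16000, f"expected 16 kHz audio, got {sr}"
    wav = extract_loudest_section(wav)  # free-function version of the method above

    # Step 3: make a single batch of shape (1, 16000).
    source = torch.from_numpy(wav).float().unsqueeze(0)

    # Step 4: forward pass, then argmax over the class logits.
    model.eval()
    with torch.no_grad():
        logits = model(source)  # assumed: returns a (1, num_classes) tensor
    idx = logits.argmax(dim=-1).item()
    return CLASSES[idx]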

@codeghees
Author

Also, how do we know which index represents which class, e.g. 0 for "UP"? Is that the position of the item in the index array?

@dobby-seo
Owner

@codeghees

Yes, right! Like any other simple classification method :D

@codeghees
Author

Oh, I meant: how do we know the mapping? Does that come from the CLASSES array?

Thanks!

@dobby-seo
Owner

dobby-seo commented Jul 8, 2021

Yes. If you can reproduce the training environment, could you open a PR for others?
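
In other words, the predicted index is just the position of the entry in the CLASSES array (hypothetical values below; the real ordering is the list defined in this repo):

CLASSES = ['up', 'down', 'left', 'right']  # hypothetical ordering
idx = 0
print(CLASSES[idx])  # 'up': index 0 is simply the first entry in CLASSES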

@codeghees
Author

I will go back and check; I just opened Colab and followed the instructions. What is the exact error, @BeardyMan37?

@BeardyMan37

Managed to resolve it. @codeghees

@BeardyMan37

BeardyMan37 commented Jul 8, 2021

@qute012 attaching both the requirements.txt and the environment.yml files for your reference.

@alirezafarashah

Hi! Great work with this. I think I was able to reproduce your results. @qute012 Two questions: what is the best way to run inference on the trained model? Do you have any sample? Secondly, I was getting an error while fine-tuning a model trained on Google Speech Commands on my Urdu dataset: cfg = convert_namespace_to_omegaconf(state_dict['args']) The error was a KeyError: 'args' not found. What am I doing wrong? I was passing the .pth model, and I checked that the model was being loaded.

Any help would be appreciated.

Hello @codeghees. I encountered the same error while trying to fine-tune a Hugging Face wav2vec model with fairseq. Have you found a method to convert a Hugging Face model (.bin) to a fairseq checkpoint (.pt)?

@salmaShahid

@codeghees, can you please guide me or share the link to your Colab file? I want to reproduce this result and apply the same strategy to the Urdu language.
