Hi team,
Thank you very much for releasing this model!
I'm curious about training/inference with WavLM to improve performance.
Running inference with WavLM-Base throws this error:
```
% python convert.py --hpfile configs/freevc.json --ptfile checkpoints/freevc.pt --txtpath convert.txt --outdir outputs/freevc
Loading model...
Loading checkpoint...
INFO:root:Loaded checkpoint 'checkpoints/freevc.pt' (iteration 1372)
Loading WavLM for content...
{} <wavlm.WavLM.WavLMConfig object at 0x140b2ea50>
INFO:wavlm.WavLM:WavLM Config: {'extractor_mode': 'default', 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': 'gelu', 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'feature_grad_mult': 0.1, 'normalize': False, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': 'static', 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': 'static', 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'relative_position_embedding': True, 'num_buckets': 320, 'max_distance': 800, 'gru_rel_pos': True, 'expand_attention_head_size': -1}
Loading speaker encoder...
Loaded the voice encoder model on cpu in 0.01 seconds.
Processing text...
Synthesizing...
0it [00:00, ?it/s]/Users/macos/.pyenv/versions/3.7.10/lib/python3.7/site-packages/librosa/effects.py:490: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  return y[full_index], np.asarray([start, end])
0it [00:06, ?it/s]
Traceback (most recent call last):
  File "convert.py", line 83, in <module>
    audio = net_g.infer(c, g=g_tgt)
  File "[...]/freevc/FreeVC/models.py", line 347, in infer
    z_p, m_p, logs_p, c_mask = self.enc_p(c, c_lengths)
  File "/Users/macos/.pyenv/versions/3.7.10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "[...]/freevc/FreeVC/models.py", line 72, in forward
    x = self.pre(x) * x_mask
  File "/Users/macos/.pyenv/versions/3.7.10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/macos/.pyenv/versions/3.7.10/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 298, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/Users/macos/.pyenv/versions/3.7.10/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 295, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [192, 1024, 1], expected input[1, 768, 1030] to have 1024 channels, but got 768 channels instead
```
So my guess is that we need to train with WavLM-Base first, and then update the config file `./configs/freevc.json` to run inference?
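For what it's worth, the channel mismatch in the traceback can be reproduced in isolation (a minimal sketch: the layer sizes come from the error message, and `pre` only stands in for the model's pre-net, which is an assumption about its structure):

```python
import torch
import torch.nn as nn

# The released checkpoint's pre-net was built for 1024-dim content features
# (weight of size [192, 1024, 1] in the error message), i.e. WavLM-Large.
pre = nn.Conv1d(1024, 192, kernel_size=1)

# WavLM-Base outputs 768-dim features, e.g. shape [batch, 768, frames].
c = torch.randn(1, 768, 1030)

try:
    pre(c)
except RuntimeError as e:
    # Same channel-mismatch error as in the traceback above.
    print(e)

# A pre-net built with a 768-dim input (ssl_dim = 768) accepts the features:
pre_base = nn.Conv1d(768, 192, kernel_size=1)
out = pre_base(c)
print(out.shape)  # torch.Size([1, 192, 1030])
```

So the pretrained checkpoint and a WavLM-Base feature extractor can't be mixed; the model would need to be retrained with the smaller feature dimension.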
Thank you in advance.
Regards, KA
@khacanh I think you need to set ssl_dim to 768 for WavLM-Base: the error shows the input [1, 768, 1030] has 768 channels, while the checkpoint's weight [192, 1024, 1] expects 1024-dim features.
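If the ssl_dim suggestion is right, the change for training with WavLM-Base would presumably look like this (hypothetical fragment; the exact key layout depends on how the FreeVC config file is structured):

```json
{
  "model": {
    "ssl_dim": 768
  }
}
```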