Loading AudioSR: speech
Loading model on cuda
D:\Soft\Python\Python38\lib\site-packages\torch\functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3484.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
D:\Soft\Python\Python38\lib\site-packages\torchaudio\transforms\_transforms.py:611: UserWarning: Argument 'onesided' has been deprecated and has no influence on the behavior of this module.
  warnings.warn(
DiffusionWrapper has 258.20 M params.
Running DDIM Sampling with 50 timesteps
DDIM Sampler: 0%| | 0/50 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "D:\Soft\Python\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Soft\Python\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\__main__.py", line 115, in <module>
    waveform = super_resolution(
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\pipeline.py", line 168, in super_resolution
    waveform = latent_diffusion.generate_batch(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1525, in generate_batch
    samples, _ = self.sample_log(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1431, in sample_log
    samples, intermediates = ddim_sampler.sample(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 143, in sample
    samples, intermediates = self.ddim_sampling(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 237, in ddim_sampling
    outs = self.p_sample_ddim(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddim.py", line 293, in p_sample_ddim
    model_t = self.model.apply_model(x_in, t_in, c)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1030, in apply_model
    x_recon = self.model(x_noisy, t, cond_dict=cond)
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\models\ddpm.py", line 1686, in forward
    out = self.diffusion_model(
  File "D:\Soft\Python\Python38\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Soft\audio_super_resolution\versatile_audio_super_resolution\audiosr\latent_diffusion\modules\diffusionmodules\openaimodel.py", line 879, in forward
    h = th.cat([h, concate_tensor], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 64 but got size 63 for tensor number 1 in the list.
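The failing `th.cat` is a UNet skip connection: an encoder activation is concatenated with the corresponding decoder activation along the channel dimension, so their time dimensions must match exactly. A plausible explanation for the 64-vs-63 mismatch (an assumption about AudioSR's internals, not taken from its code) is that a very short input yields an odd number of latent frames, and stride-2 downsampling floors that odd length, so the decoder comes back one frame short. A minimal arithmetic sketch:

```python
# Illustration only (not AudioSR's actual code): why an odd time
# dimension breaks a stride-2 UNet's skip connections.
def unet_time_len(n: int, levels: int = 1) -> int:
    """Round-trip a time dimension through a stride-2 down/up path."""
    for _ in range(levels):
        n = n // 2   # stride-2 downsampling floors odd lengths
    for _ in range(levels):
        n = n * 2    # upsampling doubles the (floored) length
    return n

print(unet_time_len(64))  # 64: decoder output matches the encoder skip
print(unet_time_len(63))  # 62: one frame short of the 63-frame skip,
                          # so the concatenation raises the size error
```

This is consistent with the reports below that the error disappears once the input is lengthened: padding pushes the latent length to a value that is divisible by the network's total downsampling factor.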
This could be related to the length of the audio, which is just 0.936 seconds. Using LosslessCut, I appended another mp3 file with the exact same configuration, without re-encoding, and exported it with all the same settings (same sample rate, bitrate, etc.); running audiosr on that longer file produced no error.
Ran into this as well. It seems to be related to the audio file being too short. If you pad the input audio array with some trailing zeros, it should work.
I'm using the latest master and running on CUDA.
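The padding workaround above can be sketched as follows. This is a hedged example, not AudioSR's own preprocessing: it assumes you load the waveform yourself (e.g. with soundfile or librosa), pad it, write it back out, and then feed the padded file to audiosr as usual. The 2-second minimum is an arbitrary illustrative choice, not a documented requirement.

```python
import numpy as np

def pad_to_min_length(waveform: np.ndarray, sr: int,
                      min_seconds: float = 2.0) -> np.ndarray:
    """Pad a mono waveform with trailing zeros up to min_seconds.

    Longer inputs are returned unchanged.
    """
    min_samples = int(min_seconds * sr)
    if waveform.shape[-1] >= min_samples:
        return waveform
    pad = min_samples - waveform.shape[-1]
    return np.pad(waveform, (0, pad))  # zeros appended at the end only

# The failing clip: 0.936 s at 44.1 kHz, padded up to 2.0 s
short = np.zeros(int(0.936 * 44100), dtype=np.float32)
padded = pad_to_min_length(short, 44100)
print(padded.shape[-1])  # 88200 samples = 2.0 s
```

Trailing (rather than leading) zeros keep the original audio aligned at the start, so the silent tail can simply be trimmed from the super-resolved output afterwards.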
Here's the mp3 file I'm using as input:
https://drive.google.com/file/d/1xR2mV-SctUknIvjKqlTYyFKHRl5annCX/view?usp=sharing
command line:
python -m audiosr -i 5.01_22303.037073170733_23517.438009756097.mp3 -s . -d cuda
and getting the error shown in the traceback above.