threshold.Silence crashes if the batch dimension of the input tensor is > 1. It seems to be because loudness is calculated by converting tensors to numpy and using librosa. The line that causes the actual crash is line 37 from loudness.py, which tries to squeeze the 0th dimension.
Librosa doesn't support batches, but since calculating the loudness only involves an STFT, logs and means, it doesn't seem hard to change it to use the torch versions of these functions.
Since the rest of the package works fine with batched inputs (or at least seems to), this might be a worthwhile change.
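As a rough illustration of the suggestion above, here is a minimal sketch of a batched loudness computation in torch, built only from an STFT, a log and a mean. It is not the package's implementation: the function name `batched_loudness`, the parameters `n_fft`, `hop_length` and `min_db`, and their default values are all assumptions made for the example, and any perceptual weighting the librosa-based code applies is left out.

```python
import torch


def batched_loudness(audio, n_fft=1024, hop_length=160, min_db=-100.0):
    """Hypothetical helper: per-frame loudness in dB for a batch of signals.

    audio: float tensor of shape (batch, samples)
    returns: tensor of shape (batch, frames)
    """
    window = torch.hann_window(n_fft, device=audio.device, dtype=audio.dtype)

    # torch.stft accepts a (batch, samples) input directly, so no squeeze
    # of the batch dimension is needed.
    stft = torch.stft(
        audio,
        n_fft=n_fft,
        hop_length=hop_length,
        window=window,
        center=True,
        return_complex=True,
    )

    # Magnitude spectrogram: (batch, n_fft // 2 + 1, frames)
    magnitude = stft.abs()

    # Convert to decibels; the clamps avoid log(0) and bound the floor.
    db = 20.0 * torch.log10(magnitude.clamp(min=1e-10))
    db = db.clamp(min=min_db)

    # Average over frequency bins to get one value per frame.
    return db.mean(dim=1)
```

With `center=True`, a batch of three one-second signals at 16 kHz and a hop of 160 samples gives an output of shape `(3, 101)`, one loudness value per frame for each item in the batch.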
Given a single audio file (batch size always 1), CREPE creates many batches via chunking. Then, once you have the entire pitch and periodicity sequence for the audio file, you decode and threshold as needed. But given that the input is of shape (1, samples) and not (batch, samples), thresholding and decoding should operate on the actual output of that batched process, which is (1, int(1 + samples // hopsize)). So you'll never have a batch size greater than one for the output pitch and periodicity, even if you use multiple audio files. All batching is done internally. And it's rare to pass in multiple audio files at once as (batch, samples) as it assumes that all items in the batch have the same number of samples.
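The shape arithmetic described above can be sketched in a few lines of plain Python. The helper names and the `batch_size` value are hypothetical and only illustrate the bookkeeping: the frame count follows the `int(1 + samples // hopsize)` formula, and the internal batching is modeled as splitting those frames into fixed-size groups.

```python
import math


def num_frames(samples, hopsize):
    # Number of pitch/periodicity frames for one audio file.
    return int(1 + samples // hopsize)


def output_shape(samples, hopsize):
    # The decoded output keeps a leading batch dimension of exactly 1.
    return (1, num_frames(samples, hopsize))


def num_internal_batches(samples, hopsize, batch_size):
    # Frames are grouped into internal batches of at most batch_size
    # (batch_size is an illustrative parameter here).
    return math.ceil(num_frames(samples, hopsize) / batch_size)


print(output_shape(16000, 160))              # (1, 101)
print(num_internal_batches(16000, 160, 32))  # 4
```

So one second of 16 kHz audio with a hop of 160 samples is processed as four internal batches, yet thresholding and decoding still see a single `(1, 101)` sequence.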
What you are mentioning is useful for speed-ups: computing loudness on the GPU could be nice for that. However, the loudness is extremely quick to compute, so it's not a priority of mine. And I have a solution coming soon that handles silence without that additional step =)