Add .narrowband() effect (mulaw, lpc10 codecs) #1348
Looks pretty good! I left a few comments. Could you also add unit tests for this transform?
lhotse/augmentation/torchaudio.py
@dataclass
class Phone(AudioTransform):
I suggest calling it Narrowband and renaming the methods to narrowband. Also, upsampling back to the original SR should be optional (restore_orig_sr=True).
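To make the suggested interface concrete, here is a minimal sketch of a transform with an optional restore_orig_sr flag. It is hypothetical and not Lhotse's implementation: the codec and resampler are passed in as plain callables so the example stays self-contained, and __call__ returns the output sampling rate alongside the samples so the caller can see when recording metadata has to change. The nearest-neighbour resampler and identity codec exist only for the demo.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Samples = List[float]
Resampler = Callable[[Samples, int, int], Samples]  # (samples, source_sr, target_sr)
Codec = Callable[[Samples], Samples]  # encode + decode in one step

NARROWBAND_SR = 8000


@dataclass
class Narrowband:
    """Hypothetical sketch of the renamed transform; not Lhotse's implementation."""

    codec: Codec
    resample: Resampler
    restore_orig_sr: bool = True

    def __call__(self, samples: Samples, sampling_rate: int) -> Tuple[Samples, int]:
        # Downsample to 8 kHz from the input's actual sampling rate.
        narrow = self.resample(samples, sampling_rate, NARROWBAND_SR)
        # Encode then immediately decode, keeping only the codec distortion.
        narrow = self.codec(narrow)
        if self.restore_orig_sr:
            return self.resample(narrow, NARROWBAND_SR, sampling_rate), sampling_rate
        # Without restoring, the caller must update sampling_rate and num_samples.
        return narrow, NARROWBAND_SR


def nearest_resample(samples: Samples, source_sr: int, target_sr: int) -> Samples:
    """Toy nearest-neighbour resampler for this demo only (no anti-aliasing)."""
    n_out = -(-len(samples) * target_sr // source_sr)  # ceil division
    last = len(samples) - 1
    return [samples[min(i * source_sr // target_sr, last)] for i in range(n_out)]


def identity_codec(samples: Samples) -> Samples:
    """Placeholder standing in for a real mulaw or lpc10 encode/decode step."""
    return list(samples)


restored, restored_sr = Narrowband(identity_codec, nearest_resample)([0.1] * 16000, 16000)
narrow, narrow_sr = Narrowband(identity_codec, nearest_resample, restore_orig_sr=False)(
    [0.1] * 16000, 16000
)
print(len(restored), restored_sr, len(narrow), narrow_sr)  # 16000 16000 8000 8000
```

Defaulting restore_orig_sr to True keeps the output drop-in compatible with the rest of a 16 kHz pipeline; turning it off trades that for cheaper 8 kHz features at the cost of updating metadata.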
lhotse/augmentation/torchaudio.py
    Resample input audio to 8000 Hz, apply codec (encode then immediately decode), then resample back to the original sampling rate.
    """

    source_sampling_rate: int
We shouldn't need this option at all. You can call get_or_create_resampler directly in __call__ using the input example's actual sampling rate. This way this transform can work with datasets of mixed sampling rates.
I have addressed everything except for
You are very close! Add a parameter
Done, but something extra is needed, because when I apply the transformation with

AudioLoadingError: The number of declared samples in the recording diverged from the one obtained when loading audio (offset=0, duration=19.22419501133787). This could be internal Lhotse's error or a faulty transform implementation. Please report this issue in Lhotse and show the following: diff=693887, audio.shape=(1, 153900), recording=Recording(id='0_nb_lpc10', sources=[AudioSource(type='file', channels=[0], source='/home/user/workspace/rtvalid/0.wav')], sampling_rate=44100, num_samples=847787, duration=19.22419501133787, channel_ids=[0], transforms=[{'name': 'Narrowband', 'kwargs': {'codec': 'lpc10', 'restore_orig_sr': False}}])
If you don't restore the original sampling rate, you'll have to update both the sampling_rate and num_samples properties on the Recording object.
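The numbers in the AudioLoadingError make this concrete. The sketch below predicts the metadata for restore_orig_sr=False; RecordingMeta and narrowband_metadata are hypothetical stand-ins, not Lhotse classes. It rests on two stated assumptions: the resampler rounds its output length up, and LPC-10 output is padded to whole 180-sample frames (22.5 ms at 8 kHz). Under those assumptions the arithmetic reproduces the 153900 samples in the log (847787 - 153900 = 693887, the reported diff), which suggests the codec's framing may also need to be accounted for; treat that as a plausible reading, not a confirmed one.

```python
from dataclasses import dataclass, replace

NARROWBAND_SR = 8000
LPC10_FRAME = 180  # assumption: 22.5 ms LPC-10 frames at 8 kHz


@dataclass(frozen=True)
class RecordingMeta:
    """Hypothetical stand-in for the two Recording fields that must change."""

    sampling_rate: int
    num_samples: int


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def narrowband_metadata(rec: RecordingMeta, frame_size: int = 1) -> RecordingMeta:
    """Predict metadata for restore_orig_sr=False (hypothetical helper)."""
    # Assumes the resampler rounds the output length up (ceil).
    n = ceil_div(rec.num_samples * NARROWBAND_SR, rec.sampling_rate)
    # Assumes a frame-based codec pads its output to a whole number of frames.
    n = ceil_div(n, frame_size) * frame_size
    return replace(rec, sampling_rate=NARROWBAND_SR, num_samples=n)


# Values from the error report: 44.1 kHz, 847787 samples.
orig = RecordingMeta(sampling_rate=44100, num_samples=847787)
narrow = narrowband_metadata(orig, frame_size=LPC10_FRAME)
print(narrow)  # RecordingMeta(sampling_rate=8000, num_samples=153900)
```

Since transforms are applied when audio is loaded, the metadata has to be predicted up front like this rather than read back from the decoded signal.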
Thanks for the contribution, merging!
This patch adds an audio codec transformation.
I have found that when applying K2 ASR to speech compressed with mulaw, it is advantageous to augment the training data with these codecs. The transformation resamples the input audio to 8 kHz, encodes and then decodes it with the specified codec, and then restores the original sampling rate (e.g. 16 kHz).
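The mulaw step of this pipeline can be illustrated without sox or torchaudio. This sketch is an assumption-laden stand-in, not the patch's code: it applies the continuous mu-law companding curve with mu = 255 and quantizes to 8-bit codes, whereas telephony mulaw encoders typically implement G.711's segmented approximation, so outputs will not be bit-identical. The encode-then-decode round trip is what leaves the quantization distortion the augmentation relies on.

```python
import math
from typing import List

MU = 255  # mu-law parameter used by G.711 in North America and Japan


def mulaw_encode(x: float, mu: int = MU) -> int:
    """Compand a sample in [-1, 1] and quantize it to an 8-bit code (0..255)."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1) / 2 * mu + 0.5)


def mulaw_decode(code: int, mu: int = MU) -> float:
    """Map an 8-bit code back to [-1, 1] by inverting the companding curve."""
    y = 2 * code / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def mulaw_roundtrip(samples: List[float]) -> List[float]:
    """Encode then immediately decode, as the narrowband transform does."""
    return [mulaw_decode(mulaw_encode(s)) for s in samples]


signal = [0.0, 0.001, 0.01, 0.1, 0.5, -0.5, 1.0]
for s, r in zip(signal, mulaw_roundtrip(signal)):
    print(f"{s:+.4f} -> {r:+.4f}")
```

The logarithmic curve spends more of the 256 codes on quiet samples, so the absolute error grows roughly in proportion to the amplitude instead of being uniform.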
Open issues: the method is currently called phone(), but maybe a better name is needed?

Example use:
libspandsp is required to use the lpc10 codec. On Debian/Ubuntu, install it with apt-get install libspandsp-dev.