Add `.narrowband()` effect (mulaw, lpc10 codecs) #1348

rouseabout · 2024-06-03T01:05:51Z

This patch adds an audio codec transformation.

When applying K2 ASR to speech compressed with mulaw, I have found it advantageous to augment the training data with the codecs added here (mulaw and lpc10). The transformation resamples the input audio to 8 kHz, encodes and then immediately decodes it with the specified codec, and then restores the original sampling rate (e.g. 16 kHz).
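The three steps described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the code from this patch: the function names are hypothetical, the resampler is crude linear interpolation with no anti-aliasing filter, and the mulaw step uses the continuous companding formula quantized to 256 levels rather than a bit-exact G.711 implementation.

```python
import numpy as np

MU = 255  # 8-bit mu-law companding constant


def mulaw_roundtrip(x: np.ndarray) -> np.ndarray:
    """Compress float samples in [-1, 1] to 256 mu-law levels, then expand them back."""
    x = np.clip(x, -1.0, 1.0)
    compressed = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    codes = np.round((compressed + 1.0) / 2.0 * MU)  # integer codes 0..255
    y = codes / MU * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU


def resample_linear(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Assumption: linear interpolation stands in for a proper band-limited resampler."""
    if sr_in == sr_out:
        return x
    n_out = int(round(len(x) * sr_out / sr_in))
    return np.interp(np.arange(n_out) / sr_out, np.arange(len(x)) / sr_in, x)


def codec_roundtrip_sketch(x: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Downsample to 8 kHz, apply the codec round trip, restore the original rate."""
    low = resample_linear(x, sampling_rate, 8000)
    low = mulaw_roundtrip(low)
    return resample_linear(low, 8000, sampling_rate)


# One second of a 440 Hz tone at 16 kHz keeps its length after the round trip.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
degraded = codec_roundtrip_sketch(tone, sr)
```

Checking that the output length matches the input and that the codec error stays within the quantization bound is one simple way to test a lossy transform like this.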

Open issues:

The transformation is called phone(). But maybe a better name is needed?
Since it significantly alters the audio, depending on codec, I am wondering how best to test the transformation?

Example use:

cs2 = CutSet.from_manifests(...).phone(codec="mulaw")
cs3 = CutSet.from_manifests(...).phone(codec="lpc10")

libspandsp is required to use the lpc10 codec. Use apt-get install libspandsp-dev on Debian/Ubuntu.

pzelasko

Looks pretty good! I left a few comments. Could you also add unit tests for this transform?

pzelasko · 2024-06-04T14:34:01Z

lhotse/augmentation/torchaudio.py

+
+
+@dataclass
+class Phone(AudioTransform):


I suggest calling it Narrowband and renaming the methods to narrowband. Also upsampling back to original SR should be optional (restore_orig_sr=True).

pzelasko · 2024-06-04T14:34:59Z

lhotse/augmentation/torchaudio.py

+    Resample input audio to 8000 Hz, apply codec (encode then immediately decode), then resample back to the original sampling rate.
+    """
+
+    source_sampling_rate: int


We shouldn't need this option at all. You can call get_or_create_resampler directly in __call__ using the input example's actual sampling rate. This way this transform can work with datasets of mixed sampling rates.
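A minimal sketch of this suggestion, under stated assumptions: `get_resampler` below is a hypothetical stand-in for a cached resampler factory (it is not lhotse's `get_or_create_resampler`, and it uses plain linear interpolation), and `NarrowbandSketch` is an illustrative class, not the one in this PR. The point is only that the sampling rate arrives with each call instead of being stored on the transform.

```python
from dataclasses import dataclass
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def get_resampler(source_sr: int, target_sr: int):
    """Hypothetical cached factory: one resampler per (source, target) rate pair."""

    def resample(x: np.ndarray) -> np.ndarray:
        n_out = int(round(len(x) * target_sr / source_sr))
        # Assumption: linear interpolation instead of a band-limited resampler.
        return np.interp(np.arange(n_out) / target_sr, np.arange(len(x)) / source_sr, x)

    return resample


@dataclass
class NarrowbandSketch:
    codec: str = "mulaw"  # note: no source_sampling_rate field

    def __call__(self, samples: np.ndarray, sampling_rate: int) -> np.ndarray:
        low = get_resampler(sampling_rate, 8000)(samples)
        # The codec encode/decode step would go here; it is omitted in this sketch.
        return get_resampler(8000, sampling_rate)(low)


transform = NarrowbandSketch()
out_16k = transform(np.zeros(16000), 16000)
out_44k = transform(np.zeros(44100), 44100)  # same object, different input rate
```

Because the rate is looked up per call, a single transform instance can serve a dataset that mixes 16 kHz and 44.1 kHz recordings.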

rouseabout · 2024-06-06T00:43:11Z

I have addressed everything except for restore_orig_sr=True. I am not sure how to achieve that!

pzelasko · 2024-06-09T02:01:41Z

> I have addressed everything except for restore_orig_sr=True. I am not sure how to achieve that!

You are very close! Add a parameter restore_orig_sr=True to def narrowband(self, ...) for both cut and recording, and pass the provided argument to the Narrowband constructor. Then you can extend the condition for the second resampling to `if self.restore_orig_sr and sampling_rate != 8000:`.
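The plumbing described here might look roughly like the sketch below. The names `NarrowbandPlan` and the free function `narrowband` are hypothetical and only list the processing steps as strings; whether the first resampling is also skipped for 8 kHz input is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NarrowbandPlan:
    codec: str
    restore_orig_sr: bool = True

    def steps(self, sampling_rate: int) -> List[str]:
        """List the processing steps that would run for an input at this rate."""
        steps = []
        if sampling_rate != 8000:  # assumption: 8 kHz input needs no downsampling
            steps.append(f"resample {sampling_rate}->8000")
        steps.append(f"{self.codec} encode+decode")
        # The extended condition for the second resampling:
        if self.restore_orig_sr and sampling_rate != 8000:
            steps.append(f"resample 8000->{sampling_rate}")
        return steps


def narrowband(codec: str, restore_orig_sr: bool = True) -> NarrowbandPlan:
    """Hypothetical stand-in for the cut/recording method: forwards the flag unchanged."""
    return NarrowbandPlan(codec=codec, restore_orig_sr=restore_orig_sr)


default_steps = narrowband("mulaw").steps(16000)
no_restore_steps = narrowband("lpc10", restore_orig_sr=False).steps(44100)
```

With the default, the output keeps the input's sampling rate; with `restore_orig_sr=False` the audio stays at 8 kHz, so anything that records the sampling rate or sample count has to change as well.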

rouseabout · 2024-06-24T22:33:48Z

Done, but something extra is needed, because when I apply the transformation with restore_orig_sr=False the following exception occurs:

AudioLoadingError: The number of declared samples in the recording diverged from the one obtained when loading audio (offset=0, duration=19.22419501133787). This could be internal Lhotse's error or a faulty transform implementation. Please report this issue in Lhotse and show the following: diff=693887, audio.shape=(1, 153900), recording=Recording(id='0_nb_lpc10', sources=[AudioSource(type='file', channels=[0], source='/home/user/workspace/rtvalid/0.wav')], sampling_rate=44100, num_samples=847787, duration=19.22419501133787, channel_ids=[0], transforms=[{'name': 'Narrowband', 'kwargs': {'codec': 'lpc10', 'restore_orig_sr': False}}])

pzelasko · 2024-06-24T22:46:39Z

If you don't restore the original sampling rate, you'll have to update both the sampling_rate and num_samples properties on the Recording object.
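The numbers in the error above illustrate why. The sketch below uses a hypothetical `RecordingMeta` dataclass in place of the real Recording object, and the rounding rule for the new sample count is an assumption; the exact count depends on the resampler and on codec framing (the loaded 153900 samples happen to be a multiple of 180, the LPC-10 frame length at 8 kHz).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RecordingMeta:
    """Hypothetical stand-in for the manifest fields that describe the audio."""

    sampling_rate: int
    num_samples: int

    @property
    def duration(self) -> float:
        return self.num_samples / self.sampling_rate


def after_narrowband(meta: RecordingMeta, restore_orig_sr: bool, target_sr: int = 8000) -> RecordingMeta:
    """Return metadata that matches the audio the transform would produce."""
    if restore_orig_sr or meta.sampling_rate == target_sr:
        return meta  # audio comes back at the declared rate and length
    # Assumption: the new sample count is the old one scaled by the rate ratio, rounded.
    new_num_samples = round(meta.num_samples * target_sr / meta.sampling_rate)
    return replace(meta, sampling_rate=target_sr, num_samples=new_num_samples)


declared = RecordingMeta(sampling_rate=44100, num_samples=847787)
loaded_samples = 153900  # audio.shape[1] from the error message

diff = declared.num_samples - loaded_samples  # 693887, the diff reported in the error
updated = after_narrowband(declared, restore_orig_sr=False)  # 8000 Hz, 153794 samples
```

Once the manifest declares 8000 Hz and roughly 153.8k samples, the declared and loaded lengths differ by about a hundred samples instead of several hundred thousand.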

pzelasko · 2024-07-18T23:38:27Z

Thanks for the contribution, merging!

pzelasko reviewed Jun 4, 2024

augmentation/torchaudio: add Phone effect (mulaw, lpc10 codecs)

74b9abd

rouseabout force-pushed the codec-transformation branch from 2353c82 to 74b9abd Compare June 6, 2024 00:41

rouseabout and others added 2 commits June 25, 2024 09:42

restore_orig_sr option

cec4624

Merge branch 'master' into codec-transformation

c9ac1f7

pzelasko added this to the v1.25.0 milestone Jul 18, 2024

Merge branch 'master' into codec-transformation

bdd5e55

pzelasko merged commit 18436e9 into lhotse-speech:master Jul 18, 2024
9 of 11 checks passed

pzelasko changed the title ~~augmentation/torchaudio: add Phone effect (mulaw, lpc10 codecs)~~ Add .narrowband() effect (mulaw, lpc10 codecs) Jul 18, 2024
