Releases · lhotse-speech/lhotse

25 Jun 15:59

pzelasko

v1.24.2

e76dc3c

v1.24.2 Latest


New recipes

Add KsponSpeech recipe by @whsqkaak in #1353

New features

Several new APIs for manifest classes added in #1361:

cut.iter_data() which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
is_in_memory property for all manifest types, indicating whether the manifest contains data that is held in memory
is_placeholder property for non-cut manifests, indicating whether a manifest is just a placeholder (it has some metadata, but can't be used to load data)
cut.drop_in_memory_data() which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
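The semantics of these helpers can be illustrated with a small stand-in model. This is a sketch in plain Python, not Lhotse's implementation: `ToyManifest`, `ToyCut`, and `to_placeholder` are hypothetical, and only the names `iter_data`, `is_in_memory`, `is_placeholder`, and `drop_in_memory_data` come from the notes above.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ToyManifest:
    """Hypothetical stand-in for a data manifest (e.g. a recording)."""
    duration: float = 0.0            # metadata that survives as a placeholder
    path: Optional[str] = None       # where the data could be loaded from
    payload: Optional[bytes] = None  # data held directly in memory

    @property
    def is_in_memory(self) -> bool:
        return self.payload is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: nothing to load from memory or from storage.
        return self.payload is None and self.path is None

    def to_placeholder(self) -> "ToyManifest":
        return replace(self, payload=None, path=None)


@dataclass(frozen=True)
class ToyCut:
    """Hypothetical stand-in for a cut with attached data items."""
    id: str
    data: Dict[str, ToyManifest] = field(default_factory=dict)

    def iter_data(self) -> Iterator[Tuple[str, ToyManifest]]:
        # Yields (key, manifest) pairs for every attached data item.
        yield from self.data.items()

    @property
    def is_in_memory(self) -> bool:
        return any(m.is_in_memory for _, m in self.iter_data())

    def drop_in_memory_data(self) -> "ToyCut":
        # Only in-memory items become placeholders; others are untouched.
        new_data = {
            key: m.to_placeholder() if m.is_in_memory else m
            for key, m in self.iter_data()
        }
        return replace(self, data=new_data)


cut = ToyCut(
    id="cut-0",
    data={
        "recording": ToyManifest(duration=1.5, path="audio.wav"),
        "custom_features": ToyManifest(duration=1.5, payload=b"\x00" * 1024),
    },
)
light = cut.drop_in_memory_data()
```

In this toy model, `light` keeps the file-backed recording as-is, while the in-memory item retains only its metadata, which is the property that makes long-lived manifests cheap to hold.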

Bug fixes

Restoring smart open for local files if available by @pzelasko in #1360
Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
Numpy 2.0 compatibility by @pzelasko in #1362

New Contributors

@whsqkaak made their first contribution in #1353

Full Changelog: v1.24.1...v1.24.2

Contributors

pzelasko and whsqkaak


10 Jun 20:35

pzelasko

v1.24.1

866e4a8

v1.24.1

What's Changed

Support for reading data from AIStore using Python SDK by @pzelasko in #1354

Full Changelog: v1.24...v1.24.1

Contributors

pzelasko


05 Jun 19:59

pzelasko

v1.24

4d57d53

v1.24 - The World's Highest Wingsuit Jump

What's Changed

New features

Notably, there is a new optimization for the dynamic bucketing sampler in multi-GPU training: it chooses the same (or the closest possible) bucket on each DDP rank to keep training step times close across ranks. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups in two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.

Dynamic bucket selection RNG sync by @pzelasko in #1341
Add new sampler: weighted sampler by @marcoyang1998 in #1344
reverb_rir: support Cut input and in memory data by @pzelasko in #1332
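The idea behind sync_buckets can be sketched in plain Python. This is a simplified illustration and not the sampler's actual code: it assumes each rank holds an identically seeded RNG (suggested by the PR title "Dynamic bucket selection RNG sync"), and `pick_bucket` and `simulate` are hypothetical names.

```python
import random
from typing import List, Sequence


def pick_bucket(rng: random.Random, bucket_sizes: Sequence[int]) -> int:
    """Pick a bucket index, falling back to the closest non-empty bucket."""
    # Every rank draws exactly once from an identically seeded RNG,
    # so `wanted` is the same on all ranks at every step.
    wanted = rng.randrange(len(bucket_sizes))
    non_empty = [i for i, n in enumerate(bucket_sizes) if n > 0]
    if not non_empty:
        raise ValueError("all buckets are empty")
    # If the wanted bucket has no data on this rank, use the nearest one.
    return min(non_empty, key=lambda i: abs(i - wanted))


def simulate(seed: int, per_rank_sizes: List[List[int]], steps: int) -> List[List[int]]:
    """Return, for each step, the bucket index chosen on every rank."""
    rngs = [random.Random(seed) for _ in per_rank_sizes]
    return [
        [pick_bucket(rng, sizes) for rng, sizes in zip(rngs, per_rank_sizes)]
        for _ in range(steps)
    ]


# Rank 0 has data in every bucket; rank 1 has an empty bucket 2.
history = simulate(seed=0, per_rank_sizes=[[5, 5, 5, 5], [5, 5, 0, 5]], steps=20)
```

Because buckets group examples of similar duration, picking the same or a neighboring bucket on every rank gives mini-batches of similar cost, so fewer ranks sit idle waiting for the slowest one.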

Recipes

Add the ReazonSpeech recipe by @Triplecq in #1330

Other improvements

Missing 'subset' parameter by @daniel-dona in #1336
Fix describe on cuts by @keeofkoo in #1340
Use libsndfile in recording chunk dataset by @pzelasko in #1335
Fix librispeech manifest caching by @haerski in #1343
Fix one-off edge case in split_lazy by @pzelasko in #1347
Increase the start diff tolerance for feature loading by @pzelasko in #1349
More test coverage for lhotse subset by @pzelasko in #1345

New Contributors

@keeofkoo made their first contribution in #1340
@haerski made their first contribution in #1343
@Triplecq made their first contribution in #1330

Full Changelog: v1.23...v1.24

Contributors

Triplecq, haerski, and 4 other contributors


30 Apr 18:43

pzelasko

v1.23

b2dce78

v1.23 - Snowdrop

What's Changed

Recipes

MDCC recipe by @JinZr in #1302
Updated text_norm for aishell recipe by @JinZr in #1305
Allow skipping missing files in AMI download by @pzelasko in #1318
Add Chinese TTS dataset baker. by @csukuangfj in #1304
In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328

Fixes to a regression in noise mixing augmentations

Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
Fix randomness in CutMix transform by @pzelasko in #1316
select a random sub-region of the noise based on the delta duration by @osadj in #1317

Other improvements

Add dataset for audio tagging by @marcoyang1998 in #1241
Fix _get_strided_batch device by @lifeiteng in #1303
Fix typo in README.md by @yfyeung in #1308
Fix export of features/array to shar by @pzelasko in #1323
Fix trim_to_supervision_groups by @pzelasko in #1322

New Contributors

@daniel-dona made their first contribution in #1328

Full Changelog: v1.22...v1.23

Contributors

lifeiteng, csukuangfj, and 6 other contributors


07 Mar 19:38

pzelasko

v1.22

d26d476

v1.22 - Sherpa's Paradise

What's Changed

New features

Extending Lhotse dataloading to text/multimodal data by @pzelasko in #1295

As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. This means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data, using some lightweight wrappers. Please refer to the new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
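To illustrate why the same sampling logic can serve both audio and text, here is a minimal sketch in plain Python. It does not use Lhotse's API: `batch_by_budget`, `num_tokens`, and `seconds` are hypothetical names, and the only idea taken from the text above is that the "size" of an example (seconds of audio, number of tokens) can be swapped out while the batching logic stays the same.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_by_budget(
    items: Iterable[T], measure: Callable[[T], float], budget: float
) -> Iterator[List[T]]:
    """Group items into batches whose total measured size fits the budget."""
    batch: List[T] = []
    total = 0.0
    for item in items:
        cost = measure(item)
        # Close the current batch if adding this item would exceed the budget.
        if batch and total + cost > budget:
            yield batch
            batch, total = [], 0.0
        batch.append(item)
        total += cost
    if batch:
        yield batch


def num_tokens(sentence: str) -> float:
    # Text examples are measured in whitespace-separated tokens.
    return float(len(sentence.split()))


def seconds(duration: float) -> float:
    # Audio examples are measured by their duration in seconds.
    return duration


text_batches = list(batch_by_budget(["a b c", "d e", "f g h i", "j"], num_tokens, budget=5))
audio_batches = list(batch_by_budget([4.0, 7.5, 3.0], seconds, budget=10.0))
```

Here the text batches are capped at 5 tokens and the audio batches at 10 seconds, with one shared batching function.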

Multi-channel support improvements

Fix loading multi-channel custom recording fields in multi cuts by @pzelasko in #1298
Channel selection for multi-channel custom recording fields by @pzelasko in #1299

Lhotse MultiCuts:

are now exportable into Lhotse Shar format
gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).

Recipes

Add new recipe: speechio by @yuekaizhang in #1297
tedlium2 recipe by @JinZr in #1296

Other improvements

Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
Cutconcat fixed max duration by @swigls in #1292
Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
fix whisper for multi-channel data by @yuekaizhang in #1289
Xfail flaky SileroVAD tests by @pzelasko in #1300

New Contributors

@swigls made their first contribution in #1292

Full Changelog: v1.21...v1.22

Contributors

csukuangfj, swigls, and 3 other contributors


13 Feb 19:57

pzelasko

v1.21

769c273

v1.21 - Glaciology

What's Changed

This release patches lhotse to handle cases when libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now the preferred default backend since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.

Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
Fixes for manifest validation and fixing by @pzelasko in #1284
Handle error with cachedir creation gracefully by @pzelasko in #1287
AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288

Full Changelog: v1.20...v1.21

Contributors

pzelasko and yfyeung


31 Jan 20:51

pzelasko

v1.20

455b20e

v1.20 - Pining for the Fjords

What's Changed

New features

Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
Enable seed randomization in dynamic samplers by @pzelasko in #1278
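The drop_last=False guarantee above can be pictured with a small sketch. This is one possible scheme under stated assumptions, not the sampler's actual algorithm: `distribute` is a hypothetical helper that deals a global stream of mini-batches to ranks in groups of `world_size` and pads an incomplete final group by duplicating batches from it.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def distribute(batches: Sequence[T], world_size: int) -> List[List[T]]:
    """Give every rank the same number of batches, duplicating if needed."""
    per_rank: List[List[T]] = [[] for _ in range(world_size)]
    for start in range(0, len(batches), world_size):
        group = list(batches[start:start + world_size])
        # Pad an incomplete final group by repeating its own batches.
        i = 0
        while len(group) < world_size:
            group.append(group[i])
            i += 1
        for rank, batch in enumerate(group):
            per_rank[rank].append(batch)
    return per_rank


# Five mini-batches across four ranks: the last batch is duplicated.
shards = distribute(["b0", "b1", "b2", "b3", "b4"], world_size=4)
```

Equal batch counts matter in DDP training because a rank that runs out of data early would leave the other ranks blocked in gradient synchronization.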

Recipes

Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in #1272

Other improvements

Update docs with env vars used by Lhotse by @pzelasko in #1252
support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
Fix non-deterministic tests by @pzelasko in #1261
Fix duplication issues in CutSet.mix() by @pzelasko in #1268
Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279

New Contributors

@HSTEHSTEHSTE made their first contribution in #1272

Full Changelog: v1.19...v1.20

Contributors

csukuangfj, pzelasko, and 2 other contributors


02 Jan 14:58

pzelasko

v1.19

3e53b68

v1.19 - The Iceberger

What's Changed

Features

Support for OPUS encoding in Lhotse Shar format by @pzelasko in #1238
Perform CutSet.mix() lazily by @pzelasko in #1244
CutSampler.map() for transforming CutSet mini-batches by @pzelasko in #1246
Support multiplexing with a limited number of open streams by @pzelasko in #1248

Recipes

support icmc eval track 1 by @yuekaizhang in #1235
updating the voxpopuli recipe by @vesis84 in #1243
Allowing downloading Edin. ver. of VCTK by @JinZr in #1247

Other improvements

Micro-optimization for LazyJsonlIterator len() by @pzelasko in #1237
Drop python3.7 support by @pzelasko in #1245
Fix normalize_loudness for MixedCuts with PaddingCuts by @pzelasko in #1249

Full Changelog: v1.18...v1.19

Contributors

KarelVesely84, pzelasko, and 2 other contributors


11 Dec 14:10

pzelasko

v1.18

78b3a12

v1.18 - The Ice Age

What's Changed

New features

MMS forced alignment backend by @flyingleafe in #1185
Two new options: CutSet.from_shar(seed="trng") and DynamicCutSampler(quadratic_duration=...) by @pzelasko in #1199
Faster initialization option in DynamicBucketingSampler + various fixes by @pzelasko in #1210
CLI to estimate and print bucket bins for a cut set by @pzelasko in #1214
More flexible setting of audio backends by @pzelasko in #1219

Recipes

Add recipe for Medical Corpus by @yfyeung in #1212
minor fix for the AMI recipe by @JinZr in #1178
fixes compatibility with Edin. ver. VCTK dataset by @JinZr in #1182
Minor bug fix for eval2000 recipe by @JinZr in #1127
support far field data for icmcasr challenge by @yuekaizhang in #1189
fixed text norm for tal_csasr by @JinZr in #1198 #1213

Other improvements

MixedCut.truncate: fix the case when only PaddingCuts are left by @flyingleafe in #1157
Fix some potential problems in OPUS file reading by @yangb05 in #1181
fix an issue where 404 exception leaves 0 byte placeholder by @JinZr in #1190
Prevent accidental renaming when using with_suffix by @chiiyeh in #1192
Fix shar export for num_jobs>1 and recordings with transforms by @pzelasko in #1196
fix speaker error by @yzmyyff in #1197
Fix for trim_to_alignments issue by @desh2608 in #1193
Add deterministic_rng to more flaky tests by @pzelasko in #1200
update_recipes by @vesis84 in #1208
SpeechSynthesisDataset returns speaker_ids by @JinZr in #1206
Fix audio backend selection by @pzelasko in #1216
save sdm files into a single mdm file to do gss by @yuekaizhang in #1221
Modify SpeechSynthesisDataset class, make it return text by @yaozengwei in #1205
Allow lhotse installation without torchaudio for a limited set of features by @pzelasko in #1231
Use attacut module for Thai word tokenization (in MMS forced alignment) by @flyingleafe in #1232

New Contributors

@yangb05 made their first contribution in #1181
@chiiyeh made their first contribution in #1192
@yzmyyff made their first contribution in #1197
@yaozengwei made their first contribution in #1205

Full Changelog: v1.17...v1.18

Contributors

flyingleafe, desh2608, and 9 other contributors


08 Oct 23:31

pzelasko

v1.17

9c80a1e

v1.17 - Swirling Ice Pick

What's Changed

New supported datasets

Speech to text translation utilizing 3-way data by @AmirHussein96 in #1099
"This American Life" dataset recipe by @flyingleafe in #1140
Add VoxConverse recipe by @flyingleafe in #1142
Add recipe for ICASSP2024 ICMC-ASR Grand Challenge by @yfyeung in #1172

New features

Initial support for video by @pzelasko in #1151
copy_data: copy CutSet + its data to a new location by @pzelasko in #1130
Add whisper feature extractor by @yuekaizhang in #1159
VAD workflow with Silero by @rilshok in #1160

Enhancements and fixes

Fix feature extraction for lhotse shar CLI by @pzelasko in #1123
Add m4a to special cases for num samples determination by @pzelasko in #1124
making the kaldi import more robust by @vesis84 in #1129
Tutorial materials in main readme page by @pzelasko in #1133
optimize save_audios() by @vesis84 in #1131
Fix bugs in resumable_download by @flyingleafe in #1135
Arxiv badge by @desh2608 in #1136
Fix docs build by @pzelasko in #1137
Fix failing tests after repairing docs build by @pzelasko in #1138
Remove deprecated code, make minor cleanups by @pzelasko in #1139
Enforce deterministic RNG behavior in repeatedly flaky tests by @pzelasko in #1143
Refactor audio.py into smaller modules by @pzelasko in #1144
Fix broken save_audio by @flyingleafe in #1147
Optimize cut_into_windows for long cuts by @flyingleafe in #1150
Fixes for #1152 #1153 and #1154 by @pzelasko in #1156
fix bugs in downloading voxpopuli corpus by @DongjiGao in #1165
Support export_to_kaldi on resampled recordings by @sih4sing5hong5 in #1162
Refactor CutSet.describe to enable parallel statistics computation by @pzelasko in #1168
Allow dashes in feat CLI by @desh2608 in #1169
Apply deterministic RNG to more unit tests by @pzelasko in #1173
Add fix_manifests in all recipes by @desh2608 in #1128
Fix small bug in eval2000 by @desh2608 in #1126
Fix download in LibriCSS recipe by @desh2608 in #1148

New Contributors

@sih4sing5hong5 made their first contribution in #1162
@rilshok made their first contribution in #1160

Full Changelog: v1.16...v1.17

Contributors

flyingleafe, desh2608, and 8 other contributors

