Releases: lhotse-speech/lhotse
v1.24.2
New recipes
New features
Several new APIs for manifest classes added in #1361:
- cut.iter_data() which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
- is_in_memory property for all manifest types to indicate if it contains data that is held in memory
- is_placeholder for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
- cut.drop_in_memory_data() which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
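To make the relationship between these APIs concrete, here is a toy model in plain Python. It is not Lhotse's implementation: ToyRecording and ToyCut are hypothetical stand-ins, and only the names iter_data, is_in_memory, is_placeholder and drop_in_memory_data mirror the release notes.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class ToyRecording:
    """Hypothetical stand-in for a manifest; not Lhotse's Recording."""
    id: str
    num_samples: int
    data: Optional[bytes] = None  # raw audio held in memory, if any
    path: Optional[str] = None    # on-disk source, if any

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata survives, but there is nothing left to load data from.
        return self.data is None and self.path is None


@dataclass(frozen=True)
class ToyCut:
    """Hypothetical stand-in for a cut with attached data items."""
    id: str
    recording: Optional[ToyRecording] = None
    custom_recording: Optional[ToyRecording] = None

    def iter_data(self) -> Iterator[Tuple[str, ToyRecording]]:
        # Yield (key, manifest) pairs for every attached data item.
        for key in ("recording", "custom_recording"):
            item = getattr(self, key)
            if item is not None:
                yield key, item

    def drop_in_memory_data(self) -> "ToyCut":
        # Replace in-memory items with copies that keep only metadata.
        updates = {
            key: replace(item, data=None)
            for key, item in self.iter_data()
            if item.is_in_memory
        }
        return replace(self, **updates)


cut = ToyCut("c1", recording=ToyRecording("r1", 16000, data=b"\x00" * 32000))
light = cut.drop_in_memory_data()
print([key for key, _ in cut.iter_data()])  # ['recording']
print(light.recording.is_placeholder)       # True
```

The lightweight copy keeps fields such as num_samples, which is why it remains useful for bookkeeping after the audio bytes are released.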
Bug fixes
- Restoring smart open for local files if available by @pzelasko in #1360
- Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
- Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
- Numpy 2.0 compatibility by @pzelasko in #1362
New Contributors
Full Changelog: v1.24.1...v1.24.2
v1.24.1
v1.24 - The World's Highest Wingsuit Jump
What's Changed
New features
Notably, there's a new optimization for dynamic bucketing sampler in multi-GPU training - it will choose the same (or the closest possible) bucket on each DDP rank to keep the total training step times closer. The expected speedup is dependent on the model and the number of GPUs. We observed 8% and 13% speedups across two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.
- Dynamic bucket selection RNG sync by @pzelasko in #1341
- Add new sampler: weighted sampler by @marcoyang1998 in #1344
- reverb_rir: support Cut input and in-memory data by @pzelasko in #1332
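The idea behind synchronized bucket selection can be sketched with a small simulation. This is an illustration under stated assumptions, not the sampler's code: BUCKET_COST, bucket_schedule and idle_time are hypothetical, and the model simply assumes that every rank waits for the slowest rank at each step.

```python
import random

# Hypothetical per-bucket step cost in seconds (bucket 0 = shortest utterances).
BUCKET_COST = [1.0, 2.0, 4.0, 8.0]


def bucket_schedule(seed: int, num_steps: int) -> list:
    """Bucket indices a rank would draw when its selection RNG uses `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(len(BUCKET_COST)) for _ in range(num_steps)]


def idle_time(schedules: list) -> float:
    """Total time ranks spend waiting for the slowest rank at each step."""
    total = 0.0
    for step in zip(*schedules):
        costs = [BUCKET_COST[b] for b in step]
        total += sum(max(costs) - c for c in costs)
    return total


world_size, num_steps = 4, 100
# Synchronized: every rank shares the bucket-selection seed.
synced = [bucket_schedule(seed=0, num_steps=num_steps) for _ in range(world_size)]
# Non-synchronized: each rank seeds its own bucket selection.
unsynced = [bucket_schedule(seed=rank, num_steps=num_steps) for rank in range(world_size)]

print(idle_time(synced))    # 0.0, every rank draws the same bucket
print(idle_time(unsynced))  # waiting time accumulated without synchronization
```

In this model the synchronized schedule never leaves a rank idle, while independent draws pair short and long buckets in the same step.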
Recipes
Other improvements
- Missing 'subset' parameter by @daniel-dona in #1336
- Fix describe on cuts by @keeofkoo in #1340
- Use libsndfile in recording chunk dataset by @pzelasko in #1335
- Fix librispeech manifest caching by @haerski in #1343
- Fix one-off edge case in split_lazy by @pzelasko in #1347
- Increase the start diff tolerance for feature loading by @pzelasko in #1349
- More test coverage for lhotse subset by @pzelasko in #1345
New Contributors
- @keeofkoo made their first contribution in #1340
- @haerski made their first contribution in #1343
- @Triplecq made their first contribution in #1330
Full Changelog: v1.23...v1.24
v1.23 - Snowdrop
What's Changed
Recipes
- MDCC recipe by @JinZr in #1302
- Updated text_norm for aishell recipe by @JinZr in #1305
- Allow skipping missing files in AMI download by @pzelasko in #1318
- Add Chinese TTS dataset baker. by @csukuangfj in #1304
- In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328
Fixes to a regression in noise mixing augmentations
- Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
- Fix randomness in CutMix transform by @pzelasko in #1316
- select a random sub-region of the noise based on the delta duration by @osadj in #1317
Other improvements
- Add dataset for audio tagging by @marcoyang1998 in #1241
- Fix _get_strided_batch device by @lifeiteng in #1303
- Fix typo in README.md by @yfyeung in #1308
- Fix export of features/array to shar by @pzelasko in #1323
- Fix trim_to_supervision_groups by @pzelasko in #1322
New Contributors
- @daniel-dona made their first contribution in #1328
Full Changelog: v1.22...v1.23
v1.22 - Sherpa's Paradise
What's Changed
New features
As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. Please refer to new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
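As a rough illustration of why one sampling mechanism can serve both audio and text, the sketch below batches items against a budget using a pluggable length measure. It is a simplified model, not Lhotse's sampler API: batch_by_budget and the example data are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_by_budget(
    items: Iterable[T], measure: Callable[[T], float], budget: float
) -> Iterator[List[T]]:
    """Greedily group items so that the summed measure stays within budget."""
    batch, total = [], 0.0
    for item in items:
        cost = measure(item)
        # Close the current batch if adding this item would exceed the budget.
        if batch and total + cost > budget:
            yield batch
            batch, total = [], 0.0
        batch.append(item)
        total += cost
    if batch:
        yield batch


audio = [
    {"id": "a1", "duration": 4.0},
    {"id": "a2", "duration": 7.0},
    {"id": "a3", "duration": 5.0},
]
text = ["hello world", "a b c d e", "one two three four", "x"]

# Audio is measured in seconds, text in whitespace-separated tokens.
audio_batches = list(batch_by_budget(audio, lambda c: c["duration"], budget=12.0))
text_batches = list(batch_by_budget(text, lambda s: len(s.split()), budget=6))

print([[c["id"] for c in b] for b in audio_batches])  # [['a1', 'a2'], ['a3']]
print(text_batches)  # [['hello world'], ['a b c d e'], ['one two three four', 'x']]
```

Only the measure changes between the two calls, which is the sense in which a lightweight wrapper is enough to reuse the batching logic.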
- Multi-channel support improvements. Lhotse MultiCuts:
  - are now exportable into Lhotse Shar format
  - gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
  - can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
Recipes
- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296
Other improvements
- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300
New Contributors
Full Changelog: v1.21...v1.22
v1.21 - Glaciology
What's Changed
This release patches lhotse to handle cases when libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now preferred as a default since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.
- Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
- Fixes for manifest validation and fixing by @pzelasko in #1284
- Handle error with cachedir creation gracefully by @pzelasko in #1287
- AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288
Full Changelog: v1.20...v1.21
v1.20 - Pining for the Fjords
What's Changed
New features
- Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
- Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
- Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
- Enable seed randomization in dynamic samplers by @pzelasko in #1278
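Why duplication is needed for drop_last=False can be seen in a small sketch. It assumes a simple round-robin split across ranks and pads shorter shards by re-using their own items. The helper equalized_batches is hypothetical and the samplers' actual re-distribution strategy may differ.

```python
from itertools import cycle, islice
from typing import List, Sequence


def equalized_batches(
    items: Sequence[int], world_size: int, batch_size: int
) -> List[List[List[int]]]:
    """Give every rank the same number of mini-batches by duplicating items."""
    shards = [list(items[rank::world_size]) for rank in range(world_size)]
    target = max(len(shard) for shard in shards)
    result = []
    for shard in shards:
        # Fall back to the full list if a rank received no items at all.
        source = shard or list(items)
        padded = shard + list(islice(cycle(source), target - len(shard)))
        result.append(
            [padded[i : i + batch_size] for i in range(0, len(padded), batch_size)]
        )
    return result


batches = equalized_batches(list(range(7)), world_size=3, batch_size=2)
for rank, rank_batches in enumerate(batches):
    print(rank, rank_batches)
# 0 [[0, 3], [6]]
# 1 [[1, 4], [1]]
# 2 [[2, 5], [2]]
```

Without the padding, ranks 1 and 2 would finish one mini-batch early and leave rank 0 waiting in a collective operation.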
Recipes
- Fluent Speech Commands dataset, SLU task by @HSTEHSTEHSTE in #1272
Other improvements
- Update docs with env vars used by Lhotse by @pzelasko in #1252
- support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
- Fix non-deterministic tests by @pzelasko in #1261
- Fix duplication issues in CutSet.mix() by @pzelasko in #1268
- Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
- Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
- Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
- Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279
New Contributors
- @HSTEHSTEHSTE made their first contribution in #1272
Full Changelog: v1.19...v1.20
v1.19 - The Iceberger
What's Changed
Features
- Support for OPUS encoding in Lhotse Shar format by @pzelasko in #1238
- Perform CutSet.mix() lazily by @pzelasko in #1244
- CutSampler.map() for transforming CutSet mini-batches by @pzelasko in #1246
- Support multiplexing with a limited number of open streams by @pzelasko in #1248
Recipes
- support icmc eval track 1 by @yuekaizhang in #1235
- updating the voxpopuli recipe by @vesis84 in #1243
- Allowing downloading Edin. ver. of VCTK by @JinZr in #1247
Other improvements
- Micro-optimization for LazyJsonlIterator len() by @pzelasko in #1237
- Drop python3.7 support by @pzelasko in #1245
- Fix normalize_loudness for MixedCuts with PaddingCuts by @pzelasko in #1249
Full Changelog: v1.18...v1.19
v1.18 - The Ice Age
What's Changed
New features
- MMS forced alignment backend by @flyingleafe in #1185
- Two new options: CutSet.from_shar(seed="trng") and DynamicCutSampler(quadratic_duration=...) by @pzelasko in #1199
- Faster initialization option in DynamicBucketingSampler + various fixes by @pzelasko in #1210
- CLI to estimate and print bucket bins for a cut set by @pzelasko in #1214
- More flexible setting of audio backends by @pzelasko in #1219
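The quadratic_duration option penalizes long utterances when a batch's total duration is counted. The sketch below assumes the correction has the form duration + duration**2 / quadratic_duration. That formula is an assumption for illustration and is not stated in these notes, so consult the sampler documentation for the exact definition. The helper effective_duration is hypothetical.

```python
from typing import Optional


def effective_duration(
    duration: float, quadratic_duration: Optional[float] = None
) -> float:
    """Duration counted against the batch budget (assumed quadratic penalty)."""
    if quadratic_duration is None:
        return duration
    # Assumed form: the penalty doubles the cost when duration == quadratic_duration.
    return duration + duration ** 2 / quadratic_duration


for seconds in (5.0, 15.0, 30.0):
    print(seconds, effective_duration(seconds, quadratic_duration=15.0))
```

Under this assumption a 30-second utterance counts as 90 seconds against the budget, so batches of long utterances become smaller than a purely linear duration limit would allow.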
Recipes
- Add recipe for Medical Corpus by @yfyeung in #1212
- minor fix for the AMI recipe by @JinZr in #1178
- fixes compatibility with Edin. ver. VCTK dataset by @JinZr in #1182
- Minor bug fix for eval2000 recipe by @JinZr in #1127
- support far field data for icmcasr challenge by @yuekaizhang in #1189
- fixed text norm for tal_csasr by @JinZr in #1198 #1213
Other improvements
- MixedCut.truncate: fix the case when only PaddingCuts are left by @flyingleafe in #1157
- Fix some potential problems in OPUS file reading by @yangb05 in #1181
- fix an issue where 404 exception leaves 0 byte placeholder by @JinZr in #1190
- Prevent accidental renaming when using with_suffix by @chiiyeh in #1192
- Fix shar export for num_jobs>1 and recordings with transforms by @pzelasko in #1196
- fix speaker error by @yzmyyff in #1197
- Fix for trim_to_alignments issue by @desh2608 in #1193
- Add deterministic_rng to more flaky tests by @pzelasko in #1200
- update_recipes by @vesis84 in #1208
- SpeechSynthesisDataset returns speaker_ids by @JinZr in #1206
- Fix audio backend selection by @pzelasko in #1216
- save sdm files into a single mdm file to do gss by @yuekaizhang in #1221
- Modify SpeechSynthesisDataset class, make it return text by @yaozengwei in #1205
- Allow lhotse installation without torchaudio for a limited set of features by @pzelasko in #1231
- Use attacut module for Thai word tokenization (in MMS forced alignment) by @flyingleafe in #1232
New Contributors
- @yangb05 made their first contribution in #1181
- @chiiyeh made their first contribution in #1192
- @yzmyyff made their first contribution in #1197
- @yaozengwei made their first contribution in #1205
Full Changelog: v1.17...v1.18
v1.17 - Swirling Ice Pick
What's Changed
New supported datasets
- Speech to text translation utilizing 3-way data by @AmirHussein96 in #1099
- "This American Life" dataset recipe by @flyingleafe in #1140
- Add VoxConverse recipe by @flyingleafe in #1142
- Add recipe for ICASSP2024 ICMC-ASR Grand Challenge by @yfyeung in #1172
New features
- Initial support for video by @pzelasko in #1151
- copy_data: copy CutSet + its data to a new location by @pzelasko in #1130
- Add whisper feature extractor by @yuekaizhang in #1159
- VAD workflow with Silero by @rilshok in #1160
Enhancements and fixes
- Fix feature extraction for lhotse shar CLI by @pzelasko in #1123
- Add m4a to special cases for num samples determination by @pzelasko in #1124
- making the kaldi import more robust by @vesis84 in #1129
- Tutorial materials in main readme page by @pzelasko in #1133
- optimize save_audios() by @vesis84 in #1131
- Fix bugs in resumable_download by @flyingleafe in #1135
- Arxiv badge by @desh2608 in #1136
- Fix docs build by @pzelasko in #1137
- Fix failing tests after repairing docs build by @pzelasko in #1138
- Remove deprecated code, make minor cleanups by @pzelasko in #1139
- Enforce deterministic RNG behavior in repeatedly flaky tests by @pzelasko in #1143
- Refactor audio.py into smaller modules by @pzelasko in #1144
- Fix broken save_audio by @flyingleafe in #1147
- Optimize cut_into_windows for long cuts by @flyingleafe in #1150
- Fixes for #1152 #1153 and #1154 by @pzelasko in #1156
- fix bugs in downloading voxpopuli corpus by @DongjiGao in #1165
- Support export_to_kaldi on resampled recordings by @sih4sing5hong5 in #1162
- Refactor CutSet.describe to enable parallel statistics computation by @pzelasko in #1168
- Allow dashes in feat CLI by @desh2608 in #1169
- Apply deterministic RNG to more unit tests by @pzelasko in #1173
- Add fix_manifests in all recipes by @desh2608 in #1128
- Fix small bug in eval2000 by @desh2608 in #1126
- Fix download in LibriCSS recipe by @desh2608 in #1148
New Contributors
- @sih4sing5hong5 made their first contribution in #1162
- @rilshok made their first contribution in #1160
Full Changelog: v1.16...v1.17