Releases: lhotse-speech/lhotse

v1.24.2

25 Jun 15:59

New recipes

New features

Several new APIs for manifest classes were added in #1361:

  • cut.iter_data() which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
  • is_in_memory property for all manifest types, which indicates whether the manifest contains data that is held in memory
  • is_placeholder for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
  • cut.drop_in_memory_data() which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
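
As a rough illustration of the placeholder idea, here is a toy model in plain Python. It is not Lhotse's implementation: ToyArray and ToyCut are made-up classes that only mirror the names listed above, and the `data` and `path` fields, as well as the exact rule for what counts as a placeholder, are assumptions made for this sketch.

```python
from dataclasses import dataclass, replace
from typing import Iterator, Optional, Tuple


@dataclass(frozen=True)
class ToyArray:
    """Toy stand-in for a data manifest; NOT a Lhotse class."""

    shape: Tuple[int, ...]
    data: Optional[bytes] = None  # payload held in memory, if any (assumed field)
    path: Optional[str] = None    # storage location, if any (assumed field)

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: nothing in memory and nowhere to load from.
        return self.data is None and self.path is None


@dataclass(frozen=True)
class ToyCut:
    """Toy stand-in for a cut with two optional data attachments."""

    id: str
    recording: Optional[ToyArray] = None
    custom_features: Optional[ToyArray] = None

    def iter_data(self) -> Iterator[Tuple[str, ToyArray]]:
        # Yield (key, manifest) pairs for every attached data item.
        for key in ("recording", "custom_features"):
            item = getattr(self, key)
            if item is not None:
                yield key, item

    @property
    def is_in_memory(self) -> bool:
        return any(item.is_in_memory for _, item in self.iter_data())

    def drop_in_memory_data(self) -> "ToyCut":
        # Swap every in-memory item for a copy without the payload.
        updates = {
            key: replace(item, data=None)
            for key, item in self.iter_data()
            if item.is_in_memory
        }
        return replace(self, **updates)


cut = ToyCut(
    id="utt-1",
    recording=ToyArray(shape=(1, 16000), data=b"\x00" * 32000),
    custom_features=ToyArray(shape=(100, 80), path="feats.npy"),
)
light = cut.drop_in_memory_data()
```

In this toy model, `light.recording` keeps its shape metadata but can no longer provide audio, while `light.custom_features` is untouched because it was never held in memory.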

Bug fixes

  • Restoring smart open for local files if available by @pzelasko in #1360
  • Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
  • Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
  • Numpy 2.0 compatibility by @pzelasko in #1362

New Contributors

Full Changelog: v1.24.1...v1.24.2

v1.24.1

10 Jun 20:35
866e4a8

What's Changed

  • Support for reading data from AIStore using Python SDK by @pzelasko in #1354

Full Changelog: v1.24...v1.24.1

v1.24 - The World's Highest Wingsuit Jump

05 Jun 19:59
4d57d53

What's Changed

New features

Notably, there's a new optimization for the dynamic bucketing sampler in multi-GPU training: it chooses the same (or the closest possible) bucket on each DDP rank so that training step times stay close across ranks. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups in two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.
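
One way such synchronization can work without any communication between ranks is sketched below. This is a minimal illustration, not necessarily how Lhotse implements it: the helper pick_bucket, the identically seeded per-rank RNGs, and the tie-breaking rule are all assumptions made for the example.

```python
import random
from typing import List, Optional


def pick_bucket(rng: random.Random, non_empty: List[bool]) -> Optional[int]:
    """Pick a bucket index, falling back to the closest non-empty bucket.

    Hypothetical helper. The draw ranges over ALL buckets, so every rank
    consumes the same random numbers regardless of which buckets it has left.
    """
    wanted = rng.randrange(len(non_empty))
    candidates = [i for i, has_data in enumerate(non_empty) if has_data]
    if not candidates:
        return None
    # Closest bucket by index distance; ties go to the lower index.
    return min(candidates, key=lambda i: (abs(i - wanted), i))


SEED = 0
rank0_rng = random.Random(SEED)  # each rank builds its own RNG from the same seed
rank1_rng = random.Random(SEED)

rank0_state = [True, True, True, True]
rank1_state = [True, True, False, True]  # bucket 2 is exhausted on rank 1

picks0 = [pick_bucket(rank0_rng, rank0_state) for _ in range(8)]
picks1 = [pick_bucket(rank1_rng, rank1_state) for _ in range(8)]
```

Both ranks pick the same bucket at every step, except when rank 0 picks bucket 2, which is exhausted on rank 1; there rank 1 falls back to the neighboring bucket 1.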

Recipes

Other improvements

New Contributors

Full Changelog: v1.23...v1.24

v1.23 - Snowdrop

30 Apr 18:43
b2dce78

What's Changed

Recipes

Fixes to a regression in noise mixing augmentations

  • Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
  • Fix randomness in CutMix transform by @pzelasko in #1316
  • select a random sub-region of the noise based on the delta duration by @osadj in #1317

Other improvements

New Contributors

Full Changelog: v1.22...v1.23

v1.22 - Sherpa's Paradise

07 Mar 19:38
d26d476

What's Changed

New features

  • Extending Lhotse dataloading to text/multimodal data by @pzelasko in #1295

As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data, using some lightweight wrappers. Please refer to the new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints

  • Multi-channel support improvements
    • Fix loading multi-channel custom recording fields in multi cuts by @pzelasko in #1298
    • Channel selection for multi-channel custom recording fields by @pzelasko in #1299

Lhotse MultiCuts:

  • are now exportable into Lhotse Shar format
  • gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
  • can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).

Recipes

Other improvements

New Contributors

Full Changelog: v1.21...v1.22

v1.21 - Glaciology

13 Feb 19:57
769c273

What's Changed

This release patches lhotse to handle cases when libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now the preferred default backend since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.

  • Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
  • Fixes for manifest validation and fixing by @pzelasko in #1284
  • Handle error with cachedir creation gracefully by @pzelasko in #1287
  • AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288

Full Changelog: v1.20...v1.21

v1.20 - Pining for the Fjords

31 Jan 20:51
455b20e

What's Changed

New features

  • Extended the subset of lhotse that works without installing torchaudio by @pzelasko in #1253 #1255
  • Ensure drop_last=False always returns an equal number of mini-batches by re-distributing and/or duplicating some data by @pzelasko in #1277
  • Improved CPU memory usage and shuffling + bucketing in DynamicBucketingSampler by @pzelasko in #1276
  • Enable seed randomization in dynamic samplers by @pzelasko in #1278
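
To make the drop_last=False item above more concrete, the sketch below shows one simple way to give every rank the same number of mini-batches by duplicating some data. It is an illustration of the general idea only; the function shard_batches and its wrap-around duplication strategy are assumptions, not the sampler's actual logic.

```python
from itertools import cycle, islice
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard_batches(batches: Sequence[T], world_size: int) -> List[List[T]]:
    """Split batches across ranks so that every rank gets the same count.

    Hypothetical helper: if the last round is incomplete, it is filled by
    re-using batches from the start of the sequence.
    """
    padded = list(batches)
    remainder = len(padded) % world_size
    if remainder:
        missing = world_size - remainder
        padded.extend(islice(cycle(batches), missing))
    # Rank r takes every world_size-th batch, starting at offset r.
    return [padded[rank::world_size] for rank in range(world_size)]


shards = shard_batches(list(range(10)), world_size=4)
for rank, shard in enumerate(shards):
    print(rank, shard)
```

With 10 batches and 4 ranks, two batches are duplicated so that each rank iterates over exactly 3 mini-batches, which avoids one rank finishing an epoch earlier than the others.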

Recipes

Other improvements

  • Update docs with env vars used by Lhotse by @pzelasko in #1252
  • support whisper large v3; deepspeed launcher rank world_size setting by @yuekaizhang in #1260
  • Fix non-deterministic tests by @pzelasko in #1261
  • Fix duplication issues in CutSet.mix() by @pzelasko in #1268
  • Support controllable CutSet.mux weights in multiprocess dataloading by @pzelasko in #1266
  • Fix distributed sampler initialization and exceeded sampler warning false positives by @pzelasko in #1270
  • Install kaldi-native-io explicitly in the kaldi doc example. by @csukuangfj in #1275
  • Allow duplicate cut IDs in a CutSet (CutSet is list-like instead of dict-like) by @pzelasko in #1279

New Contributors

Full Changelog: v1.19...v1.20

v1.19 - The Iceberger

02 Jan 14:58
3e53b68

What's Changed

Features

Recipes

Other improvements

Full Changelog: v1.18...v1.19

v1.18 - The Ice Age

11 Dec 14:10
78b3a12

What's Changed

New features

  • MMS forced alignment backend by @flyingleafe in #1185
  • Two new options: CutSet.from_shar(seed="trng") and DynamicCutSampler(quadratic_duration=...) by @pzelasko in #1199
  • Faster initialization option in DynamicBucketingSampler + various fixes by @pzelasko in #1210
  • CLI to estimate and print bucket bins for a cut set by @pzelasko in #1214
  • More flexible setting of audio backends by @pzelasko in #1219
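
For intuition on the quadratic_duration option listed above, the sketch below assumes the penalty has the form duration + duration² / quadratic_duration when counting a cut against the batch budget. Both this formula and the helper names effective_duration and cuts_per_batch are assumptions made for illustration; consult the sampler documentation for the exact definition.

```python
from typing import Optional


def effective_duration(
    duration: float, quadratic_duration: Optional[float] = None
) -> float:
    """Duration counted against the batch budget (assumed penalty form)."""
    if quadratic_duration is None:
        return duration
    return duration + duration ** 2 / quadratic_duration


def cuts_per_batch(
    duration: float,
    max_duration: float,
    quadratic_duration: Optional[float] = None,
) -> int:
    """How many equal-length cuts fit into one batch under the budget."""
    return int(max_duration // effective_duration(duration, quadratic_duration))


# Budget of 100 s per batch, with and without the quadratic penalty.
for d in (10.0, 30.0):
    linear = cuts_per_batch(d, max_duration=100.0)
    penalized = cuts_per_batch(d, max_duration=100.0, quadratic_duration=20.0)
    print(f"{d:>4.0f} s cuts: {linear} without penalty, {penalized} with penalty")
```

Under this assumed formula, long cuts are penalized much more heavily than short ones, so batches of long utterances shrink faster. That is the intended effect for models whose cost grows quadratically with input length.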

Recipes

Other improvements

New Contributors

Full Changelog: v1.17...v1.18

v1.17 - Swirling Ice Pick

08 Oct 23:31
9c80a1e

What's Changed

New supported datasets

New features

Enhancements and fixes

New Contributors

Full Changelog: v1.16...v1.17