[3.1.27]

Changed

  • Allow torch 1.13 in requirements.txt.
  • Replaced deprecated torch.testing.assert_allclose with torch.testing.assert_close for PyTorch 1.14 compatibility.
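
A minimal before/after sketch of the rename (the tensors here are placeholders, not taken from Sockeye's test suite):

    import torch

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7

    # Deprecated (and removed in newer PyTorch releases):
    # torch.testing.assert_allclose(actual, expected)

    # Replacement; default rtol/atol are chosen based on the tensor dtype.
    torch.testing.assert_close(actual, expected)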

[3.1.26]

Added

  • Added the --tf32 0|1 boolean device option (torch.backends.cuda.matmul.allow_tf32),
    enabling TensorFloat-32: transparent float32 matmul acceleration with 10-bit
    mantissa precision (19 bits total). Defaults to true for backward compatibility
    with torch < 1.12. Training continuation may use a different --tf32 setting
    than the original run.
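
A rough sketch of what the flag toggles; the comments mapping flag values to code are illustrative, only the backend setting itself is named in the release notes:

    import torch

    # --tf32 1 (default): allow TensorFloat-32 matmuls on Ampere and newer GPUs,
    # trading some float32 mantissa precision for throughput.
    torch.backends.cuda.matmul.allow_tf32 = True

    # --tf32 0: keep full-precision float32 matmuls.
    # torch.backends.cuda.matmul.allow_tf32 = False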

Changed

  • device.init_device() is now called by train, translate, and score.
  • Allow torch 1.12 in requirements.txt.

[3.1.25]

Changed

  • Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.
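
For reference, the new smoothing default corresponds roughly to the following use of sacrebleu's public API (a sketch, not Sockeye's wrapper code):

    from sacrebleu.metrics import BLEU

    # BLEU with floor smoothing; the default offset changed from 0.01 to 0.1.
    bleu = BLEU(smooth_method="floor", smooth_value=0.1)
    score = bleu.corpus_score(
        ["the cat sat on the mat"],   # hypotheses
        [["the cat is on the mat"]],  # one reference stream
    )
    print(score.score)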

[3.1.24]

Fixed

  • Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.

[3.1.23]

Changed

  • Change decoder softmax size logging level from info to debug.

[3.1.22]

Added

  • Log the average output vocabulary size during beam search.

Changed

  • Introduced a common Search base class for GreedySearch and BeamSearch (see the sketch after this list).
  • .pylintrc: suppress warnings about deprecated pylint warning suppressions.
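
A minimal sketch of the refactoring pattern; the constructor arguments below are illustrative rather than Sockeye's actual interface:

    import torch as pt

    class Search(pt.nn.Module):
        # Hypothetical shared base holding state common to both strategies.
        def __init__(self, dtype: pt.dtype, bos_id: int, eos_id: int, device: pt.device) -> None:
            super().__init__()
            self.dtype = dtype
            self.bos_id = bos_id
            self.eos_id = eos_id
            self.device = device

    class GreedySearch(Search):
        ...

    class BeamSearch(Search):
        ...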

[3.1.21]

Fixed

  • sockeye-translate now passes the skip_nvs and nvs_thresh arguments to the Translator constructor instead of ignoring them.

[3.1.20]

Added

  • Added training support for DeepSpeed.
    • Installation: pip install deepspeed
    • Usage: deepspeed --no_python ... sockeye-train ...
    • DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 (Rajbhandari et al., 2019).
    • Run in FP16 mode with --deepspeed-fp16 or BF16 mode with --deepspeed-bf16.

[3.1.19]

Added

  • Clean up GPU and CPU memory used during training initialization before starting the main training loop.

Changed

  • Refactored training code in advance of adding DeepSpeed support:
    • Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
    • Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
    • Refactored optimizer and learning rate scheduler creation to be modular.
    • Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
    • Refactored primary and secondary worker logic to reduce redundant calculations.
    • Refactored code for saving/loading training states.
    • Added utility code for managing model/training configurations.

Removed

  • Removed unused training option --learning-rate-t-scale.

[3.1.18]

Added

  • Added sockeye-train and sockeye-translate option --clamp-to-dtype that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
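
The clamping itself amounts to something like the following sketch (the helper name is hypothetical, not Sockeye's exact implementation):

    import torch

    def clamp_to_dtype(x: torch.Tensor) -> torch.Tensor:
        # Clamp to the finite range of x's dtype so float16 overflow saturates
        # instead of producing inf/nan.
        finfo = torch.finfo(x.dtype)
        return x.clamp(min=finfo.min, max=finfo.max)

    x = torch.full((2, 2), 70000.0).to(torch.float16)  # overflows to inf
    print(clamp_to_dtype(x))                           # saturated at 65504.0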

[3.1.17]

Added

  • Added support for offline model quantization with sockeye-quantize.
    • Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.
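
The memory argument can be illustrated with plain PyTorch (a sketch of the general idea; sockeye-quantize's actual options and file names may differ):

    import torch

    # Offline: cast float32 parameters to float16 once and save the result.
    state = torch.load("params.float32", map_location="cpu")
    state_fp16 = {name: (t.half() if t.is_floating_point() else t)
                  for name, t in state.items()}
    torch.save(state_fp16, "params.float16")

    # At load time the parameters are read directly as float16, avoiding the
    # transient float32 copy that runtime casting would create.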

[3.1.16]

Added

  • Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper https://arxiv.org/abs/2110.03847.
    To use this feature pass a criterion (isometric-ratio, isometric-diff, isometric-lc) when specifying --metric.
  • Added --output-best-non-blank to output non-blank best hypothesis from the nbest list.

[3.1.15]

Fixed

  • Fixed the type of valid_length to be pt.Tensor instead of Optional[pt.Tensor] = None for jit tracing.
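
A minimal sketch of why the annotation matters: torch.jit.trace records concrete tensor inputs, so an argument that participates in tracing should be a required pt.Tensor rather than Optional with a None default (the module and shapes below are hypothetical):

    import torch as pt

    class MaskByLength(pt.nn.Module):
        # Hypothetical module: valid_length is a required pt.Tensor, not Optional.
        def forward(self, data: pt.Tensor, valid_length: pt.Tensor) -> pt.Tensor:
            positions = pt.arange(data.shape[1], device=data.device)[None, :]
            mask = (positions < valid_length[:, None]).to(data.dtype)
            return data * mask.unsqueeze(2)

    traced = pt.jit.trace(MaskByLength(), (pt.rand(2, 5, 8), pt.tensor([5, 3])))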