Run in FP16 mode with --deepspeed-fp16 or BF16 mode with --deepspeed-bf16.
[3.1.19]
Added
Cleaned up GPU and CPU memory used during training initialization before starting the main training loop.
Changed
Refactored training code in advance of adding DeepSpeed support:
Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
Refactored optimizer and learning rate scheduler creation to be modular.
Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
Refactored primary and secondary worker logic to reduce redundant calculations.
Refactored code for saving/loading training states.
Added utility code for managing model/training configurations.
Removed
Removed unused training option --learning-rate-t-scale.
[3.1.18]
Added
Added sockeye-train and sockeye-translate option --clamp-to-dtype that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode. See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
[3.1.17]
Added
Added support for offline model quantization with sockeye-quantize.
Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.
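As a rough back-of-the-envelope illustration of the difference, the sketch below compares the size of the weights alone in each dtype. The parameter count is a hypothetical figure, and the calculation ignores any temporary buffers or framework overhead.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weights_size_bytes(num_params: int, dtype: str) -> int:
    """Size of the model weights alone when held in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


num_params = 1_000_000_000  # hypothetical model size, for illustration only

# Runtime quantization: weights are first materialized as float32.
runtime_load = weights_size_bytes(num_params, "float32")
# Offline quantization: weights are loaded directly as float16.
offline_load = weights_size_bytes(num_params, "float16")

print(runtime_load / offline_load)
```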
[3.1.16]
Added
Added nbest list reranking options using isometric translation criteria, as proposed in an ICASSP 2021 paper: https://arxiv.org/abs/2110.03847
To use this feature, pass a criterion (isometric-ratio, isometric-diff, isometric-lc) when specifying --metric.
Added --output-best-non-blank to output the best non-blank hypothesis from the nbest list.
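The idea behind selecting the best non-blank hypothesis can be sketched as follows. This is an illustrative sketch rather than Sockeye's code: the function name `best_non_blank` is hypothetical, the nbest list is assumed to be ordered best-first, and the fallback behavior when every hypothesis is blank is an assumption.

```python
from typing import List, Optional


def best_non_blank(nbest: List[str]) -> Optional[str]:
    """Return the highest-ranked hypothesis that contains non-whitespace text.

    Assumes `nbest` is ordered best-first. If every hypothesis is blank,
    falls back to the top entry; returns None for an empty list.
    """
    for hypothesis in nbest:
        if hypothesis.strip():
            return hypothesis
    return nbest[0] if nbest else None


print(best_non_blank(["", "  ", "hallo welt", "hallo"]))
```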
[3.1.15]
Fixed
Fixed the type of valid_length to be pt.Tensor instead of Optional[pt.Tensor] = None for JIT tracing.