Releases · awslabs/sockeye
3.1.34
[3.1.33]
Fixed
- Two small fixes to SampleK: previously, the device was not set correctly, leading to issues when running sampling on GPUs, and SampleK did not return the top-k values correctly.
[3.1.32]
Added
- Sockeye now supports blocking cross-attention between decoder and encoded prepended tokens.
  - If the source contains prepended text and a tag indicating the end of prepended text, Sockeye supports blocking the cross-attention between decoder and encoded prepended tokens (including the tag). To enable this operation, specify `--end-of-prepending-tag` for training or data preparation, and `--transformer-block-prepended-cross-attention` for training (see the sketch below).
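As a rough sketch (not from the release notes), the new flags might be combined with the usual `sockeye-prepare-data`/`sockeye-train` options as follows; the `<EOP>` tag value and all file paths are placeholders:

```
# Data preparation: mark where the prepended text ends (tag value is a placeholder).
sockeye-prepare-data --source train.src --target train.trg \
    --end-of-prepending-tag "<EOP>" --output prepared_data

# Training: block decoder cross-attention to the encoded prepended tokens.
sockeye-train --prepared-data prepared_data \
    --validation-source dev.src --validation-target dev.trg \
    --end-of-prepending-tag "<EOP>" \
    --transformer-block-prepended-cross-attention \
    --output model_dir
```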
Changed
- Sockeye uses a new dictionary-based prepared data format that supports storing the lengths of prepended source tokens (version 7). The previous format (version 6) is still supported.
[3.1.31]
Fixed
- Fixed sequence copying integration tests to correctly specify that scoring/translation outputs should not be checked.
- Enabled `bfloat16` integration and system testing on all platforms.
[3.1.30]
Added
- Added support for `--dtype bfloat16` to `sockeye-translate`, `sockeye-score`, and `sockeye-quantize` (see the example below).
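For illustration, a hedged `sockeye-translate` invocation using bfloat16; the model directory and file paths are placeholders:

```
# Translate with bfloat16 compute (model and data paths are placeholders).
sockeye-translate --models model_dir --dtype bfloat16 \
    --input input.txt --output output.txt
```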
Fixed
- Fixed compatibility issue with `numpy==1.24.0` by using `pickle` instead of `numpy` to save/load `ParallelSampleIter` data permutations.
[3.1.29]
Changed
- Running `sockeye-evaluate` no longer applies text tokenization for TER (same behavior as other metrics).
- Turned on type checking for all `sockeye` modules except `test_utils` and addressed resulting type issues.
- Refactored code in various modules without changing user-level behavior.
[3.1.28]
Added
- Added kNN-MT model from Khandelwal et al., 2021.
  - Installation: see the faiss documentation -- installation via conda is recommended.
  - Building a faiss index from a Sockeye model takes two steps:
    - Generate decoder states: `sockeye-generate-decoder-states -m [model] --source [src] --target [tgt] --output-dir [output dir]`
    - Build the index: `sockeye-knn -i [input_dir] -o [output_dir] -t [faiss_index_signature]`, where `input_dir` is the same as `output_dir` from the `sockeye-generate-decoder-states` command.
    - Faiss index signature reference: see here
  - Running inference using the built index: `sockeye-translate ... --knn-index [index_dir] --knn-lambda [interpolation_weight]`, where `index_dir` is the same as `output_dir` from the `sockeye-knn` command. An end-to-end sketch follows below.
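Putting the steps together, a hypothetical end-to-end run might look like the following; the model directory, data paths, interpolation weight, and the faiss index signature are all placeholders:

```
# 1) Generate decoder states for the parallel data.
sockeye-generate-decoder-states -m model_dir \
    --source train.src --target train.trg --output-dir states_dir

# 2) Build a faiss index over the generated states (index signature is a placeholder).
sockeye-knn -i states_dir -o knn_index_dir -t OPQ16_64,IVF512,PQ16

# 3) Translate with kNN interpolation (lambda value is a placeholder).
sockeye-translate -m model_dir --input input.txt --output output.txt \
    --knn-index knn_index_dir --knn-lambda 0.8
```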
[3.1.27]
Changed
- allow torch 1.13 in requirements.txt
- Replaced deprecated `torch.testing.assert_allclose` with `torch.testing.assert_close` for PyTorch 1.14 compatibility.
[3.1.26]
Added
- Added the `--tf32 0|1` bool option (`torch.backends.cuda.matmul.allow_tf32`) enabling 10-bit precision (19 bits total) transparent float32 acceleration. Defaults to true for backward compatibility with torch < 1.12. See the example below.
- Allow continuing training with a different `--tf32` setting.
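As an illustration, a hedged way to turn TF32 off for a particular run; paths are placeholders, and the flag is assumed to be accepted by translate and score as well as train, per the `device.init_device()` change below:

```
# Disable TF32 acceleration for this translation run (paths are placeholders).
sockeye-translate --models model_dir --input input.txt --output output.txt --tf32 0
```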
Changed
- `device.init_device()` called by train, translate, and score.
- Allow torch 1.12 in requirements.txt.
[3.1.25]
Changed
- Updated to sacrebleu==2.3.1. Changed default BLEU floor smoothing offset from 0.01 to 0.1.
[3.1.24]
Fixed
- Updated DeepSpeed checkpoint conversion to support newer versions of DeepSpeed.
[3.1.23]
Changed
- Change decoder softmax size logging level from info to debug.
[3.1.22]
Added
- log beam search avg output vocab size
Changed
- common base Search for GreedySearch and BeamSearch
- .pylintrc: suppress warnings about deprecated pylint warning suppressions
[3.1.21]
Fixed
- The `skip_nvs` and `nvs_thresh` args are now passed to the Translator constructor in `sockeye-translate` instead of being ignored.
[3.1.20]
Added
- Added training support for DeepSpeed.
  - Installation: `pip install deepspeed`
  - Usage: `deepspeed --no_python ... sockeye-train ...` (see the full example below)
  - DeepSpeed mode uses Zero Redundancy Optimizer (ZeRO) stage 1 (Rajbhandari et al., 2019).
  - Run in FP16 mode with `--deepspeed-fp16` or BF16 mode with `--deepspeed-bf16`.
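An illustrative launch command combining these pieces; the `sockeye-train` arguments are placeholders for a normal training invocation:

```
pip install deepspeed

# Launch sockeye-train under the DeepSpeed launcher in FP16 mode (paths are placeholders).
deepspeed --no_python sockeye-train \
    --prepared-data prepared_data \
    --validation-source dev.src --validation-target dev.trg \
    --output model_dir \
    --deepspeed-fp16
```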
[3.1.19]
Added
- Clean up GPU and CPU memory used during training initialization before starting the main training loop.
Changed
- Refactored training code in advance of adding DeepSpeed support:
  - Moved logic for flagging interleaved key-value parameters from layers.py to model.py.
  - Refactored LearningRateScheduler API to be compatible with PyTorch/DeepSpeed.
  - Refactored optimizer and learning rate scheduler creation to be modular.
  - Migrated to ModelWithLoss API, which wraps a Sockeye model and its losses in a single module.
  - Refactored primary and secondary worker logic to reduce redundant calculations.
  - Refactored code for saving/loading training states.
  - Added utility code for managing model/training configurations.
Removed
- Removed unused training option `--learning-rate-t-scale`.
[3.1.18]
Added
- Added `sockeye-train` and `sockeye-translate` option `--clamp-to-dtype` that clamps outputs of transformer attention, feed-forward networks, and process blocks to the min/max finite values for the current dtype. This can prevent inf/nan values from overflow when running large models in float16 mode (see the example below). See: https://discuss.huggingface.co/t/t5-fp16-issue-is-fixed/3139
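For illustration, a hedged float16 inference run with clamping enabled; paths are placeholders, and `--clamp-to-dtype` is assumed here to be a bare flag:

```
# Clamp intermediate outputs to float16's finite range to avoid inf/nan (paths are placeholders).
sockeye-translate --models model_dir --dtype float16 --clamp-to-dtype \
    --input input.txt --output output.txt
```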
[3.1.17]
Added
- Added support for offline model quantization with `sockeye-quantize` (a hypothetical invocation is sketched below).
  - Pre-quantizing a model avoids the load-time memory spike of runtime quantization. For example, a float16 model loads directly as float16 instead of loading as float32 then casting to float16.
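A hypothetical quantization call is sketched below; the `--model` and `--dtype` flag names are assumptions, so check `sockeye-quantize --help` for the exact interface:

```
# Assumed flags: quantize an existing model directory to float16 offline.
sockeye-quantize --model model_dir --dtype float16
```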
[3.1.16]
Added
- Added nbest list reranking options using isometric translation criteria as proposed in an ICASSP 2021 paper (https://arxiv.org/abs/2110.03847). To use this feature, pass a criterion (`isometric-ratio`, `isometric-diff`, `isometric-lc`) when specifying `--metric` (see the sketch below).
- Added `--output-best-non-blank` to output the non-blank best hypothesis from the nbest list.
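A rough sketch of how the pieces could fit together; the `sockeye-rerank` flag names (`--hypotheses`, `--reference`, `--metric`, `--output`) are assumptions about the reranking CLI, and file paths are placeholders:

```
# Produce an nbest list, then rerank it with an isometric criterion (flag names assumed).
sockeye-translate -m model_dir --input input.txt --output nbest.json --nbest-size 5
sockeye-rerank --hypotheses nbest.json --reference input.txt \
    --metric isometric-ratio --output reranked.txt
```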
[3.1.15]
Fixed
- Fix type of `valid_length` to be `pt.Tensor` instead of `Optional[pt.Tensor] = None` for jit tracing.
[3.1.14]
Added
- Added the implementation of Neural vocabulary selection to Sockeye as presented in our NAACL 2022 paper "The Devil is in the Details: On the Pitfalls of Vocabulary Selection in Neural Machine Translation" (Tobias Domhan, Eva Hasler, Ke Tran, Sony Trenous, Bill Byrne and Felix Hieber).
  - To use NVS, simply specify `--neural-vocab-selection` to `sockeye-train`. This will train a model with Neural Vocabulary Selection that is automatically used by `sockeye-translate`. If you want to look at translations without vocabulary selection, specify `--skip-nvs` as an argument to `sockeye-translate`. A hedged sketch follows below.
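A hedged sketch of training and translating with NVS, following the wording of this entry; whether `--neural-vocab-selection` takes a value is not shown here (check `sockeye-train --help`), and paths are placeholders:

```
# Train with Neural Vocabulary Selection (data paths are placeholders).
sockeye-train --prepared-data prepared_data \
    --validation-source dev.src --validation-target dev.trg \
    --neural-vocab-selection \
    --output model_dir

# NVS is used automatically at translation time; pass --skip-nvs to compare without it.
sockeye-translate --models model_dir --skip-nvs --input input.txt --output output.txt
```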
[3.1.13]
Added
- Added `sockeye-train` argument `--no-reload-on-learning-rate-reduce` that disables reloading the best training checkpoint when reducing the learning rate. This currently only applies to the `plateau-reduce` learning rate scheduler since other schedulers do not reload checkpoints. See the example below.
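For illustration, a hedged invocation with the plateau-reduce scheduler; the `--learning-rate-scheduler-type` flag name is assumed from Sockeye's training options, and the other arguments are placeholders:

```
# Keep the current parameters when the learning rate is reduced (paths are placeholders).
sockeye-train --prepared-data prepared_data \
    --validation-source dev.src --validation-target dev.trg \
    --learning-rate-scheduler-type plateau-reduce \
    --no-reload-on-learning-rate-reduce \
    --output model_dir
```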
[3.1.12]
Fixed
- Fix scoring with batches of size 1 (which may occur when `|data| % batch_size == 1`).
[3.1.11]
Fixed
- When resuming training with a fully trained model, `sockeye-train` will correctly exit without creating a duplicate (but separately numbered) checkpoint.
3.1.10
[3.1.9]
Changed
- Clarified usage of `batch_size` in Translator code.
[3.1.8]
Fixed
- When saving parameters, SockeyeModel now skips parameters for traced modules because these modules are created at runtime and use the same parameters as non-traced versions. When loading parameters, SockeyeModel ignores parameters for traced modules that may have been saved by earlier versions.
[3.1.7]
Changed
- SockeyeModel components are now traced regardless of whether `inference_only` is set, including for the CheckpointDecoder during training.
[3.1.6]
Changed
- Moved offsetting of topk scores out of the (traced) TopK module. This allows sending requests of variable batch size to the same Translator/Model/BeamSearch instance.
[3.1.5]
Changed
- Allow PyTorch 1.11 in requirements