Releases · awslabs/sockeye
[1.18.23]

[1.18.22]
Fixed
- Make sure the default bucket is large enough with word-based batching when the source is longer than the target (previously
there was an edge case in which memory usage was sub-optimal with word-based batching and source sentences longer than the target).
[1.18.21]
[1.18.20]
Changed
- Transformer parametrization flags (model size, number of attention heads, feed-forward layer size) can now optionally
be defined separately for the encoder and decoder. For example, to use a different transformer model size for the encoder,
pass `--transformer-model-size 1024:512`.
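As a fuller sketch, the same encoder:decoder syntax presumably applies to the other parametrization flags as well; the specific values and their combination below are illustrative assumptions, not taken from this entry:

```bash
# Hypothetical sketch: larger encoder than decoder.
# Only --transformer-model-size 1024:512 is confirmed by this entry; the other
# encoder:decoder values are illustrative assumptions.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --output model_dir \
    --transformer-model-size 1024:512 \
    --transformer-attention-heads 16:8 \
    --transformer-feed-forward-num-hidden 4096:2048
```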
[1.18.19]
Added
- LHUC is now supported in transformer models.
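A training sketch with LHUC enabled on a transformer model; the flag name and its accepted values are assumptions here rather than something stated in this entry:

```bash
# Hypothetical: enable LHUC for the whole transformer model (flag name assumed).
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --output model_dir \
    --encoder transformer --decoder transformer \
    --lhuc all
```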
[1.18.18]
Added
- [Experimental] Introducing the image captioning module. Supported model type: ConvNet encoder with Sockeye NMT decoders. This also includes a feature extraction script,
an image-text iterator that loads features, training and inference pipelines, and a visualization script that loads images and captions.
See the image captioning tutorial for its usage. This module is experimental; its maintenance is therefore not fully guaranteed.
[1.18.17]
Changed
- Updated to MXNet 1.2
- Use of the new LayerNormalization operator to save GPU memory.
[1.18.16]
Fixed
- Removed summation of gradient arrays when logging gradients.
This clogged memory on the primary GPU device over time when many checkpoints were made.
Gradient histograms are now logged to Tensorboard separately per device.
[1.18.15]
Added
- Added decoding with target-side lexical constraints (documentation in `tutorials/constraints`).
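A minimal decoding sketch, assuming constraints are supplied through Sockeye's JSON input format as described in the constraints tutorial; the `--json-input` flag and the `"constraints"` key are assumptions, not taken from this entry:

```bash
# Hypothetical: force the phrase "is a test" to appear in the output
# (JSON keys and --json-input are assumptions; see tutorials/constraints).
echo '{"text": "das ist ein test", "constraints": ["is a test"]}' \
    | python -m sockeye.translate --models model_dir --json-input
```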
[1.18.14]
Added
- Introduced Sockeye Autopilot for single-command end-to-end system building.
See the Autopilot documentation and run with `sockeye-autopilot`.
Autopilot is a `contrib` module with its own tests that are run periodically.
It is not included in the comprehensive tests run for every commit.
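A usage sketch; the `--task` and `--model` options and the task identifier are assumptions based on the Autopilot documentation rather than this entry:

```bash
# Hypothetical: download, preprocess, train, and evaluate a WMT system end to end
# (option names and task identifier are assumptions).
sockeye-autopilot --task wmt14_en_de --model transformer
```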
[1.18.13]
Fixed
- Fixed two bugs with training resumption:
  - removed overly strict assertion in the data iterator for model states before the first checkpoint.
  - removed deletion of Tensorboard log directory.
Added
- Added support for config files. Command line parameters have precedence over the values read from the config file.
Minimal working example: `python -m sockeye.train --config config.yaml` with contents of `config.yaml` as follows:

  ```yaml
  source: source.txt
  target: target.txt
  output: out
  validation_source: valid.source.txt
  validation_target: valid.target.txt
  ```
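To illustrate the precedence rule, a value given on the command line should win over the same key in the config file; the alternative output directory below is purely illustrative:

```bash
# Hypothetical: --output on the command line overrides "output: out" from config.yaml.
python -m sockeye.train --config config.yaml --output out_run2
```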
Changed
- The full set of arguments is serialized to `out/args.yaml` at the beginning of training (previously JSON was used).
[1.18.12]
Changed
- All source-side sequences now get an additional end-of-sentence (EOS) symbol appended. This change is backwards
compatible, meaning that inference with older models will still work without the EOS symbol.
[1.18.11]
Changed
- Default training parameters have been changed to reflect the setup used in our arXiv paper. Specifically, the default
is now to train a 6-layer Transformer model with word-based batching. The only difference to the paper is that weight
tying is still turned off by default, as there may be use cases in which tying the source and target vocabularies is
not appropriate. Turn it on using `--weight-tying --weight-tying-type=src_trg_softmax`. Additionally, BLEU scores from
a checkpoint decoder are now monitored by default.
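For example, enabling the tied setup could look as follows; the `--shared-vocab` flag is an assumption here (tying source and target embeddings generally presupposes a joint vocabulary):

```bash
# Hypothetical: train with source/target embeddings and the output layer tied.
python -m sockeye.train \
    --source train.src --target train.trg \
    --validation-source dev.src --validation-target dev.trg \
    --output model_dir \
    --shared-vocab \
    --weight-tying --weight-tying-type=src_trg_softmax
```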
[1.18.10]
[1.18.9]
Fixed
- Fixed a problem with LHUC boolean flags being passed as None.
Added
- Reorganized beam search. Normalization is applied only to completed hypotheses, and pruning of
hypotheses (log probability relative to the highest-scoring completed hypothesis) can be specified with
`--beam-prune X`.
- Enabled stopping at the first completed hypothesis with `--beam-search-stop first` (default is 'all').
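A decoding sketch combining the new options; `--beam-size` and the pruning threshold value are illustrative assumptions:

```bash
# Hypothetical: prune hypotheses whose log probability falls more than 20 below the
# best completed hypothesis, and stop as soon as the first hypothesis completes.
python -m sockeye.translate --models model_dir \
    --beam-size 10 --beam-prune 20 --beam-search-stop first \
    < input.src > output.trg
```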
[1.18.8]
Removed
- Removed tensorboard logging of embedding & output parameters at every checkpoint. This used a lot of disk space.
[1.18.7]
Added
- Added support for LHUC in RNN models (David Vilar, "Learning Hidden Unit
Contribution for Adapting Neural Machine Translation Models", NAACL 2018).
Fixed
- Word-based batching with very small batch sizes.