Releases · awslabs/sockeye
1.10.5
[1.10.5]
Fixed
- Fixed yet another bug with the data iterator.
[1.10.4]
Fixed
- Fixed a bug with the revised data iterator not correctly appending EOS symbols for variable-length batches.
This partially reverts the change introduced in 1.10.1; the iterator is now correct again.
1.10.3
[1.10.3]
Changed
- Fixed a bug with max_observed_{source,target}_len being computed on the complete data set, not only on the sentences actually added to the buckets based on --max_seq_len.
[1.10.2]
Added
- --max-num-epochs flag to train for a maximum number of passes through the training data.
- Update to MXNet 0.12.0.
[1.10.1]
Changed
- Reduced memory footprint when creating data iterators: integer sequences are streamed from disk when being assigned to buckets.
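A minimal sketch of the streaming idea (illustrative only; the file format and function name are assumptions, not Sockeye's actual iterator code):

```python
from typing import Iterator, List

def stream_token_ids(path: str) -> Iterator[List[int]]:
    """Yield one integer sequence (one line of token ids) at a time instead of
    loading the whole corpus into memory; bucket assignment can consume the
    stream lazily."""
    with open(path, encoding="utf-8") as data:
        for line in data:
            yield [int(token_id) for token_id in line.split()]
```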
[1.10.0]
Changed
- Updated MXNet dependency to 0.12 (w/ MKL support by default).
- Changed --smoothed-cross-entropy-alpha to --label-smoothing. Label smoothing should now require significantly less memory due to its addition to MXNet's SoftmaxOutput operator.
- --weight-normalization now applies not only to convolutional weight matrices, but to output layers of all decoders. It is also independent of weight tying.
- Transformers now use --embed-dropout. Before they were using --transformer-dropout-prepost for this.
- Transformers now scale their embedding vectors before adding fixed positional embeddings. This turns out to be crucial for effective learning (see the sketch after this list).
- .param files now use 5 digit identifiers to reduce risk of overflowing with many checkpoints.
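To illustrate the embedding scaling mentioned above, here is a minimal NumPy sketch (illustrative only, not Sockeye code; the function and parameter names are assumptions): embeddings are multiplied by sqrt(num_embed) before the fixed sinusoidal positional encodings are added, so the token signal is not drowned out by the positional signal.

```python
import numpy as np

def sinusoidal_encodings(seq_len: int, num_embed: int) -> np.ndarray:
    """Fixed positional encodings as in Vaswani et al. (2017); num_embed must be even."""
    positions = np.arange(seq_len)[:, None]                    # (seq_len, 1)
    dims = np.arange(0, num_embed, 2)[None, :]                 # (1, num_embed / 2)
    angles = positions / np.power(10000.0, dims / num_embed)
    encodings = np.zeros((seq_len, num_embed))
    encodings[:, 0::2] = np.sin(angles)
    encodings[:, 1::2] = np.cos(angles)
    return encodings

def embed_with_positions(token_ids: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Look up embeddings, scale by sqrt(num_embed), then add positional encodings."""
    num_embed = embedding.shape[1]
    embedded = embedding[token_ids] * np.sqrt(num_embed)       # scaling added in 1.10.0
    return embedded + sinusoidal_encodings(len(token_ids), num_embed)
```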
Added
- Added CUDA 9.0 requirements file.
- --loss-normalization-type. Added a new flag to control loss normalization. New default is to normalize by the number of valid, non-PAD tokens instead of the batch size (see the sketch after this list).
- --weight-init-xavier-factor-type. Added new flag to control Xavier factor type when --weight-init=xavier.
- --embed-weight-init. Added new flag for initialization of embedding matrices.
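The new default loss normalization can be illustrated with a small NumPy sketch (not Sockeye's implementation; PAD_ID and the array names are assumptions): the summed per-token losses are divided by the number of non-PAD target tokens rather than by the batch size.

```python
import numpy as np

PAD_ID = 0  # assumed padding token id

def normalized_loss(token_losses: np.ndarray, labels: np.ndarray) -> float:
    """Normalize the summed per-token losses by the number of valid (non-PAD)
    target tokens rather than by the batch size.

    token_losses: per-token cross-entropy values, shape (batch, target_len)
    labels:       target token ids,               shape (batch, target_len)
    """
    valid = labels != PAD_ID
    num_valid = max(int(valid.sum()), 1)          # guard against division by zero
    return float((token_losses * valid).sum()) / num_valid
```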
Removed
- --smoothed-cross-entropy-alpha argument. See above.
- --normalize-loss argument. See above.
[1.9.0]
Added
- Batch decoding. New options for the translate CLI: --batch-size and --chunk-size. Translator.translate() now accepts and returns lists of inputs and outputs.
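A sketch of the new list-based interface (the release notes name only Translator.translate(), --batch-size, and --chunk-size; how the translator object and its inputs are constructed is version-dependent and assumed here):

```python
from typing import Any, List

def translate_batch(translator: Any, trans_inputs: List[Any]) -> List[str]:
    """Decode several sentences in one call.

    Since 1.9.0, Translator.translate() accepts a list of translator inputs and
    returns a list of outputs, so the translator can batch them internally
    (subject to the --batch-size / --chunk-size settings it was created with).
    """
    trans_outputs = translator.translate(trans_inputs)
    # The translated string is assumed to be exposed as `.translation`.
    return [output.translation for output in trans_outputs]
```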
[1.8.4]
Added
- Exposing the MXNet KVStore through the --kvstore argument, potentially enabling distributed training.
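For context, the string given to --kvstore names an MXNet KVStore type; a minimal MXNet-only sketch (not Sockeye code) of what such a store looks like:

```python
import mxnet as mx

# 'device' aggregates gradients on the GPUs of one machine; 'dist_sync' and
# 'dist_async' are the distributed variants (they additionally require the
# MXNet launcher to start the worker and server processes).
kv = mx.kvstore.create("device")
print(kv.type)  # -> 'device'
```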
[1.8.3]
Added
- Optional smart rollback of parameters and optimizer states after updating the learning rate if not improved for x checkpoints. New flags: --learning-rate-decay-param-reset, --learning-rate-decay-optimizer-states-reset.
[1.8.2]
Fixed
- The RNN variational dropout mask is now independent of the input (previously any zero initial state led to the first state being canceled; see the sketch below).
- Correctly pass self.dropout_inputs float to mx.sym.Dropout in VariationalDropoutCell.
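A small NumPy sketch of the idea behind the fix (illustrative, not the MXNet/Sockeye implementation): the dropout mask is sampled once per sequence, independently of inputs or states, and then reused at every time step.

```python
import numpy as np

def variational_dropout_mask(batch_size: int, num_hidden: int, p: float,
                             rng: np.random.Generator) -> np.ndarray:
    """Sample one dropout mask per sequence, independent of inputs and states,
    and reuse it at every time step (variational/recurrent dropout)."""
    keep = rng.random((batch_size, num_hidden)) > p
    return keep.astype(np.float32) / (1.0 - p)   # inverted-dropout scaling

rng = np.random.default_rng(0)
mask = variational_dropout_mask(batch_size=2, num_hidden=4, p=0.3, rng=rng)
inputs = rng.standard_normal((5, 2, 4)).astype(np.float32)  # (time, batch, hidden)
dropped = inputs * mask                                     # same mask at every step
```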
[1.8.1]
Changed
- Instead of truncating sentences exceeding the maximum input length, they are now translated in chunks.
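A rough sketch of the chunking behavior (the splitting and joining details are assumptions; Sockeye's actual implementation differs):

```python
from typing import Callable, List

def translate_in_chunks(tokens: List[str], max_input_len: int,
                        translate: Callable[[str], str]) -> str:
    """Split an over-long input into chunks of at most max_input_len tokens,
    translate each chunk, and join the partial translations."""
    chunks = [tokens[i:i + max_input_len]
              for i in range(0, len(tokens), max_input_len)]
    return " ".join(translate(" ".join(chunk)) for chunk in chunks)
```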
Conv2seq models
Added
- Convolutional decoder.
- Weight normalization (for CNN only so far).
- Learned positional embeddings for the transformer.
Changed
- --attention-* CLI params renamed to --rnn-attention-*.
- --transformer-no-positional-encodings generalized to --transformer-positional-embedding-type.
Updated Word batching
- Word batching update: guarantee default bucket has largest batch size.
- Comments/logic for clarity.
- Address PR comments.
- Memory usage note.
- NamedTuple for bucket batch sizes.
Transformer models
- Added transformer models (Vaswani et al., 2017) to Sockeye.