
Update to MXNet 0.12.0

@fhieber released this 02 Nov 10:40 · 375e9b4

[1.10.1]

Changed

  • Reduced memory footprint when creating data iterators: integer sequences
    are streamed from disk when being assigned to buckets.
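
A minimal sketch of the idea, assuming a file with one whitespace-separated sequence of token ids per line; the function and variable names are illustrative, not Sockeye's own:

```python
from typing import Dict, Iterator, List


def stream_sequences(path: str) -> Iterator[List[int]]:
    """Yield one integer sequence per line without loading the whole file into memory."""
    with open(path) as f:
        for line in f:
            yield [int(token_id) for token_id in line.split()]


def assign_to_buckets(path: str, buckets: List[int]) -> Dict[int, List[List[int]]]:
    """Append each streamed sequence to the smallest bucket that can hold it."""
    sorted_buckets = sorted(buckets)
    bucketed: Dict[int, List[List[int]]] = {bucket: [] for bucket in sorted_buckets}
    for sequence in stream_sequences(path):
        for bucket in sorted_buckets:
            if len(sequence) <= bucket:
                bucketed[bucket].append(sequence)
                break
    return bucketed
```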

[1.10.0]

Changed

  • Updated MXNet dependency to 0.12 (w/ MKL support by default).
  • Changed --smoothed-cross-entropy-alpha to --label-smoothing.
    Label smoothing should now require significantly less memory, as it is now handled by MXNet's SoftmaxOutput operator (sketched below).
  • --weight-normalization now applies not only to convolutional weight matrices, but to output layers of all decoders.
    It is also independent of weight tying.
  • Transformers now use --embed-dropout. Previously, they used --transformer-dropout-prepost for this.
  • Transformers now scale their embedding vectors before adding fixed positional embeddings (sketched below).
    This turns out to be crucial for effective learning.
  • .param files now use 5-digit identifiers to reduce the risk of overflow when training with many checkpoints.
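
A hedged sketch of how label smoothing can be expressed through the smooth_alpha argument of MXNet's SoftmaxOutput operator; the symbol names and PAD id below are illustrative, not Sockeye's actual code:

```python
import mxnet as mx

logits = mx.sym.Variable("logits")  # (batch_size * target_len, vocab_size)
labels = mx.sym.Variable("labels")  # (batch_size * target_len,)

softmax = mx.sym.SoftmaxOutput(
    data=logits,
    label=labels,
    ignore_label=0,         # assumed PAD id
    use_ignore=True,
    normalization="valid",  # normalize the loss by non-PAD tokens
    smooth_alpha=0.1,       # the value passed via --label-smoothing
    name="softmax",
)
```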
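A minimal sketch of the embedding scaling, following the Transformer convention of multiplying embeddings by the square root of the model dimension before adding the fixed positional embeddings; the function name and shapes are illustrative:

```python
import math

import mxnet as mx


def scale_and_add_positions(embeddings: mx.sym.Symbol,
                            positions: mx.sym.Symbol,
                            model_dim: int) -> mx.sym.Symbol:
    """embeddings: (batch, seq_len, model_dim); positions: (1, seq_len, model_dim)."""
    # Scale the learned embeddings so they are not dominated by the
    # fixed sinusoidal positional embeddings.
    scaled = embeddings * math.sqrt(model_dim)
    return mx.sym.broadcast_add(scaled, positions)
```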

Added

  • Added CUDA 9.0 requirements file.
  • --loss-normalization-type. Added a new flag to control loss normalization. The new default is to normalize
    by the number of valid, non-PAD tokens instead of the batch size (see the worked example after this list).
  • --weight-init-xavier-factor-type. Added a new flag to control the Xavier factor type when --weight-init=xavier (see the sketch after this list).
  • --embed-weight-init. Added a new flag for the initialization of embedding matrices.
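
A small worked example of the difference between the two normalization modes, using made-up per-token losses:

```python
# A batch of 2 target sentences padded to length 4; PAD positions carry no loss.
token_losses = [
    [2.0, 1.5, 0.5, 0.0],  # 3 valid tokens, 1 PAD
    [1.0, 0.5, 0.0, 0.0],  # 2 valid tokens, 2 PAD
]
valid_mask = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
]

total_loss = sum(sum(row) for row in token_losses)  # 5.5
batch_size = len(token_losses)                      # 2
num_valid = sum(sum(row) for row in valid_mask)     # 5

per_batch = total_loss / batch_size  # 2.75 -> old behavior: normalize by batch size
per_token = total_loss / num_valid   # 1.1  -> new default: normalize by valid, non-PAD tokens
```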
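A hedged sketch of what the factor type controls, using MXNet's standard Xavier initializer (factor_type may be "in", "out", or "avg"); the magnitude value here is only an example, not necessarily Sockeye's default:

```python
import mxnet as mx

# scale = sqrt(magnitude / factor), where factor is
#   fan_in for "in", fan_out for "out", or (fan_in + fan_out) / 2 for "avg";
# weights are then drawn uniformly from [-scale, scale].
initializer = mx.init.Xavier(rnd_type="uniform", factor_type="avg", magnitude=3)
```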

Removed

  • --smoothed-cross-entropy-alpha argument. See above.
  • --normalize-loss argument. See above.

[1.9.0]

Added

  • Batch decoding. New options for the translate CLI: --batch-size and --chunk-size. Translator.translate()
    now accepts and returns lists of inputs and outputs.
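
A hedged usage example of batch decoding from the command line; the model directory and file names are placeholders, and only --batch-size and --chunk-size are the new options:

```bash
python -m sockeye.translate --models my_model_dir \
                            --batch-size 16 \
                            --chunk-size 1024 < source.txt > translations.txt
```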

[1.8.4]

Added

  • Exposed the MXNet KVStore through the --kvstore argument, potentially enabling distributed training.
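
A hedged note on the values this flag maps to: they are the standard MXNet key-value store types, for example:

```python
import mxnet as mx

kv = mx.kvstore.create("device")       # single machine, aggregate gradients on GPU
# kv = mx.kvstore.create("dist_sync")  # synchronous distributed training
# kv = mx.kvstore.create("dist_async") # asynchronous distributed training
```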

[1.8.3]

Added

  • Optional smart rollback of parameters and optimizer states after updating the learning rate
    if the validation metric has not improved for a given number of checkpoints. New flags: --learning-rate-decay-param-reset,
    --learning-rate-decay-optimizer-states-reset.
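
An illustrative sketch of the rollback logic, assuming hypothetical state helpers (this is not Sockeye's actual training loop):

```python
def on_checkpoint(state, metric_improved: bool, patience: int,
                  reset_params: bool, reset_optimizer_states: bool) -> None:
    """Decay the learning rate after `patience` checkpoints without improvement,
    optionally rolling back parameters and/or optimizer states (hypothetical helpers)."""
    if metric_improved:
        state.save_best_params()
        state.num_not_improved = 0
        return
    state.num_not_improved += 1
    if state.num_not_improved >= patience:
        state.decay_learning_rate()
        if reset_params:               # --learning-rate-decay-param-reset
            state.load_best_params()   # roll back to the best parameters seen so far
        if reset_optimizer_states:     # --learning-rate-decay-optimizer-states-reset
            state.reset_optimizer_states()
        state.num_not_improved = 0
```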

[1.8.2]

Fixed

  • The RNN variational dropout mask is now independent of the input
    (previously, any zero initial state led to the first state being canceled); see the sketch below.
  • Correctly pass the self.dropout_inputs float to mx.sym.Dropout in VariationalDropoutCell.
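
A hedged sketch of the first fix: building the dropout mask from a constant tensor of ones shaped like the input, so that a zero input can no longer zero out the mask:

```python
import mxnet as mx


def variational_dropout_mask(data: mx.sym.Symbol, p: float) -> mx.sym.Symbol:
    """Mask reused across time steps, independent of the values in `data`."""
    ones = mx.sym.ones_like(data)
    # Dropout applied to ones yields a mask of zeros and 1 / (1 - p) entries.
    return mx.sym.Dropout(ones, p=p)
```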

[1.8.1]

Changed

  • Instead of truncating sentences that exceed the maximum input length, they are now translated in chunks.
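
A minimal, illustrative sketch of the chunking behavior (not Sockeye's implementation), assuming a translate callable that handles a single chunk:

```python
from typing import Callable, List


def translate_in_chunks(tokens: List[str],
                        max_input_len: int,
                        translate: Callable[[List[str]], str]) -> str:
    """Split an over-long token sequence into chunks of at most max_input_len,
    translate each chunk, and join the partial translations."""
    chunks = [tokens[i:i + max_input_len]
              for i in range(0, len(tokens), max_input_len)]
    return " ".join(translate(chunk) for chunk in chunks)
```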