[1.10.1]
Changed
Reduced memory footprint when creating data iterators: integer sequences
are streamed from disk when being assigned to buckets.
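A minimal sketch of the idea behind this change (function and variable names are illustrative, not Sockeye's actual internals): sequences are read lazily from disk and appended to their bucket one at a time, so the full corpus never has to be resident in memory at once.

```python
from typing import Dict, Iterator, List

def read_sequences(path: str) -> Iterator[List[int]]:
    """Lazily yield integer token-id sequences, one per line."""
    with open(path) as f:
        for line in f:
            yield [int(tok) for tok in line.split()]

def assign_to_buckets(path: str, buckets: List[int]) -> Dict[int, List[List[int]]]:
    """Place each streamed sequence into the smallest bucket that fits it."""
    bucketed: Dict[int, List[List[int]]] = {b: [] for b in buckets}
    for seq in read_sequences(path):
        for b in buckets:
            if len(seq) <= b:
                bucketed[b].append(seq)
                break  # sequences longer than the largest bucket are skipped
    return bucketed
```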
[1.10.0]
Changed
Updated MXNet dependency to 0.12 (with MKL support enabled by default).
Changed --smoothed-cross-entropy-alpha to --label-smoothing.
Label smoothing should now require significantly less memory because it is implemented directly in MXNet's SoftmaxOutput operator (see the sketch after this list).
--weight-normalization now applies not only to convolutional weight matrices but also to the output layers of all decoders, and is independent of weight tying.
Transformers now use --embed-dropout for embedding dropout; previously they used --transformer-dropout-prepost for this.
Transformers now scale their embedding vectors before adding fixed positional embeddings; this turns out to be crucial for effective learning (see the second sketch after this list).
.param files now use 5-digit identifiers to reduce the risk of overflow with many checkpoints.
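For reference, label smoothing softens the one-hot target distribution; with MXNet 0.12 this can be done inside the SoftmaxOutput operator (via its smooth_alpha parameter) instead of materializing smoothed targets in memory. A minimal NumPy sketch of the usual convention, where the leftover probability mass is spread uniformly over the non-target classes (this follows the common definition, not code quoted from Sockeye):

```python
import numpy as np

def smooth_labels(label_ids: np.ndarray, num_classes: int, alpha: float) -> np.ndarray:
    """Soften one-hot targets: the true class keeps 1 - alpha and the
    remaining mass alpha is spread uniformly over the other classes."""
    smoothed = np.full((label_ids.shape[0], num_classes), alpha / (num_classes - 1))
    smoothed[np.arange(label_ids.shape[0]), label_ids] = 1.0 - alpha
    return smoothed  # each row sums to 1
```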
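The embedding scaling mentioned above follows the standard Transformer convention of multiplying embeddings by the square root of the model dimension before adding the fixed positional encodings; the sqrt factor here is the convention from Vaswani et al., assumed for illustration rather than quoted from Sockeye's code:

```python
import math
import numpy as np

def embed_with_positions(embeddings: np.ndarray, pos_encodings: np.ndarray) -> np.ndarray:
    """Scale token embeddings before adding fixed positional encodings.
    Without the sqrt(model_dim) factor, the positional signal can dominate
    the typically small-magnitude learned embeddings early in training."""
    model_dim = embeddings.shape[-1]
    return embeddings * math.sqrt(model_dim) + pos_encodings
```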
Added
Added CUDA 9.0 requirements file.
--loss-normalization-type. Added a new flag to control loss normalization. The new default is to normalize by the number of valid (non-PAD) tokens instead of by the batch size (see the sketch after this list).
--weight-init-xavier-factor-type. Added a new flag to control the Xavier factor type when --weight-init=xavier.
--embed-weight-init. Added a new flag for initialization of embedding matrices.
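A sketch of the new default normalization: the summed token losses are divided by the count of non-PAD target tokens rather than by the batch size, so padding does not dilute the per-token loss (the PAD id and names are illustrative):

```python
import numpy as np

PAD_ID = 0  # illustrative padding token id

def normalize_loss(token_losses: np.ndarray, labels: np.ndarray) -> float:
    """Normalize summed cross-entropy by the number of valid (non-PAD)
    tokens instead of by the batch size."""
    num_valid = int(np.sum(labels != PAD_ID))
    return float(np.sum(token_losses) / max(num_valid, 1))
```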
Removed
--smoothed-cross-entropy-alpha argument. See above.
--normalize-loss argument. See above.
[1.9.0]
Added
Batch decoding. New options for the translate CLI: --batch-size and --chunk-size. Translator.translate() now accepts and returns lists of inputs and outputs (see the sketch below).
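An illustrative use of the list-based API (translator construction is elided and translate_all is a hypothetical helper; only translate() taking and returning lists is stated in this entry):

```python
def translate_all(translator, sentences, chunk_size=512):
    """Feed inputs to Translator.translate() in chunks; per this entry,
    translate() accepts and returns lists, so each chunk can be decoded
    as mini-batches internally."""
    outputs = []
    for i in range(0, len(sentences), chunk_size):
        chunk = sentences[i:i + chunk_size]
        outputs.extend(translator.translate(chunk))
    return outputs
```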
[1.8.4]
Added
Exposed the MXNet KVStore through the --kvstore argument, potentially enabling distributed training.
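Under the hood this maps onto MXNet's key-value store for gradient aggregation; a minimal sketch of what the flag selects (the store names are MXNet's, and the distributed variants additionally require launching workers with MXNet's distributed tooling):

```python
import mxnet as mx

# 'device' aggregates gradients on GPU within one machine;
# 'dist_sync' / 'dist_async' synchronize across machines.
kv = mx.kvstore.create('device')
print(kv.type)  # -> 'device'
```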
[1.8.3]
Added
Optional smart rollback of parameters and optimizer states after updating the learning rate if the validation score has not improved for a given number of checkpoints. New flags: --learning-rate-decay-param-reset, --learning-rate-decay-optimizer-states-reset.
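A hedged sketch of the rollback idea, not Sockeye's actual code (trainer and all its attributes are hypothetical): when the learning rate is decayed because the validation metric stalled, reset the parameters, and optionally the optimizer states, to the best checkpoint seen so far.

```python
def maybe_decay_and_rollback(trainer, not_improved, patience,
                             reset_params=True, reset_optimizer=True):
    """If the metric stalled for `patience` checkpoints, decay the learning
    rate and roll back to the best checkpoint (hypothetical helpers)."""
    if not_improved >= patience:
        trainer.learning_rate *= 0.5  # decay factor is illustrative
        if reset_params:
            trainer.load_params(trainer.best_checkpoint_path)
        if reset_optimizer:
            trainer.reset_optimizer_states()
```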
[1.8.2]
Fixed
The RNN variational dropout mask is now independent of the input (previously, an all-zero initial state caused the first state to be canceled). Correctly pass the self.dropout_inputs float to mx.sym.Dropout in VariationalDropoutCell.
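A sketch of the idea behind the fix (symbol names are illustrative): derive the mask from a constant ones tensor rather than from the input itself, so an all-zero input can no longer zero out the mask that is reused across time steps.

```python
import mxnet as mx

def variational_dropout_mask(data, p):
    """Build a dropout mask independent of the input values: applying
    Dropout to a ones tensor yields a mask of zeros and 1/(1-p) scale
    factors, which is then multiplied into the state at every step."""
    ones = mx.sym.ones_like(data)
    return mx.sym.Dropout(ones, p=p)

# usage sketch: h = h * variational_dropout_mask(h, p=0.3)
```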
[1.8.1]
Changed
Instead of truncating sentences that exceed the maximum input length, they are now translated in chunks (see the sketch below).
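A sketch of chunked translation (the helper names are hypothetical; Sockeye's actual chunking lives in its inference code): a too-long token sequence is split into consecutive windows of at most max_input_len tokens, each window is translated, and the outputs are concatenated.

```python
def translate_long_sentence(translator, tokens, max_input_len):
    """Translate a sentence longer than the model's maximum input length
    by splitting it into chunks and joining the chunk translations."""
    translations = []
    for i in range(0, len(tokens), max_input_len):
        chunk = tokens[i:i + max_input_len]
        translations.append(translator.translate_tokens(chunk))  # hypothetical API
    return " ".join(translations)
```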