[2.1.7]
Changed
Optimize prepare_data by saving the shards in parallel. The prepare_data script accepts a new parameter --max-processes to control the level of parallelism with which shards are written to disk.
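For illustration, a minimal invocation sketch (file names, output directory, and process count are placeholders; -s, -t, and -o are assumed to be the usual prepare_data source/target/output arguments):
python -m sockeye.prepare_data -s train.src -t train.trg -o prepared_data --max-processes 4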
[2.1.6]
Changed
Updated Dockerfiles optimized for CPU (intgemm int8 inference, full MKL support) and GPU (distributed training with Horovod). See sockeye_contrib/docker.
Added
Official support for int8 quantization with intgemm:
Use sockeye.translate --dtype int8 to quantize a trained float32 model at runtime.
Use the sockeye.quantize CLI to annotate a float32 model with int8 scaling factors for fast runtime quantization.
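For illustration, hedged sketches of both paths (model directory and file names are placeholders; the quantize CLI is assumed to take the model directory via --model):
python -m sockeye.translate -m model_dir --input input.txt --output output.txt --dtype int8
python -m sockeye.quantize --model model_dir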
[2.1.5]
Changed
Changed state caching for transformer models during beam search to cache states with attention heads already separated out. This avoids repeated transpose operations during decoding, leading to faster inference.
[2.1.4]
Added
Added Dockerfiles that build an experimental CPU-optimized Sockeye image.
Ability to set environment variables from training/translate CLIs before MXNet is imported. For example, users can configure MXNet as follows: --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"
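For illustration, a hedged translation sketch that sets both variables before MXNet is loaded (model directory and file names are placeholders):
python -m sockeye.translate -m model_dir --input input.txt --output output.txt --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"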
[2.1.0]
Changed
Version bump, which should have been included in commit b0461b due to incompatible models.
[2.0.1]
Changed
Inference defaults to using the max input length observed in training (versus scaling down based on mean length ratio and standard deviations).
Added
Additional parameter fixing strategies:
all_except_feed_forward: Only train feed forward layers.
encoder_and_source_embeddings: Only train the decoder (decoder layers, output layer, and target embeddings).
encoder_half_and_source_embeddings: Train the latter half of encoder layers and the decoder.
Option to specify the number of CPU threads without using an environment variable (--omp-num-threads); see the sketch after this list.
Removed option --weight-tying. Weight tying is enabled by default; disable it with --weight-tying-type none.
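For illustration, a hedged training sketch combining a parameter fixing strategy with the new thread option (data paths and the output directory are placeholders; --fixed-param-strategy is assumed to be the flag that selects one of the strategies above):
python -m sockeye.train -s train.src -t train.trg -vs dev.src -vt dev.trg -o model_dir --fixed-param-strategy all_except_feed_forward --omp-num-threads 2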
Added
Added distributed training support with Horovod/OpenMPI. Use horovodrun and the --horovod training flag.
Added Dockerfiles that build a Sockeye image with all features enabled. See sockeye_contrib/docker.
Added none learning rate scheduler (use a fixed rate throughout training)
Added linear-decay learning rate scheduler
Added training option --learning-rate-t-scale for time-based decay schedulers
Added support for MXNet's Automatic Mixed Precision. Activate with the --amp training flag. For best results, make sure as many model dimensions as possible are multiples of 8.
Added options for making various model dimensions multiples of a given value. For example, use --pad-vocab-to-multiple-of 8, --bucket-width 8 --no-bucket-scaling, and --round-batch-sizes-to-multiple-of 8 with AMP training (see the sketch after this list).
Added GluonNLP's BERTAdam optimizer, an implementation of the Adam variant used by Devlin et al. (2018). Use --optimizer bertadam.
Added training option --checkpoint-improvement-threshold to set the amount of metric improvement required over the window of previous checkpoints to be considered actual model improvement (used with --max-num-checkpoint-not-improved).
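Putting several of the options above together, a hedged sketch of a two-process distributed run with AMP and dimensions rounded to multiples of 8 (data paths, output directory, and process count are placeholders):
horovodrun -np 2 python -m sockeye.train -s train.src -t train.trg -vs dev.src -vt dev.trg -o model_dir --horovod --amp --pad-vocab-to-multiple-of 8 --bucket-width 8 --no-bucket-scaling --round-batch-sizes-to-multiple-of 8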