
[2.1.7]

Changed

  • Optimized prepare_data by saving shards in parallel. The prepare_data script accepts a new parameter --max-processes to control the level of parallelism with which shards are written to disk.
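
A minimal invocation sketch for reference; the data paths and process count are placeholders, and the exact flag set may differ across Sockeye versions:

    # Prepare and shard training data, writing shards with up to
    # 4 parallel processes (paths below are illustrative placeholders).
    python -m sockeye.prepare_data \
        --source train.src --target train.trg \
        --output prepared_data \
        --max-processes 4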

[2.1.6]

Changed

  • Updated Dockerfiles optimized for CPU (intgemm int8 inference, full MKL support) and GPU (distributed training with Horovod). See sockeye_contrib/docker.

Added

  • Official support for int8 quantization with intgemm:
    • This requires the "intgemm" fork of MXNet (kpuatamazon/incubator-mxnet/intgemm). This is the version of MXNet used in the Sockeye CPU docker image (see sockeye_contrib/docker).
    • Use sockeye.translate --dtype int8 to quantize a trained float32 model at runtime.
    • Use the sockeye.quantize CLI to annotate a float32 model with int8 scaling factors for fast runtime quantization.
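
As an illustration of both entry points (the argument names for sockeye.quantize are assumptions; consult the CLI's --help for the actual interface):

    # Quantize a trained float32 model to int8 at translation time.
    python -m sockeye.translate --models model_dir --dtype int8 \
        --input input.txt --output output.txt

    # Alternatively, annotate the float32 model with int8 scaling factors
    # ahead of time for fast runtime quantization (flag name assumed).
    python -m sockeye.quantize --model model_dir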

[2.1.5]

Changed

  • Changed state caching for transformer models during beam search to cache states with attention heads already separated out. This avoids repeated transpose operations during decoding, leading to faster inference.

[2.1.4]

Added

[2.1.3]

Changed

  • Performance optimizations to beam search inference:
    • Removed unneeded take ops on encoder states.
    • Input data is now gathered before being sent to the GPU, rather than sending each batch element individually.
    • All of beam search can be done in fp16, if specified by the model.
    • Other miscellaneous small optimizations.
  • Model states are now a flat list in ensemble inference; the structure of the states is provided by state_structure().

[2.1.2]

Changed

Added

  • Added support for CUDA 10.2
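
One way to pick this up is to install the matching MXNet build; the wheel name and install order below are assumptions, so match them to your Sockeye version's requirements:

    # Install the CUDA 10.2 build of MXNet, then Sockeye without
    # pulling in the default (CPU) MXNet dependency.
    pip install mxnet-cu102
    pip install sockeye --no-deps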

Removed

  • Removed support for CUDA<9.1 / CUDNN<7.5

[2.1.1]

Added

  • Ability to set environment variables from training/translate CLIs before MXNet is imported. For example, users can configure MXNet as follows: --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine"
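
A usage sketch; apart from --env, the arguments are placeholders (required options such as validation data are omitted):

    # Set MXNet environment variables before MXNet is first imported.
    python -m sockeye.train \
        --env "OMP_NUM_THREADS=1;MXNET_ENGINE_TYPE=NaiveEngine" \
        -s train.src -t train.trg -o model_dir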

[2.1.0]

Changed

  • Version bump, which should have been included in commit b0461b due to incompatible models.

[2.0.1]

Changed

  • Inference defaults to using the max input length observed in training (versus scaling down based on mean length ratio and standard deviations).

Added

  • Additional parameter fixing strategies (see the usage sketch after this list):
    • all_except_feed_forward: Only train feed forward layers.
    • encoder_and_source_embeddings: Only train the decoder (decoder layers, output layer, and target embeddings).
    • encoder_half_and_source_embeddings: Train the latter half of encoder layers and the decoder.
  • Option to specify the number of CPU threads without using an environment variable (--omp-num-threads).
  • More flexible combination of source factors.
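
A usage sketch for the new strategies; --fixed-param-strategy is assumed to be the selecting flag, based on Sockeye's existing parameter-fixing options, and the data arguments are placeholders:

    # Fine-tune only the decoder of an existing model, using 4 CPU threads.
    # (--fixed-param-strategy is an assumption; confirm via sockeye.train --help)
    python -m sockeye.train \
        --params model_dir/params.best \
        --fixed-param-strategy encoder_and_source_embeddings \
        --omp-num-threads 4 \
        -s train.src -t train.trg -o finetuned_model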

[2.0.0]

Changed

  • Updated to MXNet 1.5.0
  • Moved SockeyeModel implementation and all layers to the Gluon API
  • Removed support for Python 3.4
  • Removed the image captioning module
  • Removed the outdated Autopilot module
  • Removed unused training options: the Eve, Nadam, RMSProp, Nag, Adagrad, and Adadelta optimizers, and the fixed-step and fixed-rate-inv-t learning rate schedulers
  • Updated and renamed the learning rate scheduler fixed-rate-inv-sqrt-t -> inv-sqrt-decay
  • Added a script for plotting metrics files: sockeye_contrib/plot_metrics.py
  • Removed the --weight-tying option. Weight tying is now enabled by default; disable it with --weight-tying-type none.

Added

  • Added distributed training support with Horovod/OpenMPI. Use horovodrun and the --horovod training flag (a combined usage sketch follows this list).
  • Added Dockerfiles that build a Sockeye image with all features enabled. See sockeye_contrib/docker.
  • Added none learning rate scheduler (use a fixed rate throughout training)
  • Added linear-decay learning rate scheduler
  • Added training option --learning-rate-t-scale for time-based decay schedulers
  • Added support for MXNet's Automatic Mixed Precision. Activate with the --amp training flag. For best results, make sure as many model dimensions as possible are multiples of 8.
  • Added options for making various model dimensions multiples of a given value. For example, use --pad-vocab-to-multiple-of 8, --bucket-width 8 --no-bucket-scaling, and --round-batch-sizes-to-multiple-of 8 with AMP training.
  • Added GluonNLP's BERTAdam optimizer, an implementation of the Adam variant used by Devlin et al. (2018). Use --optimizer bertadam.
  • Added training option --checkpoint-improvement-threshold to set the amount of metric improvement required over the window of previous checkpoints to be considered actual model improvement (used with --max-num-checkpoint-not-improved).
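
Putting several of these additions together, a sketch of a distributed mixed-precision training run; the data arguments are placeholders, and required options such as validation data are omitted:

    # Launch 4 workers with Horovod/OpenMPI; train with AMP, rounding
    # vocabulary, bucket, and batch sizes to multiples of 8.
    horovodrun -np 4 python -m sockeye.train \
        --horovod \
        --amp \
        --pad-vocab-to-multiple-of 8 \
        --bucket-width 8 --no-bucket-scaling \
        --round-batch-sizes-to-multiple-of 8 \
        --optimizer bertadam \
        -s train.src -t train.trg -o model_dir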