Commit cc24739

Remove RNN parameter packing, FusedRNN support; refactored core model components (#189)

* Removed RNN parameter packing and FusedRNN support

* Refactor embedding and output layers (#196)

* Removed RNN parameter packing and FusedRNN support

* Refactoring of the sockeye model: source embed, target embed, and output layers are now separate components in the model

* Make training and inference work. Remove lexical biasing code.
fhieber authored Nov 21, 2017
1 parent 2446dd1 commit cc24739
Showing 17 changed files with 497 additions and 680 deletions.
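The description above sums up the refactor: source embedding, target embedding, and the output layer become first-class model components instead of being buried in the encoder and decoder. A minimal sketch of that composition in Python; the class and method names here are hypothetical, not Sockeye's actual API:

```python
# Illustrative sketch of the component split described in the commit message;
# all names are hypothetical, not Sockeye's actual classes.
class ComposedModelSketch:
    def __init__(self, source_embed, target_embed, encoder, decoder, output_layer):
        self.source_embed = source_embed  # embeds source token ids
        self.target_embed = target_embed  # embeds target token ids
        self.encoder = encoder            # RNN / CNN / transformer encoder
        self.decoder = decoder            # attends over encoder states
        self.output_layer = output_layer  # projects decoder states to target-vocab logits

    def forward(self, source_ids, target_ids):
        encoded = self.encoder(self.source_embed(source_ids))
        states = self.decoder(self.target_embed(target_ids), encoded)
        return self.output_layer(states)
```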
CHANGELOG.md: 23 changes (16 additions, 7 deletions)

```diff
@@ -1,19 +1,28 @@
 # Changelog
-All notable changes to this project will be documented in this file.
+All notable changes to the project are documented in this file.
 
-We use version numbers with three digits such as 1.0.0.
+Version numbers are of the form `1.0.0`.
 Any version bump in the last digit is backwards-compatible, in that a model trained with the previous version can still
 be used for translation with the new version.
-Any bump in the second digit indicates potential backwards incompatibilities, e.g. due to changing the architecture or
-simply modifying weight names.
+Any bump in the second digit indicates a backwards-incompatible change,
+e.g. due to changing the architecture or simply modifying model parameter names.
 Note that Sockeye has checks in place to not translate with an old model that was trained with an incompatible version.
 
-For each item we will potentially have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.
+Each version section may have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.
 
 
 ## [1.13.0]
 ### Fixed
 - Transformer models no longer silently ignore `--num-embed`.
   As a result, an error is thrown if `--num-embed` != `--transformer-model-size`.
 - Fixed the attention in upper layers (`--rnn-attention-in-upper-layers`), which was previously not passed correctly
-to the decoder.
+  to the decoder.
+### Removed
+- Removed RNN parameter (un-)packing and support for FusedRNNCells (removed the `--use-fused-rnn` flag).
+  These were not used, not correctly initialized, and performed worse than regular RNN cells. Moreover,
+  they made the code much more complex. RNN models trained with previous versions are no longer compatible.
+- Removed the lexical biasing functionality (Arthur et al., 2016) (removed arguments `--lexical-bias`
+  and `--learn-lexical-bias`).
 
 ## [1.12.2]
 ### Changed
@@ -120,7 +129,7 @@
 - Convolutional decoder.
 - Weight normalization (for CNN only so far).
 - Learned positional embeddings for the transformer.
 
 ### Changed
 - `--attention-*` CLI params renamed to `--rnn-attention-*`.
 - `--transformer-no-positional-encodings` generalized to `--transformer-positional-embedding-type`.
```
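The preamble above states the compatibility rule that Sockeye's model-version check enforces: a bump in the last digit is compatible, a bump in the second digit is not. A minimal sketch of such a check, assuming `major.minor.patch` version strings; this is not Sockeye's actual implementation:

```python
def is_compatible(model_version: str, code_version: str) -> bool:
    """A model is usable iff the first two version digits match the code's;
    the last (patch) digit may differ."""
    model_major, model_minor, _ = model_version.split(".")
    code_major, code_minor, _ = code_version.split(".")
    return (model_major, model_minor) == (code_major, code_minor)

assert is_compatible("1.13.2", "1.13.0")       # patch bump: compatible
assert not is_compatible("1.12.2", "1.13.0")   # second-digit bump: incompatible
```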
sockeye/arguments.py: 17 changes (2 additions, 15 deletions)

```diff
@@ -489,15 +489,6 @@ def add_model_parameters(params):
                               type=int, default=None,
                               help='Number of heads for Multi-head dot attention. Default: %(default)s.')
 
-    model_params.add_argument('--lexical-bias',
-                              default=None,
-                              type=str,
-                              help="Specify probabilistic lexicon (fast_align format) for lexical biasing (Arthur "
-                                   "ETAL'16). Set smoothing value epsilon by appending :<eps>")
-    model_params.add_argument('--learn-lexical-bias',
-                              action='store_true',
-                              help='Adjust lexicon probabilities during training. Default: %(default)s')
-
     model_params.add_argument('--weight-tying',
                               action='store_true',
                               help='Turn on weight tying (see arxiv.org/abs/1608.05859). '
@@ -690,7 +681,8 @@ def add_training_args(params):
                               default=C.EMBED_INIT_DEFAULT,
                               choices=C.EMBED_INIT_TYPES,
                               help='Type of embedding matrix weight initialization. If normal, initializes embedding '
-                                   'weights using a normal distribution with std=vocab_size. Default: %(default)s.')
+                                   'weights using a normal distribution with std=1/sqrt(vocab_size). '
+                                   'Default: %(default)s.')
     train_params.add_argument('--initial-learning-rate',
                               type=float,
                               default=0.0003,
@@ -750,11 +742,6 @@
                                    "reduced due to the value of --learning-rate-reduce-num-not-improved. "
                                    "Default: %(default)s.")
 
-    train_params.add_argument('--use-fused-rnn',
-                              default=False,
-                              action="store_true",
-                              help='Use FusedRNNCell in encoder (requires GPU device). Speeds up training.')
-
     train_params.add_argument('--rnn-forget-bias',
                               default=0.0,
                               type=float,
```
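The corrected help text documents that the `normal` embedding initialization uses std = 1/sqrt(vocab_size), rather than the std=vocab_size the old text claimed. A NumPy sketch of what that initialization means; illustrative only, not Sockeye's actual initializer:

```python
import numpy as np

def normal_embed_init(vocab_size: int, num_embed: int, seed: int = 0) -> np.ndarray:
    # Draw embedding weights from N(0, std^2) with std = 1 / sqrt(vocab_size).
    std = 1.0 / np.sqrt(vocab_size)
    rng = np.random.RandomState(seed)
    return rng.normal(loc=0.0, scale=std, size=(vocab_size, num_embed))

weights = normal_embed_init(vocab_size=50000, num_embed=512)
print(weights.std())  # close to 1/sqrt(50000), i.e. about 0.00447
```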
sockeye/constants.py: 1 change (1 addition, 0 deletions)

```diff
@@ -40,6 +40,7 @@
 TRANSFORMER_ENCODER_PREFIX = ENCODER_PREFIX + "transformer_"
 CNN_ENCODER_PREFIX = ENCODER_PREFIX + "cnn_"
 CHAR_SEQ_ENCODER_PREFIX = ENCODER_PREFIX + "char_"
+DEFAULT_OUTPUT_LAYER_PREFIX = "target_output_"
 
 # embedding prefixes
 SOURCE_EMBEDDING_PREFIX = "source_embed_"
```
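The new `DEFAULT_OUTPUT_LAYER_PREFIX` follows the same convention as the encoder and embedding prefixes above: a prefix namespaces a component's parameter names so they can be identified when parameters are saved and loaded. A hypothetical usage:

```python
DEFAULT_OUTPUT_LAYER_PREFIX = "target_output_"

# Parameters created under this prefix get unambiguous names in the
# checkpoint, e.g. "target_output_weight" and "target_output_bias".
weight_name = DEFAULT_OUTPUT_LAYER_PREFIX + "weight"
bias_name = DEFAULT_OUTPUT_LAYER_PREFIX + "bias"
```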
