Commit cc24739

Remove RNN parameter packing, FusedRNN support; refactored core model components (#189)

* Removed RNN parameter packing and FusedRNN support

* Refactor embedding and output layers (#196)

* Removed RNN parameter packing and FusedRNN support

* Refactoring of the sockeye model: source embed, target embed, and output layers are now separate components in the model

* Make training and inference work. Remove lexical biasing code.
fhieber authored Nov 21, 2017
1 parent 2446dd1 commit cc24739
Showing 17 changed files with 497 additions and 680 deletions.
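The description above sums up the refactor: source embedding, target embedding, and the output layer become first-class model components instead of being buried in the encoder and decoder. A minimal sketch of that composition in Python; the class and method names here are hypothetical, not Sockeye's actual API:

```python
# Illustrative sketch of the component split described in the commit message;
# all names are hypothetical, not Sockeye's actual classes.
class ComposedModelSketch:
    def __init__(self, source_embed, target_embed, encoder, decoder, output_layer):
        self.source_embed = source_embed  # embeds source token ids
        self.target_embed = target_embed  # embeds target token ids
        self.encoder = encoder            # RNN / CNN / transformer encoder
        self.decoder = decoder            # attends over encoder states
        self.output_layer = output_layer  # projects decoder states to target-vocab logits

    def forward(self, source_ids, target_ids):
        encoded = self.encoder(self.source_embed(source_ids))
        states = self.decoder(self.target_embed(target_ids), encoded)
        return self.output_layer(states)
```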
CHANGELOG.md: 23 changes (16 additions, 7 deletions)

```diff
@@ -1,19 +1,28 @@
 # Changelog
-All notable changes to this project will be documented in this file.
+All notable changes to the project are documented in this file.
 
-We use version numbers with three digits such as 1.0.0.
+Version numbers are of the form `1.0.0`.
 Any version bump in the last digit is backwards-compatible, in that a model trained with the previous version can still
 be used for translation with the new version.
-Any bump in the second digit indicates potential backwards incompatibilities, e.g. due to changing the architecture or
-simply modifying weight names.
+Any bump in the second digit indicates a backwards-incompatible change,
+e.g. due to changing the architecture or simply modifying model parameter names.
 Note that Sockeye has checks in place to not translate with an old model that was trained with an incompatible version.
 
-For each item we will potentially have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.
+Each version section may have subsections for: _Added_, _Changed_, _Removed_, _Deprecated_, and _Fixed_.
 
 
 ## [1.13.0]
 ### Fixed
 - Transformer models no longer silently ignore `--num-embed`.
   As a result, an error is thrown if `--num-embed` != `--transformer-model-size`.
 - Fixed the attention in upper layers (`--rnn-attention-in-upper-layers`), which was previously not passed correctly
-to the decoder.
+  to the decoder.
+### Removed
+- Removed RNN parameter (un-)packing and support for FusedRNNCells (removed the `--use-fused-rnn` flag).
+  These were not used, not correctly initialized, and performed worse than regular RNN cells. Moreover,
+  they made the code much more complex. RNN models trained with previous versions are no longer compatible.
+- Removed the lexical biasing functionality (Arthur et al., 2016) (removed arguments `--lexical-bias`
+  and `--learn-lexical-bias`).
 
 ## [1.12.2]
 ### Changed
@@ -120,7 +129,7 @@
 - Convolutional decoder.
 - Weight normalization (for CNN only so far).
 - Learned positional embeddings for the transformer.
 
 ### Changed
 - `--attention-*` CLI params renamed to `--rnn-attention-*`.
 - `--transformer-no-positional-encodings` generalized to `--transformer-positional-embedding-type`.
```
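The preamble above states the compatibility rule that Sockeye's model-version check enforces: a bump in the last digit is compatible, a bump in the second digit is not. A minimal sketch of such a check, assuming `major.minor.patch` version strings; this is not Sockeye's actual implementation:

```python
def is_compatible(model_version: str, code_version: str) -> bool:
    """A model is usable iff the first two version digits match the code's;
    the last (patch) digit may differ."""
    model_major, model_minor, _ = model_version.split(".")
    code_major, code_minor, _ = code_version.split(".")
    return (model_major, model_minor) == (code_major, code_minor)

assert is_compatible("1.13.2", "1.13.0")       # patch bump: compatible
assert not is_compatible("1.12.2", "1.13.0")   # second-digit bump: incompatible
```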
sockeye/arguments.py: 17 changes (2 additions, 15 deletions)

```diff
@@ -489,15 +489,6 @@ def add_model_parameters(params):
                               type=int, default=None,
                               help='Number of heads for Multi-head dot attention. Default: %(default)s.')
 
-    model_params.add_argument('--lexical-bias',
-                              default=None,
-                              type=str,
-                              help="Specify probabilistic lexicon (fast_align format) for lexical biasing (Arthur "
-                                   "ETAL'16). Set smoothing value epsilon by appending :<eps>")
-    model_params.add_argument('--learn-lexical-bias',
-                              action='store_true',
-                              help='Adjust lexicon probabilities during training. Default: %(default)s')
-
     model_params.add_argument('--weight-tying',
                               action='store_true',
                               help='Turn on weight tying (see arxiv.org/abs/1608.05859). '
@@ -690,7 +681,8 @@ def add_training_args(params):
                               default=C.EMBED_INIT_DEFAULT,
                               choices=C.EMBED_INIT_TYPES,
                               help='Type of embedding matrix weight initialization. If normal, initializes embedding '
-                                   'weights using a normal distribution with std=vocab_size. Default: %(default)s.')
+                                   'weights using a normal distribution with std=1/sqrt(vocab_size). '
+                                   'Default: %(default)s.')
     train_params.add_argument('--initial-learning-rate',
                               type=float,
                               default=0.0003,
@@ -750,11 +742,6 @@
                                    "reduced due to the value of --learning-rate-reduce-num-not-improved. "
                                    "Default: %(default)s.")
 
-    train_params.add_argument('--use-fused-rnn',
-                              default=False,
-                              action="store_true",
-                              help='Use FusedRNNCell in encoder (requires GPU device). Speeds up training.')
-
     train_params.add_argument('--rnn-forget-bias',
                               default=0.0,
                               type=float,
```
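The corrected help text documents that the `normal` embedding initialization uses std = 1/sqrt(vocab_size), rather than the std=vocab_size the old text claimed. A NumPy sketch of what that initialization means; illustrative only, not Sockeye's actual initializer:

```python
import numpy as np

def normal_embed_init(vocab_size: int, num_embed: int, seed: int = 0) -> np.ndarray:
    # Draw embedding weights from N(0, std^2) with std = 1 / sqrt(vocab_size).
    std = 1.0 / np.sqrt(vocab_size)
    rng = np.random.RandomState(seed)
    return rng.normal(loc=0.0, scale=std, size=(vocab_size, num_embed))

weights = normal_embed_init(vocab_size=50000, num_embed=512)
print(weights.std())  # close to 1/sqrt(50000), i.e. about 0.00447
```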
sockeye/constants.py: 1 change (1 addition, 0 deletions)

```diff
@@ -40,6 +40,7 @@
 TRANSFORMER_ENCODER_PREFIX = ENCODER_PREFIX + "transformer_"
 CNN_ENCODER_PREFIX = ENCODER_PREFIX + "cnn_"
 CHAR_SEQ_ENCODER_PREFIX = ENCODER_PREFIX + "char_"
+DEFAULT_OUTPUT_LAYER_PREFIX = "target_output_"
 
 # embedding prefixes
 SOURCE_EMBEDDING_PREFIX = "source_embed_"
```
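The new `DEFAULT_OUTPUT_LAYER_PREFIX` follows the same convention as the encoder and embedding prefixes above: a prefix namespaces a component's parameter names so they can be identified when parameters are saved and loaded. A hypothetical usage:

```python
DEFAULT_OUTPUT_LAYER_PREFIX = "target_output_"

# Parameters created under this prefix get unambiguous names in the
# checkpoint, e.g. "target_output_weight" and "target_output_bias".
weight_name = DEFAULT_OUTPUT_LAYER_PREFIX + "weight"
bias_name = DEFAULT_OUTPUT_LAYER_PREFIX + "bias"
```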
