# Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

nnRNN - NeurIPS 2019

expRNN code taken from here

EURNN tests based on code taken from here

## Summary of Current Results

### Copytask

*Figure: Copytask, T=200, with same number of hidden units.*

### Permuted Sequential MNIST

*Figure: Permuted Sequential MNIST, with same number of hidden units.*

### PTB

Changes from the paper:

- Tested the Adam optimizer with betas (0.0, 0.9) on expRNN and nnRNN (see the sketch below)
- Added gradient clipping
- Note the large improvements in nnRNN
- expRNN did not improve with the new optimizer, but did improve after searching over higher learning rates
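
A minimal sketch of these two changes in standard PyTorch, using a placeholder model (the repo's nnRNN module would take its place) and the PTB values from the tables below; whether the repo clips gradients by norm or by value is not specified here, so clipping by norm is assumed:

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=1, hidden_size=64)  # placeholder, not the repo's nnRNN

# Adam with betas (0.0, 0.9), as tested on expRNN and nnRNN
optimizer = torch.optim.Adam(model.parameters(), lr=0.002, betas=(0.0, 0.9))

x = torch.randn(150, 1, 1)                    # (seq_len, batch, input_size)
out, _ = model(x)
loss = out.pow(2).mean()                      # dummy loss for illustration
loss.backward()

# Gradient clipping, as added for the PTB runs (clip-by-norm assumed)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()
```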

#### Test Bits Per Character (BPC)

| Model | Fixed # params (~1.32M), T<sub>PTB</sub>=150 | Fixed # params (~1.32M), T<sub>PTB</sub>=300 | Fixed # hidden units (N=1024), T<sub>PTB</sub>=150 | Fixed # hidden units (N=1024), T<sub>PTB</sub>=300 |
|---|---|---|---|---|
| RNN | 2.89 ± 0.002 | 2.90 ± 0.002 | 2.89 ± 0.002 | 2.90 ± 0.002 |
| RNN-orth | 1.62 ± 0.004 | 1.66 ± 0.006 | 1.62 ± 0.004 | 1.66 ± 0.006 |
| EURNN | 1.61 ± 0.001 | 1.62 ± 0.001 | 1.69 ± 0.001 | 1.68 ± 0.001 |
| expRNN | 1.43 ± 0.002 | 1.44 ± 0.002 | 1.45 ± 0.002 | 1.48 ± 0.008 |
| nnRNN | 1.40 ± 0.003 | 1.42 ± 0.003 | 1.40 ± 0.003 | 1.42 ± 0.003 |
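
BPC is the mean per-character cross-entropy expressed in bits. As a small illustrative helper (not part of the repo), converting from the natural-log loss most frameworks report:

```python
import math

def bits_per_character(nat_loss: float) -> float:
    """Convert a mean per-character cross-entropy from nats to bits."""
    return nat_loss / math.log(2)

print(round(bits_per_character(0.9704), 2))  # ~1.40, the best BPC above
```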

#### Accuracy

| Model | Fixed # params (~1.32M), T<sub>PTB</sub>=150 | Fixed # params (~1.32M), T<sub>PTB</sub>=300 | Fixed # hidden units (N=1024), T<sub>PTB</sub>=150 | Fixed # hidden units (N=1024), T<sub>PTB</sub>=300 |
|---|---|---|---|---|
| RNN | 40.01 ± 0.026 | 39.97 ± 0.025 | 40.01 ± 0.026 | 39.97 ± 0.025 |
| RNN-orth | 66.29 ± 0.07 | 65.53 ± 0.09 | 66.29 ± 0.07 | 65.53 ± 0.09 |
| EURNN | 65.68 ± 0.002 | 65.55 ± 0.002 | 64.01 ± 0.002 | 64.20 ± 0.003 |
| expRNN | 69.02 ± 0.0005 | 68.98 ± 0.0003 | 68.69 ± 0.0004 | 68.57 ± 0.0004 |
| nnRNN | 69.89 ± 0.001 | 69.54 ± 0.001 | 69.89 ± 0.001 | 69.54 ± 0.001 |

## Hyperparameters for reported results

### Copytask

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init |
|---|---|---|---|---|---|---|---|
| RNN | 128 | RMSprop α=0.9 | 0.001 | | | | Glorot Normal |
| RNN-orth | 128 | RMSprop α=0.99 | 0.0002 | | | | Random Orth |
| EURNN | 128 | RMSprop α=0.5 | 0.001 | | | | |
| EURNN | 256 | RMSprop α=0.5 | 0.001 | | | | |
| expRNN | 128 | RMSprop α=0.99 | 0.001 | 0.0001 | | | Henaff |
| expRNN | 176 | RMSprop α=0.99 | 0.001 | 0.0001 | | | Henaff |
| nnRNN | 128 | RMSprop α=0.99 | 0.0005 | 10⁻⁶ | 0.0001 | 10⁻⁶ | Cayley |
### sMNIST

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init |
|---|---|---|---|---|---|---|---|
| RNN | 512 | RMSprop α=0.9 | 0.0001 | | | | Glorot Normal |
| RNN-orth | 512 | RMSprop α=0.99 | 5×10⁻⁵ | | | | Random Orth |
| EURNN | 512 | RMSprop α=0.9 | 0.0001 | | | | |
| EURNN | 1024 | RMSprop α=0.9 | 0.0001 | | | | |
| expRNN | 512 | RMSprop α=0.99 | 0.0005 | 5×10⁻⁵ | | | Cayley |
| expRNN | 722 | RMSprop α=0.99 | 5×10⁻⁵ | | | | Cayley |
| nnRNN | 512 | RMSprop α=0.99 | 0.0002 | 2×10⁻⁵ | 0.1 | 0.0001 | Cayley |
| LSTM | 512 | RMSprop α=0.99 | 0.0005 | | | | Glorot Normal |
| LSTM | 257 | RMSprop α=0.9 | 0.0005 | | | | Glorot Normal |
### PTB

**Length = 150**

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init | Grad Clipping Value |
|---|---|---|---|---|---|---|---|---|
| RNN | 1024 | RMSprop α=0.9 | 10⁻⁵ | | | | Glorot Normal | |
| RNN-orth | 1024 | RMSprop α=0.9 | 0.0001 | | | | Cayley | |
| EURNN | 1024 | RMSprop α=0.9 | 0.001 | | | | | |
| EURNN | 2048 | RMSprop α=0.9 | 0.001 | | | | | |
| expRNN | 1024 | RMSprop α=0.9 | 0.001 | | | | Cayley | |
| expRNN | 1386 | RMSprop α=0.9 | 0.008 | 0.0008 | | | Cayley | |
| nnRNN | 1024 | Adam β=(0.0, 0.9) | 0.002 | 0.0002 | 0.0001 | 10⁻⁵ | Cayley | 10 |

**Length = 300**

| Model | Hidden Size | Optimizer | LR | Orth. LR | δ | T decay | Recurrent init | Grad Clipping Value |
|---|---|---|---|---|---|---|---|---|
| RNN | 1024 | RMSprop α=0.9 | 10⁻⁵ | | | | Glorot Normal | |
| RNN-orth | 1024 | RMSprop α=0.9 | 0.0001 | | | | Cayley | |
| EURNN | 1024 | RMSprop α=0.9 | 0.001 | | | | | |
| EURNN | 2048 | RMSprop α=0.9 | 0.001 | | | | | |
| expRNN | 1024 | RMSprop α=0.9 | 0.001 | | | | Cayley | |
| expRNN | 1386 | RMSprop α=0.9 | 0.001 | | | | Cayley | |
| nnRNN | 1024 | Adam β=(0.0, 0.9) | 0.002 | 0.0002 | 0.0001 | 10⁻⁶ | Cayley | 5 |
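
Several rows above initialize the recurrent matrix with a Cayley transform. A minimal sketch of the idea, assuming a plain Gaussian skew-symmetric matrix (the repo's actual `cayley` init may sample or scale it differently):

```python
import torch

def cayley_orthogonal(n: int) -> torch.Tensor:
    """Orthogonal matrix via the Cayley transform W = (I - A)(I + A)^{-1},
    with A skew-symmetric. Illustrative only."""
    s = torch.randn(n, n)
    a = s - s.t()                  # skew-symmetric: a.T == -a
    eye = torch.eye(n)
    # Solve (I + A) W = (I - A); the two factors commute, so this equals
    # (I - A)(I + A)^{-1} and is more stable than an explicit inverse.
    return torch.linalg.solve(eye + a, eye - a)

w = cayley_orthogonal(128)
print(torch.allclose(w @ w.t(), torch.eye(128), atol=1e-4))  # True, up to fp error
```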

## Usage

### Copytask

```
python copytask.py [args]
```

Options:

- net-type : type of RNN to use in test
- nhid : number of hidden units
- cuda : use CUDA
- T : length of the delay between the input sequence and the recall
- labels : number of labels in output and input, maximum 8
- c-length : sequence length
- onehot : one-hot labels and inputs
- vari : variable length
- random-seed : random seed for experiment
- batch : batch size
- lr : learning rate for optimizer
- lr_orth : learning rate for orthogonal optimizer
- alpha : alpha value for optimizer (always RMSprop)
- rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
- iinit : input weight matrix initialization, options: [xavier, kaiming]
- nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
- alam : strength of the penalty (δ in the paper)
- Tdecay : weight decay on upper triangular matrix values
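
A hypothetical invocation matching the nnRNN row of the Copytask hyperparameter table above; the flag spellings are assumed to follow the option names listed here, so check `python copytask.py --help` for the exact argparse forms:

```
python copytask.py --net-type nnRNN --nhid 128 --T 200 --lr 0.0005 \
    --lr_orth 1e-6 --rinit cayley --alam 0.0001 --Tdecay 1e-6
```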

### Permuted sequential MNIST

```
python sMNIST.py [args]
```

Options:

- net-type : type of RNN to use in test
- nhid : number of hidden units
- epochs : number of epochs
- cuda : use CUDA
- permute : permute the order of the input
- random-seed : random seed for experiment (excluding the permutation order, which has an independent seed)
- batch : batch size
- lr : learning rate for optimizer
- lr_orth : learning rate for orthogonal optimizer
- alpha : alpha value for optimizer (always RMSprop)
- rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
- iinit : input weight matrix initialization, options: [xavier, kaiming]
- nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
- alam : strength of the penalty (δ in the paper)
- Tdecay : weight decay on upper triangular matrix values
- save_freq : frequency in epochs to save data and network
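
Similarly, a hypothetical invocation matching the nnRNN row of the sMNIST hyperparameter table (flag spellings assumed; verify with `python sMNIST.py --help`):

```
python sMNIST.py --net-type nnRNN --nhid 512 --permute --lr 0.0002 \
    --lr_orth 2e-5 --rinit cayley --alam 0.1 --Tdecay 0.0001
```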

### PTB

Adapted from here

```
python language_task.py [args]
```

Options:

- net-type : type of RNN to use in test
- emsize : size of word embeddings
- nhid : number of hidden units
- epochs : number of epochs
- bptt : sequence length for backpropagation
- cuda : use CUDA
- seed : random seed for experiment
- batch : batch size
- log-interval : reporting interval
- save : path to save final model and test info
- lr : learning rate for optimizer
- lr_orth : learning rate for orthogonal optimizer
- rinit : recurrent weight matrix initialization, options: [xavier, henaff, cayley, random orth.]
- iinit : input weight matrix initialization, options: [xavier, kaiming]
- nonlin : nonlinearity type, options: [None, tanh, relu, modrelu]
- alam : strength of the penalty (δ in the paper)
- Tdecay : weight decay on upper triangular matrix values
- optimizer : choice of optimizer, RMSprop or Adam
- alpha : alpha value for RMSprop
- betas : beta values for the Adam optimizer
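
A hypothetical invocation matching the nnRNN PTB row for length 150 (flag spellings and the two-value form of `--betas` are assumptions; verify with `python language_task.py --help`):

```
python language_task.py --net-type nnRNN --nhid 1024 --bptt 150 \
    --optimizer adam --betas 0.0 0.9 --lr 0.002 --lr_orth 0.0002 \
    --rinit cayley --alam 0.0001 --Tdecay 1e-5
```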