The compressed long short-term memory (CLSTM) Keras layer presented in this repository is based on the LSTMCell and LSTM layers from Keras version 2.2.4 [1], compressed with the method presented by Prabhavalkar et al. [2]. The CLSTMs can be used at inference, or for curriculum training, with the compressed weights of a network that was previously trained with LSTMs instead of CLSTMs. The file src/compress_weights.py contains all functions required to compress the weights of trained LSTM layers, which can then be passed to the CLSTM layers.
In our master's thesis [3] we explain the CLSTM in detail. Furthermore, we study the effect of replacing the LSTMs with CLSTMs on the model size, run time, and speech separation quality of the Online Deep Attractor Network [4] for monaural speech separation. Our experiments show that the proposed compression method for LSTMs is superior to hyper-parameter tuning when the goal is to reduce the run time by shrinking a neural network while trying to maintain the speech separation quality.
pip install numpy Keras==2.2.4 tensorflow==1.13.1
The folder src contains three source files:
- clstm.py contains the CLSTM layer class which uses the CLSTMCell class.
- compress_weights.py contains functions to compress the weights of LSTM layers, which can then be passed to the CLSTM layers. The compress_weights(weights_to_be_compressed, compression_threshold) function expects weights_to_be_compressed as a list with one entry per LSTM layer, each entry being a list containing [kernel, recurrent_kernel, bias]. The last element of the list should be the weights of the layer that follows the last LSTM layer, starting with the kernel weights. Furthermore, the compression_threshold (equivalent to the threshold tau in [2]) has to be given to the compress_weights() function.
- example.py provides an example model with two LSTM layers followed by one dense layer. The weights of this model are saved, compressed, and loaded into the compressed version of the example model in which the LSTM layers are replaced by CLSTM layers.
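The input layout described above for compress_weights() can be sketched with plain NumPy arrays. This is only an illustration of the expected list structure: the layer sizes, the threshold value, and the helper fake_lstm_weights are hypothetical, and the arrays stand in for what get_weights() returns for a Keras LSTM layer ([kernel, recurrent_kernel, bias]) and a dense layer ([kernel, bias]). The call itself is left commented out because the import path and the layout of the return value are not specified here.

```python
import numpy as np

# Hypothetical sizes: input features, units of LSTM 1 and 2, dense outputs.
n_in, units_1, units_2, n_out = 10, 16, 8, 4
rng = np.random.RandomState(0)


def fake_lstm_weights(input_dim, units):
    """Stand-in for lstm_layer.get_weights(): [kernel, recurrent_kernel, bias]."""
    return [
        rng.standard_normal((input_dim, 4 * units)),  # kernel
        rng.standard_normal((units, 4 * units)),      # recurrent_kernel
        np.zeros(4 * units),                          # bias
    ]


weights_to_be_compressed = [
    fake_lstm_weights(n_in, units_1),     # first LSTM layer
    fake_lstm_weights(units_1, units_2),  # second LSTM layer
    # Layer following the last LSTM (here a dense layer), kernel first.
    [rng.standard_normal((units_2, n_out)), np.zeros(n_out)],
]
compression_threshold = 0.9  # hypothetical value for tau

# Assumed import path when running from inside src/:
# from compress_weights import compress_weights
# compress_weights(weights_to_be_compressed, compression_threshold)
```

In a real model, each entry would come from the trained layers instead of random arrays; example.py shows the full workflow.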
The documentation provides some theoretical background on the compressed long short-term memory. For more details see [3] and [2].
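As a rough illustration of that background, the following NumPy sketch factorizes one recurrent kernel with a truncated SVD and fits the kernel of the following layer to the same projection by least squares. It is not the code in compress_weights.py. It assumes that the threshold tau is the fraction of explained variance (sum of squared singular values) to retain, and that the matrices use the Keras layout (rows index the hidden units). The function names rank_for_threshold and compress_recurrent_pair are hypothetical.

```python
import numpy as np


def rank_for_threshold(singular_values, tau):
    """Smallest rank whose squared singular values explain at least tau."""
    energy = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    rank = int(np.searchsorted(energy, tau)) + 1
    return min(rank, len(singular_values))


def compress_recurrent_pair(recurrent_kernel, next_kernel, tau):
    """Factorize both matrices through one shared projection.

    recurrent_kernel: (units, 4 * units), next_kernel: (units, n_out).
    Returns projection (units, r), z_recurrent (r, 4 * units) and
    z_next (r, n_out) so that recurrent_kernel ~ projection @ z_recurrent
    and next_kernel ~ projection @ z_next.
    """
    u, s, vt = np.linalg.svd(recurrent_kernel, full_matrices=False)
    r = rank_for_threshold(s, tau)
    projection = u[:, :r]                  # shared by both connections
    z_recurrent = s[:r, None] * vt[:r, :]  # singular values folded in
    # Least-squares fit of the next layer's kernel to the fixed projection.
    z_next, *_ = np.linalg.lstsq(projection, next_kernel, rcond=None)
    return projection, z_recurrent, z_next
```

With a rank r well below the number of units, the two factors hold fewer parameters than the original matrices, which is where the reduction in model size and run time comes from.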
[1] https://github.com/keras-team/keras/blob/2.2.4/keras/layers/recurrent.py
[2] R. Prabhavalkar, O. Alsharif, A. Bruguier, and L. McGraw, “On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016, pp. 5970–5974. https://ieeexplore.ieee.org/abstract/document/7472823/
[3] M. Siemering, “Real-time speech separation with deep attractor networks on an embedded system,” Nov. 2020.
[4] C. Han, Y. Luo, and N. Mesgarani, “Online Deep Attractor Network for Real-time Single-channel Speech Separation,” in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2019, pp. 361–365, doi: 10.1109/ICASSP.2019.8682884. https://ieeexplore.ieee.org/abstract/document/8682884