🌟 New model addition
As part of my Master's thesis I will implement or adapt different transformer architectures for language modeling that are specifically designed for long-context situations. As a first step I have started porting the Compressive Transformer to the Hugging Face interface, and will probably do the same for other architectures in the future. Let me know if you're interested in a pull request.
Model description
Paper
The Compressive Transformer is an extension of the Transformer-XL architecture with an additional compressed memory. Memories that would be discarded in the Transformer-XL are instead compressed and added to the compressed memory.
The compression function can take different forms, but the best performance on word-level language modeling is achieved with a Conv1d compression.
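For illustration, here is a minimal PyTorch sketch of what such a Conv1d compression could look like, assuming memories shaped (seq_len, batch, d_model) as in the Transformer-XL code; the class and argument names are my own, not part of the original implementation:

```python
import torch
import torch.nn as nn


class Conv1dCompression(nn.Module):
    """Compress a block of old memories by `compression_rate` along the
    sequence dimension with a strided 1D convolution (illustrative sketch)."""

    def __init__(self, d_model: int, compression_rate: int):
        super().__init__()
        self.conv = nn.Conv1d(
            in_channels=d_model,
            out_channels=d_model,
            kernel_size=compression_rate,
            stride=compression_rate,
        )

    def forward(self, mem: torch.Tensor) -> torch.Tensor:
        # mem: (seq_len, batch, d_model) -> Conv1d expects (batch, d_model, seq_len)
        x = mem.permute(1, 2, 0)
        x = self.conv(x)  # (batch, d_model, seq_len // compression_rate)
        return x.permute(2, 0, 1)  # back to (compressed_len, batch, d_model)
```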
The Compressive Transformer is trained in the same way as the Transformer-XL architecture. In addition, there is an "attention-reconstruction loss" that compares the attention computed over the original memories with the attention computed over their compressed counterparts. Using this MSE loss we can perform gradient updates on the compression function.
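A rough sketch of how that attention-reconstruction loss could be computed (single-head, content-based attention only, no relative positions; `attn` with `q_proj`/`k_proj`/`v_proj` linear layers is a hypothetical interface, not the actual Transformer-XL module):

```python
import torch
import torch.nn.functional as F


def attention_reconstruction_loss(hidden, old_mem, compressed_mem, attn):
    """MSE between attention over the original memories and attention over
    their compressed counterparts. Gradients only reach the compression
    network (via `compressed_mem`); the rest of the model is frozen here."""

    def content_attention(q, k, v):
        # q: (q_len, batch, d), k/v: (kv_len, batch, d)
        scores = torch.einsum("ibd,jbd->ijb", q, k) / (q.size(-1) ** 0.5)
        return torch.einsum("ijb,jbd->ibd", torch.softmax(scores, dim=1), v)

    with torch.no_grad():  # target attention: no gradients into the main model
        q = attn.q_proj(hidden)
        target = content_attention(q, attn.k_proj(old_mem), attn.v_proj(old_mem))

    # Re-apply the (detached) projection weights so gradients flow only into
    # `compressed_mem`, i.e. into the compression function that produced it.
    # (Projection biases are omitted for brevity.)
    k_c = F.linear(compressed_mem, attn.k_proj.weight.detach())
    v_c = F.linear(compressed_mem, attn.v_proj.weight.detach())
    reconstruction = content_attention(q, k_c, v_c)

    return F.mse_loss(reconstruction, target)
```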
Open source status
the model implementation is available:
https://nn.labml.ai/transformers/compressive/index.html is an open-source implementation under the MIT license.
https://github.com/deepmind/pg19 is the PG-19 dataset used in parts of the authors' experiments.
https://github.com/vilmarzti/long_context_transformers/blob/main/longcontext/transformers/compressive_transformer.py is my humble start at porting the architecture to the Hugging Face format.
the model weights are available:
None that I could find. Weights (for WikiText-2 and WikiText-103) might become available as my thesis progresses and I start training.
who are the authors:
Jack W. Rae (https://github.com/dm-jrae)
Anna Potapenko (https://github.com/AnyaP)
Siddhant M. Jayakumar (GitHub profile not found)
Chloe Hillier (GitHub profile not found)
Timothy P. Lillicrap (GitHub profile not found)