🌟 New model addition
As part of my Master's thesis I will implement or adapt different transformer architectures for language modeling that are specifically designed for long-context situations. As a first step I have started porting the Compressive Transformer to the Hugging Face interface, and will probably do the same for other architectures in the future. Let me know if you're interested in a pull request.
Model description
Paper
The Compressive Transformer is an extension of the Transformer-XL architecture with an additional compressed memory. Memories that would be discarded in the Transformer-XL are instead compressed and added to the compressed memory.
The compression function can take different forms, but the best performance on word-level language modeling is achieved with a Conv1d compression.
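For illustration, here is a minimal PyTorch sketch of what such a Conv1d compression could look like, assuming memories shaped (seq_len, batch, d_model) as in the Transformer-XL code; the class and argument names are my own, not part of the original implementation:

```python
import torch
import torch.nn as nn


class Conv1dCompression(nn.Module):
    """Compress a block of old memories by `compression_rate` along the
    sequence dimension with a strided 1D convolution (illustrative sketch)."""

    def __init__(self, d_model: int, compression_rate: int):
        super().__init__()
        self.conv = nn.Conv1d(
            in_channels=d_model,
            out_channels=d_model,
            kernel_size=compression_rate,
            stride=compression_rate,
        )

    def forward(self, mem: torch.Tensor) -> torch.Tensor:
        # mem: (seq_len, batch, d_model) -> Conv1d expects (batch, d_model, seq_len)
        x = mem.permute(1, 2, 0)
        x = self.conv(x)  # (batch, d_model, seq_len // compression_rate)
        return x.permute(2, 0, 1)  # back to (compressed_len, batch, d_model)
```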
The Compressive Transformer is trained in the same way as the Transformer-XL architecture. In addition, there is an "attention-reconstruction loss" that compares the attention computed over the original memories with the attention computed over their compressed counterparts. Using this MSE loss we can perform gradient updates on the compression function.
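A rough sketch of how that attention-reconstruction loss could be computed (single-head, content-based attention only, no relative positions; `attn` with `q_proj`/`k_proj`/`v_proj` linear layers is a hypothetical interface, not the actual Transformer-XL module):

```python
import torch
import torch.nn.functional as F


def attention_reconstruction_loss(hidden, old_mem, compressed_mem, attn):
    """MSE between attention over the original memories and attention over
    their compressed counterparts. Gradients only reach the compression
    network (via `compressed_mem`); the rest of the model is frozen here."""

    def content_attention(q, k, v):
        # q: (q_len, batch, d), k/v: (kv_len, batch, d)
        scores = torch.einsum("ibd,jbd->ijb", q, k) / (q.size(-1) ** 0.5)
        return torch.einsum("ijb,jbd->ibd", torch.softmax(scores, dim=1), v)

    with torch.no_grad():  # target attention: no gradients into the main model
        q = attn.q_proj(hidden)
        target = content_attention(q, attn.k_proj(old_mem), attn.v_proj(old_mem))

    # Re-apply the (detached) projection weights so gradients flow only into
    # `compressed_mem`, i.e. into the compression function that produced it.
    # (Projection biases are omitted for brevity.)
    k_c = F.linear(compressed_mem, attn.k_proj.weight.detach())
    v_c = F.linear(compressed_mem, attn.v_proj.weight.detach())
    reconstruction = content_attention(q, k_c, v_c)

    return F.mse_loss(reconstruction, target)
```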
Open source status
the model implementation is available:
https://nn.labml.ai/transformers/compressive/index.html is an open-source implementation under the MIT license.
https://github.com/deepmind/pg19 is the PG-19 dataset used in parts of the authors' experiments.
https://github.com/vilmarzti/long_context_transformers/blob/main/longcontext/transformers/compressive_transformer.py is my humble start at porting the architecture to the Hugging Face format.
the model weights are available:
None that I could find. Weights (for WikiText-2 and WikiText-103) might become available as my thesis progresses and I start training.
who are the authors:
Jack W. Rae (https://github.com/dm-jrae)
Anna Potapenko (https://github.com/AnyaP)
Siddhant M. Jayakumar (GitHub profile not found)
Chloe Hillier (GitHub profile not found)
Timothy P. Lillicrap (GitHub profile not found)