Porting Compressive Transformer to Huggingface #15533

Open · 3 tasks done
vilmarzti opened this issue Feb 5, 2022 · 0 comments

vilmarzti commented Feb 5, 2022

🌟 New model addition

As part of my Master's thesis I will implement or adapt different transformer architectures for language modelling that are specifically designed for long-context situations. To start, I have been porting the Compressive Transformer to the Hugging Face interface, and I will probably do the same for other architectures in the future. Let me know if you're interested in a pull request.

Model description

Paper

The Compressive Transformer is an extension of the Transformer-XL architecture with an additional compressed memory: memories that would be discarded in the Transformer-XL are instead compressed and appended to this compressed memory.
The compression function can take different forms, but the best performance on word-level language modelling comes from a Conv1d compression, as sketched below.
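
A minimal PyTorch sketch of what such a Conv1d compression could look like, assuming the `(mem_len, batch, d_model)` memory layout used by Transformer-XL implementations; the class and argument names are my own illustration, not an existing Hugging Face API:

```python
import torch
import torch.nn as nn


class Conv1dCompression(nn.Module):
    """Hypothetical compression function: summarises every
    `compression_rate` discarded memory slots into one compressed slot."""

    def __init__(self, d_model: int, compression_rate: int = 3):
        super().__init__()
        # Stride equals kernel size, so the compression windows do not
        # overlap and the memory shrinks by a factor of `compression_rate`.
        self.conv = nn.Conv1d(
            d_model,
            d_model,
            kernel_size=compression_rate,
            stride=compression_rate,
        )

    def forward(self, discarded_mem: torch.Tensor) -> torch.Tensor:
        # discarded_mem: (mem_len, batch, d_model)
        x = discarded_mem.permute(1, 2, 0)  # -> (batch, d_model, mem_len)
        x = self.conv(x)                    # -> (batch, d_model, mem_len // rate)
        return x.permute(2, 0, 1)           # -> (mem_len // rate, batch, d_model)
```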
The Compressive Transformer is trained in the same way as the Transformer-XL. In addition, there is an "attention-reconstruction loss" that compares the attention computed over the raw memories with the attention computed over their compressed counterpart. Using this MSE loss, we can perform gradient updates on the compression function (see the second sketch below).
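
A rough sketch of that reconstruction loss, assuming `attn_fn(query, key_value)` stands in for a single attention layer (in the paper its parameters are held fixed for this loss, so only the compression function is trained by it):

```python
import torch
import torch.nn.functional as F


def attention_reconstruction_loss(attn_fn, hidden, old_mem, compress_fn):
    """Hypothetical helper comparing attention over raw memories with
    attention over their compressed counterpart via an MSE loss."""
    # Target: attention over the uncompressed, about-to-be-discarded
    # memories. No gradients flow into it (stop-gradient, as in the paper).
    with torch.no_grad():
        target = attn_fn(hidden, old_mem)

    # Reconstruction: the same attention over the compressed memories.
    # Inputs are detached so this loss only trains `compress_fn`; a full
    # implementation would also freeze attn_fn's own parameters here.
    reconstruction = attn_fn(hidden.detach(), compress_fn(old_mem.detach()))

    return F.mse_loss(reconstruction, target)
```

Backpropagating `attention_reconstruction_loss(attn, h, old_mem, compressor)` would then train the compressor, with the attention parameters frozen as noted in the comments.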

Open source status
