This repo implements Fastformer: Additive Attention Can Be All You Need (Wu et al.) in PyTorch, based on the implementation by Rishit Dagli. Fastformer is a Transformer variant built on additive attention that handles long sequences efficiently with linear complexity. It is much more efficient than many existing Transformer models while achieving comparable or even better long-text modeling performance.
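For intuition, the additive attention at the core of Fastformer scores each position with a single learned vector and pools the sequence into one global vector, so the cost grows linearly rather than quadratically with sequence length. Below is a minimal sketch of that pooling step; the function and variable names are illustrative, not this library's API.

import torch
import torch.nn.functional as F

def additive_pool(x, w):
    # x: (batch, seq_len, dim) token representations
    # w: (dim,) learned scoring vector (hypothetical parameter, for illustration only)
    # Cost is O(seq_len * dim), i.e. linear in sequence length.
    scores = x @ w / x.shape[-1] ** 0.5          # (batch, seq_len): one scalar per position
    alpha = F.softmax(scores, dim=-1)            # attention weights over the sequence
    return torch.einsum('bn,bnd->bd', alpha, x)  # weighted sum -> one (batch, dim) global vector

# Example: pool a batch of 2 sequences of length 8 with dim 16
pooled = additive_pool(torch.randn(2, 8, 16), torch.randn(16))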
Run the following to install:
pip install fast-transformer-torch
To install fast-transformer-torch along with the tools you need to develop and test, run the following in your virtualenv:
git clone https://github.com/talipturkmen/Fast-Transformer-Pytorch.git
# or clone your own fork
cd Fast-Transformer-Pytorch
pip install -e .[dev]
import torch
from fast_transformer_torch import FastTransformer

mask = torch.ones([16, 4096], dtype=torch.bool)  # padding mask: True marks positions to attend to

model = FastTransformer(
    num_tokens = 20000,
    dim = 512,
    depth = 2,
    max_seq_len = 4096,
    absolute_pos_emb = True,  # use absolute positional embeddings
    mask = mask
)

x = torch.randint(0, 20000, (16, 4096))
logits = model(x)  # (16, 4096, 20000)
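From here the logits can be plugged into a standard language-modeling objective. A minimal sketch, continuing from the snippet above; the target tensor and loss step are illustrative, not part of the library.

import torch.nn.functional as F

# Hypothetical next-token targets with the same shape as the input ids.
targets = torch.randint(0, 20000, (16, 4096))

# Flatten (batch, seq_len, vocab) -> (batch * seq_len, vocab) for cross-entropy.
loss = F.cross_entropy(logits.reshape(-1, 20000), targets.reshape(-1))
loss.backward()  # gradients flow back through the FastTransformer parameters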