Feature request: support return_mems in ContinuousTransformerWrapper #166
Comments
@pfeatherstone oh sure! threw it in there quickly before starting my main work. how are you using it? 🧐
it is actually interesting how many people have told me they are using the continuous wrapper, although there's so little research on that. does it work well?
We use a continuous transformer in our new paper for music generation: https://arxiv.org/pdf/2307.04686.pdf and find that it works well! We use the continuous VQ-VAE latents as the input embeddings for the transformer. Awesome work w/ this repo btw @lucidrains!
@hugofloresgarcia congrats on the paper!
Oh, it's just that my inputs are already in normalized floating-point format, not tokenized. I think Wav2Vec2 is basically like that, no?
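A minimal sketch of that kind of usage, following the continuous-embeddings example in the x-transformers README (dimensions here are purely illustrative):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Encoder

model = ContinuousTransformerWrapper(
    dim_in = 32,               # inputs are float vectors (e.g. VQ-VAE latents or normalized audio features), not token ids
    dim_out = 100,
    max_seq_len = 1024,
    attn_layers = Encoder(
        dim = 512,
        depth = 12,
        heads = 8
    )
)

x = torch.randn(1, 1024, 32)   # (batch, seq, dim_in) continuous embeddings
mask = torch.ones(1, 1024).bool()

out = model(x, mask = mask)    # (1, 1024, 100)
```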
@lucidrains How do you train a non-autoregressive continuous transformer with `mems`?
@pfeatherstone ohh, this repository is not well suited for custom recurrence. are you trying to do something like RMT, but non-autoregressive? does your idea resemble memformer?
Yes, it's similar. To be honest I thought this repo would have done the job. Maybe I need to read up on this more to properly determine which architecture suits me best. To me the mechanism provided in this repo (the `mems` mechanism) seems close to what I need. Basically I want to output mems from running segment (t) and feed them along with segment (t+1) to the next iteration, exactly like how `mems` already works in `TransformerWrapper`.
Basically I want a kind of stream-aware transformer with causal attention, non-autoregressive, trained with CTC loss, with an effective response that is infinite, a bit like how infinite impulse response (IIR) filters work. In transformer world, if you constantly feed `mems` from one segment into the next, the effective receptive field becomes unbounded.
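For reference, the token-level `TransformerWrapper` already exposes this segment-to-segment memory pattern via `max_mem_len` / `mems` / `return_mems`; a minimal streaming sketch along those lines (hyperparameters illustrative), which is essentially the behaviour being requested for the continuous wrapper:

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 20000,
    max_seq_len = 512,
    max_mem_len = 512,             # how many past hidden states to carry forward
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
)

# pretend this is an incoming stream of token segments
stream = [torch.randint(0, 20000, (1, 512)) for _ in range(4)]

mems = None
for segment in stream:
    logits, mems = model(segment, mems = mems, return_mems = True)
    # `mems` carries state across segments, so each step can attend beyond its
    # own window -- loosely analogous to the internal state of an IIR filter
```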
@pfeatherstone yea, i'm a big fan of the RMT architecture too
let me think, yea, i think x-transformers is close, since it has the ability to prepend embeddings (like PaLI). i can take a look at this later this week
So I can see three candidates:
I'm new to recurrent transformers. I'm pretty familiar with "normal" transformers (GPT-like, for example), where you basically feed in your entire input (text, image, or whatever). But though recurrence seems easy to design in the forward pass, I can't quite see how you train effectively (backward pass). Do you need to randomly partition your input into segments during training and pretend you are "streaming", or is there a more elegant, less faffy, way of training?
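One common recipe is truncated backpropagation through time: chop each training sequence into segments, carry the memories forward, and detach them at segment boundaries so each backward pass only spans one segment. A rough sketch, assuming a hypothetical `model(x, mems=..., return_mems=True)` interface that returns a list of per-layer memories; random segment lengths would be a small variation on the same loop:

```python
import torch
import torch.nn.functional as F

def train_step_on_long_sequence(model, optimizer, seq, targets, segment_len):
    # seq: (batch, total_len) inputs, targets: (batch, total_len) labels
    mems = None
    for start in range(0, seq.shape[1], segment_len):
        seg = seq[:, start:start + segment_len]
        tgt = targets[:, start:start + segment_len]

        out, mems = model(seg, mems = mems, return_mems = True)

        # per-segment loss (plain cross entropy here for simplicity; a streaming
        # CTC setup would need its own handling of label alignment per segment)
        loss = F.cross_entropy(out.transpose(1, 2), tgt)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # cut the graph at the segment boundary (truncated BPTT); the memories
        # still carry information forward, they just stop carrying gradients
        mems = [m.detach() for m in mems]
```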
@pfeatherstone i think your best bet is to modify the RMT architecture. i also included the memory-replay-backprop technique from memformer, so the network can learn to formulate its memories better with little cost to hardware memory
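Roughly, memory-replay-backprop avoids unrolling the whole graph: a no-grad pass records the memory fed into each segment, then segments are replayed one at a time with gradients, passing the memory gradient backwards between replays. A rough sketch of the idea only (not the actual API of any repo mentioned here; the `model(x, mems=..., return_mems=True)` interface and the per-layer memory list are assumptions):

```python
import torch

def memory_replay_backprop(model, segments, targets, loss_fn):
    # pass 1: record the memories fed into each segment, without building a graph
    mems, mem_inputs = None, []
    with torch.no_grad():
        for seg in segments:
            mem_inputs.append(mems)
            _, mems = model(seg, mems = mems, return_mems = True)

    # pass 2: replay segments in reverse with gradients enabled, one at a time,
    # so only a single segment's graph is ever held in memory
    next_mem_grads = None
    for seg, tgt, mem_in in zip(reversed(segments), reversed(targets), reversed(mem_inputs)):
        if mem_in is not None:
            mem_in = [m.detach().requires_grad_() for m in mem_in]

        out, mem_out = model(seg, mems = mem_in, return_mems = True)
        loss = loss_fn(out, tgt)

        if next_mem_grads is not None:
            # inject the gradient that the following segment produced on its memory input
            torch.autograd.backward([loss, *mem_out], [None, *next_mem_grads])
        else:
            loss.backward()

        next_mem_grads = None if mem_in is None else [
            m.grad if m.grad is not None else torch.zeros_like(m) for m in mem_in
        ]
    # parameter gradients have accumulated over all segments; the caller steps the optimizer once
```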
@lucidrains There is also this paper https://arxiv.org/pdf/2109.00301.pdf, which proposes the infinity-former.
We just need IIR filters in neural networks...
@lucidrains have you looked at the RWKV architecture? Looks like it's solving something similar. Surely all these RNN+Transformer architectures are going to converge.
It would be great if `ContinuousTransformerWrapper` supported `return_mems` in the forward pass. Thank you for the awesome repo!

Remarkably, it all works with `torch.onnx.export()`!
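For concreteness, a sketch of what the requested usage could look like, assuming `ContinuousTransformerWrapper` gained the same `max_mem_len` / `mems` / `return_mems` options as `TransformerWrapper` (this is the feature being asked for, not the API as it stood when the issue was opened):

```python
import torch
from x_transformers import ContinuousTransformerWrapper, Decoder

model = ContinuousTransformerWrapper(
    dim_in = 32,
    dim_out = 32,
    max_seq_len = 256,
    max_mem_len = 256,            # assumed kwarg, mirroring TransformerWrapper
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
)

x1 = torch.randn(1, 256, 32)
out1, mems1 = model(x1, return_mems = True)                 # requested behaviour

x2 = torch.randn(1, 256, 32)
out2, mems2 = model(x2, mems = mems1, return_mems = True)   # feed memories forward
```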