Vision Titans #7
@ShunkaiZhou Hey Shunkai, thanks for your interest! Could you elaborate a bit more? Do you mean in the context of tokenized video, or something different?
@lucidrains Hi, thank you for your quick reply.
VLM-Titans?
I'd like to share that, by replacing the logits with a continuous pixel prediction, this repo trains a class-conditional MNIST digit image generator in under 2 minutes on an RTX 4090, with hyperparameters unchanged from the defaults.
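No code was posted for this, so the following is only a minimal sketch of what "replacing the logits with a continuous pixel prediction" could look like. It assumes pixel intensities are normalized to [0, 1] and that each hidden state is projected to a single scalar trained with mean squared error, instead of a softmax over discrete intensity levels trained with cross-entropy. `continuous_pixel_head` and `mse_loss` are hypothetical names, not part of the repo.

```python
import math

def continuous_pixel_head(hidden, weight, bias):
    """Hypothetical regression head: project one hidden vector to one pixel value.

    Instead of one logit per discrete intensity level, emit a single scalar
    squashed into [0, 1] by a sigmoid (assumes pixels normalized to [0, 1]).
    """
    z = sum(h * w for h, w in zip(hidden, weight)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(preds, targets):
    """Mean squared error between predicted and target pixel intensities."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

# Toy example: three hidden states (one per pixel position), each of width 2.
hiddens = [[0.0, 0.0], [2.0, 1.0], [-1.0, -3.0]]
weight, bias = [1.0, 0.5], 0.0
preds = [continuous_pixel_head(h, weight, bias) for h in hiddens]
targets = [0.5, 1.0, 0.0]
loss = mse_loss(preds, targets)
print([round(p, 3) for p in preds], round(loss, 5))
```

The appeal of the regression head is that nearby intensities are treated as close to each other, whereas a softmax over 256 classes treats every wrong class as equally wrong.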
@RefractAI Nice! It could just be the sliding window attention, though; local attention with even the slightest overlap is incredibly strong.
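To make the "overlap" point concrete, here is a small sketch contrasting two causal attention masks. It assumes plain boolean masks and is not the repo's actual attention code; the function names are hypothetical. With non-overlapping segments, the first token of each segment sees nothing before it. With a sliding window, every query also sees the tail of the previous segment.

```python
def block_causal_mask(seq_len, block):
    """Non-overlapping segments: query i sees key j only if j <= i and both
    fall in the same block of size `block`."""
    return [[j <= i and i // block == j // block for j in range(seq_len)]
            for i in range(seq_len)]

def sliding_window_causal_mask(seq_len, window):
    """Causal sliding window: query i sees itself and the previous
    `window - 1` keys, so adjacent windows overlap."""
    if window < 1:
        raise ValueError("window must be >= 1")
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]

def show(mask):
    # 'x' = attention allowed, '.' = masked out
    for row in mask:
        print("".join("x" if allowed else "." for allowed in row))

block = block_causal_mask(seq_len=6, block=3)
sliding = sliding_window_causal_mask(seq_len=6, window=3)
show(block)
print()
show(sliding)
```

Row 3 shows the difference: under the block mask, position 3 sees only itself, while under the sliding window it still sees positions 1 and 2.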
@RefractAI |
@RefractAI Did you adapt it from the transfusion repo? I can port one over here if needed.
Can you add support for image input, just like ViT or Swin-Transformer?
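One common way to feed images to a sequence model, as in ViT, is to cut the image into fixed-size non-overlapping patches and treat each flattened patch as a token. A learned linear projection would then map each patch to the model dimension. Below is a minimal sketch of only the patchify step. It assumes a single-channel image stored as nested lists. `patchify` is a hypothetical helper, not something the repo provides.

```python
def patchify(image, patch):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    ViT-style tokenization: an H x W image becomes (H / patch) * (W / patch)
    tokens, each of length patch * patch, ordered row-major over the grid.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one patch row by row into a single token vector.
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

# A 28 x 28 (MNIST-sized) image whose pixel value encodes its position.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
tokens = patchify(image, patch=7)
print(len(tokens), len(tokens[0]))
```

For a 28 x 28 image and 7 x 7 patches this gives 16 tokens of length 49, which is a much shorter sequence than the 784 tokens of pixel-by-pixel modelling.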
Thank you very much for the excellent code!
May I ask if you will build Vision Titans?
Best wishes!!!