Vision Titans #7

Open
ShunkaiZhou opened this issue Jan 20, 2025 · 8 comments

Comments

@ShunkaiZhou

ShunkaiZhou commented Jan 20, 2025

Thank you very much for the excellent code!
May I ask if you will build Vision Titans?
Best wishes!!!

@lucidrains
Owner

@ShunkaiZhou hey Shunkai, thanks for your interest

could you elaborate a bit more? do you mean in the context of tokenized video or something different?

@ShunkaiZhou
Author

@lucidrains Hi, thank you for your quick reply.
For example, online video analytics?
Best wishes.

@kanghuazhao

VLM-Titans?

@RefractAI

RefractAI commented Jan 29, 2025

@lucidrains

I would like to share that, after replacing the logits with a continuous pixel prediction, this repo trains a class-conditional MNIST digit image generator in under 2 minutes on a 4090, with hyperparameters unchanged from the defaults.

Image
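
To make the change described above concrete, here is a minimal sketch of the difference between a discrete logits head and a continuous pixel head. It is not RefractAI's code (which has not been shared in this thread) and not this repo's API: the names `linear`, `cross_entropy`, `mse`, `init`, the 8-dimensional hidden vector, and the 256 intensity bins are all assumptions made for illustration, written in plain Python so it runs without any dependencies.

```python
import math
import random


def linear(x, weight, bias):
    """Matrix-vector product plus bias: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def cross_entropy(logits, target_index):
    """Negative log-likelihood of the target bin under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[target_index]


def mse(pred, target):
    """Squared error between a predicted and a target scalar."""
    return (pred - target) ** 2


def init(rows, cols, rng):
    """Small random weights (hypothetical initialization)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, num_bins = 8, 256                       # assumed sizes, for illustration only
hidden = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a model output
pixel = 200                                  # ground-truth intensity in 0..255

# Discrete head: one logit per intensity bin, trained with cross entropy.
logit_w, logit_b = init(num_bins, dim, rng), [0.0] * num_bins
logits = linear(hidden, logit_w, logit_b)
discrete_loss = cross_entropy(logits, pixel)

# Continuous head: a single scalar per pixel, trained with a regression loss.
pixel_w, pixel_b = init(1, dim, rng), [0.0]
pred = linear(hidden, pixel_w, pixel_b)[0]
continuous_loss = mse(pred, pixel / 255.0)   # target rescaled to [0, 1]

print(len(logits), discrete_loss, continuous_loss)
```

The only structural change is the output width (256 values versus 1) and the loss. Whether the actual experiment used MSE or some other regression objective is not stated in the comment.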

@lucidrains
Owner

lucidrains commented Jan 29, 2025

@RefractAI nice! it could just be sliding window attention though

local attention with even the slightest overlap is incredibly strong
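
One way to see why overlap matters is to track which positions can influence a token once several local-attention layers are stacked. The sketch below is an illustration of that general point, not this repo's attention implementation: `sliding_window_mask`, `block_mask`, and `reach_after_layers` are hypothetical helpers, and the sequence length, window, and block sizes are arbitrary.

```python
def sliding_window_mask(seq_len, window):
    """Causal mask where query i attends to keys i-window+1 .. i (overlapping windows)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def block_mask(seq_len, block):
    """Causal mask where query i attends only within its own non-overlapping block."""
    return [[j <= i and i // block == j // block for j in range(seq_len)]
            for i in range(seq_len)]


def reach_after_layers(mask, depth):
    """Boolean reachability: reach[i][j] is True if position j can influence i
    after `depth` stacked attention layers that all use `mask`."""
    n = len(mask)
    reach = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(depth):
        reach = [[any(mask[i][k] and reach[k][j] for k in range(n)) for j in range(n)]
                 for i in range(n)]
    return reach


# Overlapping windows of 3 tokens: the last token's receptive field grows by 2 per layer.
sliding_reach = reach_after_layers(sliding_window_mask(8, 3), depth=3)
# Non-overlapping blocks of 4 tokens: information never crosses a block boundary.
block_reach = reach_after_layers(block_mask(8, 4), depth=3)

print(sum(sliding_reach[7]))  # 7 positions visible to the last token
print(sum(block_reach[7]))    # 4 positions, no matter how many layers are stacked
```

With overlap, depth compounds the window into a long effective context. Without it, stacking layers adds nothing beyond the block.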

@ShunkaiZhou
Author

@RefractAI
Hi, your changes are very interesting. Could you share the changed code please? Thanks a lot.
Best wishes!

@lucidrains
Owner

@RefractAI did you adapt it from the transfusion repo? i can port one over here if needed

@piwawa

piwawa commented Feb 7, 2025

Can you add support for image input? Just like ViT or Swin-Transformer.
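
For reference, ViT-style image input usually means cutting the image into fixed-size patches and flattening each patch into one token. The sketch below shows only that tokenization step in plain Python. `patchify` is a hypothetical helper, not something this repo provides, and a real model would also apply a learned projection and positional information to each patch.

```python
def patchify(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping square patches,
    ordered left to right, then top to bottom."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [image[top + dy][left + dx]
                     for dy in range(patch_size)
                     for dx in range(patch_size)]
            patches.append(patch)
    return patches


# A 4x4 toy image whose pixel values equal their row-major index.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, 2)

print(len(tokens))  # 4 patches
print(tokens[0])    # [0, 1, 4, 5]
```

The resulting sequence of patch tokens could then be fed to any sequence model in place of text tokens.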
