Experiment for a flat architecture? #50
Hi @chokevin8! Yes, we have an internal version with a flat (or isotropic) architecture, but there are no plans to release it publicly. Its performance is comparable to (or better than) that of ViTs trained with even the most advanced techniques, like DeiT III.
@ahatamiz Thanks for your quick response! I see; it's impressive that the performance is equal to or better than SOTA ViTs. I understand you can't share much about this, but are there any hints you can give on how to implement it? For example, would the micro-architecture need to change as well? (The macro-architecture obviously has to change to make the backbone flat.) Lastly, have you ever tried self-supervised learning with this flat architecture? Thank you!
Hi @chokevin8, the implementation is quite easy. You can take the blocks used in stages 3/4 and build an isotropic model out of them without changing the resolution (simply replacing a ViT layout with this setup should work for constructing different model types). Note, however, that you need to keep our strategy for splitting the layers: the first half are MambaMixer blocks and the second half are self-attention blocks. It should work for any type of training, including SSL.
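The layer layout described in this reply can be sketched as follows. This is a minimal, framework-free sketch under stated assumptions, not the authors' internal implementation: `plan_isotropic_hybrid` and `IsotropicPlan` are hypothetical names, the 224-pixel input with 16-pixel patches and the ViT-S/B/L widths and depths are borrowed from standard ViT configurations for illustration, and giving the extra layer to the MambaMixer half when the depth is odd is an assumption. It only plans which block type sits at each layer and confirms that the token count stays constant, which is what "without changing the resolution" implies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IsotropicPlan:
    """Hypothetical description of a flat (isotropic) hybrid backbone."""
    num_tokens: int          # tokens per image; identical at every layer
    embed_dim: int           # channel width; identical at every layer
    block_types: List[str]   # one entry per layer, in forward order


def plan_isotropic_hybrid(image_size: int, patch_size: int,
                          embed_dim: int, depth: int) -> IsotropicPlan:
    """Lay out a ViT-style isotropic model: MambaMixer blocks in the first
    half of the layers, self-attention blocks in the second half.

    Assumption (not from the source): with an odd depth, the extra layer
    goes to the MambaMixer half.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    if depth < 2:
        raise ValueError("depth must allow at least one block of each type")
    grid = image_size // patch_size          # patches per side, never downsampled
    num_mixer = (depth + 1) // 2             # ceil(depth / 2) MambaMixer blocks
    block_types = (["mamba_mixer"] * num_mixer
                   + ["self_attention"] * (depth - num_mixer))
    return IsotropicPlan(num_tokens=grid * grid, embed_dim=embed_dim,
                         block_types=block_types)


if __name__ == "__main__":
    # Standard ViT-S/B/L widths and depths, reused here purely for illustration.
    for name, dim, depth in [("S", 384, 12), ("B", 768, 12), ("L", 1024, 24)]:
        plan = plan_isotropic_hybrid(224, 16, dim, depth)
        print(name, plan.num_tokens,
              plan.block_types.count("mamba_mixer"),
              plan.block_types.count("self_attention"))
```

Running it prints `S 196 6 6`, `B 196 6 6`, and `L 196 12 12`: every layer sees the same 14 × 14 = 196 tokens, and the depth is split evenly between the two block types. In an actual model, each entry would be instantiated as a stage 3/4 MambaMixer or self-attention block at the fixed `embed_dim`.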
Hi @johnnynunez @ahatamiz, thank you for your brilliant work! I have a question: have you considered using a flat architecture rather than a hierarchical one, and would it be simple enough to implement by just modifying the code? I am particularly interested in using this for a self-supervised training application. Any input would be appreciated, thank you so much!