
Experiment for a flat architecture? #50

Open
chokevin8 opened this issue Dec 17, 2024 · 3 comments


@chokevin8

Hi @johnnynunez @ahatamiz, thank you for your brilliant work! I just have a question: have you considered using a flat architecture rather than a hierarchical one, and would it be simple to implement by just modifying the code? I am particularly interested in using this for a self-supervised training application. Any input would be appreciated, thank you so much!

@ahatamiz
Collaborator

Hi @chokevin8! Yes, we have an internal version with a flat (or isotropic) architecture, but there are no plans to release it publicly.

The performance is comparable to (or better than) ViTs trained with even the most advanced techniques, such as DeiT III.

@chokevin8
Author

chokevin8 commented Dec 19, 2024

@ahatamiz Thanks for your quick response! That's impressive that the performance is equal to or better than SOTA ViTs. I'm aware you can't share much, but are there any hints you can drop on how to implement this? In other words, would the micro architecture have to change as well? (Obviously the macro architecture would have to change to make the backbone flat.) Lastly, have you ever tried self-supervised learning with this flat architecture? Thank you!

@ahatamiz
Collaborator

Hi @chokevin8, the implementation is quite easy. You can take the blocks used in stages 3/4 and build an isotropic model out of them without changing the resolution (simply replacing a ViT layout with this setup should work for constructing different model sizes).

But note that you need to maintain our strategy of dividing the layers: allocate the first half of the depth to MambaMixer blocks and the second half to self-attention blocks.

It should work for any type of training, including SSL.
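For concreteness, here is a minimal, self-contained PyTorch sketch of what such a flat layout could look like, based on the description above. The names `FlatModel`, `MambaMixerBlock`, and `AttentionBlock` are hypothetical, and `MambaMixerBlock` is only a placeholder token mixer; in a real implementation you would plug in the actual Mamba-based mixer block from stages 3/4 of the hierarchical model. The sketch only illustrates the layout: a single patch embedding, constant token resolution throughout, and the depth split into Mamba-mixer blocks (first half) and self-attention blocks (second half).

```python
# Sketch of a flat (isotropic) backbone. Assumed, not the official implementation.
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Standard transformer MLP."""
    def __init__(self, dim, ratio=4.0):
        super().__init__()
        hidden = int(dim * ratio)
        self.fc1 = nn.Linear(dim, hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))


class MambaMixerBlock(nn.Module):
    """Placeholder for a Mamba-based token mixer block (hypothetical stand-in).
    Swap self.mixer for the actual mixer used in stages 3/4 of the repo."""
    def __init__(self, dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mixer = nn.Linear(dim, dim)  # placeholder token mixer
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = MLP(dim)

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


class AttentionBlock(nn.Module):
    """Standard pre-norm self-attention block."""
    def __init__(self, dim, num_heads=6):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = MLP(dim)

    def forward(self, x):
        y = self.norm1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x


class FlatModel(nn.Module):
    """ViT-style isotropic backbone: one patch embedding, fixed resolution,
    first half of the depth Mamba-mixer blocks, second half self-attention."""
    def __init__(self, img_size=224, patch_size=16, dim=384, depth=12,
                 num_heads=6, num_classes=1000):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        blocks = []
        for i in range(depth):
            if i < depth // 2:               # first half: Mamba mixer blocks
                blocks.append(MambaMixerBlock(dim))
            else:                            # second half: self-attention blocks
                blocks.append(AttentionBlock(dim, num_heads))
        self.blocks = nn.Sequential(*blocks)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        x = self.patch_embed(x)              # B, C, H/ps, W/ps
        x = x.flatten(2).transpose(1, 2)     # B, N, C tokens; resolution stays fixed
        x = self.blocks(x)
        x = self.norm(x).mean(dim=1)         # global average pooling over tokens
        return self.head(x)


if __name__ == "__main__":
    model = FlatModel()
    out = model(torch.randn(2, 3, 224, 224))
    print(out.shape)  # torch.Size([2, 1000])
```

Different model sizes would then just vary `dim`, `depth`, and `num_heads`, as with standard ViT configurations.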
