Results from a sweep on a small model definitely show some effect of radius, though the best radius is indeed around sqrt(embedding dim).
May need to test on a larger network to see more definitive differences from slight effects like removing the gain. For this network, however, hyperspherenorm seems to be just as good as rmsnorm at the conventional radius.
Test out a norm variation that simply maps activations to the surface of a hypersphere.
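For reference, here is a minimal pure-Python sketch of the idea, not the code from this repo. The function names `hypersphere_norm` and `rms_norm`, the `radius` and `gain` parameters, and the placement of `eps` are all assumptions for illustration. It rescales a vector so its L2 norm equals a chosen radius, and shows why a radius of sqrt(embedding dim) lines up with RMSNorm without a gain: x * sqrt(d) / ||x|| is the same as x / rms(x).

```python
import math


def hypersphere_norm(x, radius=None, eps=1e-8):
    """Rescale x so its L2 norm is (approximately) `radius`.

    Assumption: the default radius is sqrt(len(x)), i.e. sqrt(embedding dim).
    """
    if radius is None:
        radius = math.sqrt(len(x))
    l2 = math.sqrt(sum(v * v for v in x))
    scale = radius / (l2 + eps)  # eps guards against division by zero
    return [v * scale for v in x]


def rms_norm(x, gain=None, eps=1e-8):
    """Divide x by its root-mean-square, with an optional per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    if gain is None:
        gain = [1.0] * len(x)  # no learned gain: plain rescaling
    return [g * v / (rms + eps) for g, v in zip(gain, x)]


def l2_norm(x):
    """Helper: Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in x))


if __name__ == "__main__":
    x = [3.0, 4.0, 0.0, 12.0]
    y = hypersphere_norm(x)       # radius defaults to sqrt(4) = 2
    z = rms_norm(x)               # gain-free RMSNorm
    print(l2_norm(y))             # close to 2.0
    print(max(abs(a - b) for a, b in zip(y, z)))  # close to 0
```

With any other radius the output is the same direction at a different scale, which is the knob the sweep varies.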