in mlp example: 2 problems #41

Open

yjjinjie opened this issue Mar 16, 2023 · 1 comment

yjjinjie commented Mar 16, 2023

https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L61
This line says "If you don't specify a base shape file, then you are using standard parametrization", but in the code the optimizer is still MuSGD: https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L257. Is that intended?
Why doesn't the init function use mup.init?
https://github.com/microsoft/mup/blob/main/examples/MLP/main.py#L139

### Replace your custom init, if any
for param in model.parameters():
    ### If initializing manually with fixed std or bounds,
    ### then replace with same function from mup.init
    # torch.nn.init.uniform_(param, -0.1, 0.1)
    mup.init.uniform_(param, -0.1, 0.1)

Collaborator

edwardjhu commented Mar 22, 2023

Thanks for the questions!

If you don't specify a base shape, it defaults to the shape of the target model. Every width multiplier is then 1, so the result is equivalent to SP even if you are using a MuOptimizer.
We didn't have the mup library when we first wrote the code for the MLP experiment. You are right that we can use mup.init there. Lines 139 to 141 do manually what mup.init does.
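
To make both points concrete, here is a rough sketch in plain Python. It is not mup's actual code: `width_mult`, `scaled_lr`, and `scaled_uniform_bound` are hypothetical helpers, and the scaling rules in them are a simplified assumption about how a muP-style SGD optimizer and a muP-style uniform init treat vector-like (one width-dependent dimension) and matrix-like (two width-dependent dimensions) parameters. The point is only that when the base shape defaults to the target shape, the multiplier is 1 and every rule reduces to the standard one.

```python
# Simplified illustration only. These helpers are hypothetical, not mup's API,
# and the scaling rules are an assumed muP-style convention.

def width_mult(target_width, base_width=None):
    """Ratio of target width to base width; the base defaults to the target."""
    if base_width is None:
        base_width = target_width
    return target_width / base_width


def scaled_lr(lr, mult, ninf):
    """Assumed SGD rule; ninf counts the width-dependent dimensions."""
    if ninf == 1:   # vector-like parameter, e.g. a bias
        return lr * mult
    if ninf == 2:   # matrix-like parameter, e.g. a hidden weight
        return lr / mult
    return lr       # no width-dependent dimension


def scaled_uniform_bound(bound, mult, ninf):
    """Assumed init rule: shrink bounds of matrix-like parameters."""
    if ninf == 2:
        return bound * mult ** -0.5
    return bound


base_lr, bound = 0.1, 0.1

# No base shape given: the multiplier is 1, so nothing is rescaled (same as SP).
m = width_mult(4096)
sp_like = (scaled_lr(base_lr, m, 2), scaled_uniform_bound(bound, m, 2))

# Base width 256, target width 4096: the multiplier is 16, so scaling kicks in.
m = width_mult(4096, 256)
mup_like = (scaled_lr(base_lr, m, 2), scaled_uniform_bound(bound, m, 2))

print(sp_like, mup_like)
```

With no base shape the pair comes back unchanged as `(base_lr, bound)`, which is why a MuOptimizer behaves like its standard counterpart in that case. A manual init such as the one in the MLP example would apply the same kind of width-dependent factor by hand.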

jhj0411jhj mentioned this issue

Reproducing the training loss vs learning rates curve on MLP #52

Closed