
Model parameters being marked as orphan when using KFAC optimizer #270

Open
Uernd opened this issue Sep 22, 2024 · 2 comments

@Uernd

Uernd commented Sep 22, 2024

I was trying to use the KFAC optimizer to train FermiNet, a well-designed PINN for solving atoms and molecules. But when I tried to add a parity constraint to it, I noticed that some of the parameters were labeled as 'orphan', even though the calculation proceeded normally.

I wonder what 'orphan' means, and whether this is normal. Are the parameters labelled as 'orphan' still updated during training? Even though it eventually yielded correct results, there were still a lot of normally-labelled parameters, which is what ensured the model remained trainable.

The following picture contains all my changes to FermiNet; I've only modified the networks.py file.
[image: all my changes to ferminet]

Thank you for your assistance!

@james-martens
Collaborator

I haven't seen the optimizer generate logs with "orphans" in a long time. AFAIK the automatic scanner will register any unrecognized parameter as "generic".

With that out of the way, these situations come up when a parameter is used in the graph in a way that doesn't conform to one of several recognized patterns. In that case, the parameter uses a curvature approximation that is usually very crude: either "naive diagonal" or "naive full". kfac_jax currently doesn't support non-generic registrations inside of vmaps, so that probably explains what you are seeing.
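
To illustrate what I mean by a recognized pattern, here is a rough sketch with made-up shapes and names (not FermiNet code or kfac_jax internals):

```python
import jax
import jax.numpy as jnp

def recognized_dense(params, x):
  # A plain y = x @ w + b is one of the patterns the automatic scanner can
  # recognize, so w and b would get a proper Kronecker-factored block.
  return x @ params['w'] + params['b']

def unrecognized_use(params, x):
  # Transforming the weight before the matmul (here a symmetrization, as a
  # stand-in for a parity-style constraint) no longer matches that pattern,
  # so w would likely fall back to a "generic" registration with a naive
  # diagonal / naive full curvature approximation.
  w_sym = 0.5 * (params['w'] + params['w'].T)
  return x @ w_sym + params['b']

# Applying even the plain dense op through vmap has a similar effect, since
# non-generic registrations inside vmaps aren't currently supported.
per_example_dense = jax.vmap(recognized_dense, in_axes=(None, 0))

params = {'w': jnp.ones((4, 4)), 'b': jnp.zeros(4)}
x = jnp.ones((8, 4))
print(recognized_dense(params, x).shape)   # (8, 4)
print(per_example_dense(params, x).shape)  # (8, 4)
```

If you can arrange things so the raw parameter feeds the matmul directly (and outside of any vmap), the scanner has a much better chance of picking it up.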

@botev
Contributor

botev commented Nov 8, 2024

Most likely what is happening is that your changes (which I don't fully understand, since I haven't worked on FermiNet in a very long while) are breaking one of the repeated dense patterns that are used to recognize those parameters and assign a curvature approximation to them.
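
As a rough illustration of what I mean by a repeated dense pattern (hypothetical shapes, not the actual FermiNet code):

```python
import jax.numpy as jnp

def shared_dense(params, h):
  # h: [n_electrons, d_in]. The same w and b are applied to every electron
  # stream; this repeated use of a single dense parameter set is the sort of
  # pattern the scanner relies on.
  return h @ params['w'] + params['b']

def parity_modified_dense(params, h, parity_sign):
  # Folding an extra per-electron factor (e.g. a parity sign) into the weight
  # itself changes the graph around w, which can break the match and leave w
  # registered as "generic".
  w_eff = parity_sign[:, None, None] * params['w']    # [n_electrons, d_in, d_out]
  return jnp.einsum('ed,edo->eo', h, w_eff) + params['b']

h = jnp.ones((6, 4))
params = {'w': jnp.ones((4, 8)), 'b': jnp.zeros(8)}
sign = jnp.array([1., -1., 1., -1., 1., -1.])
print(shared_dense(params, h).shape)                 # (6, 8)
print(parity_modified_dense(params, h, sign).shape)  # (6, 8)
```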

Could you post an actual diff against the FermiNet code, to make it a bit clearer what you are changing?
