Fix save_state_dict #645

AMHermansen · 2023-12-09T10:26:14Z

The current implementation moves the entire model to the CPU whenever save_state_dict is called. This seems like an undesirable side effect of the method. This PR changes save_state_dict so that it only saves the state dict and keeps the model on its current device.

RasmusOrsoe · 2023-12-11T14:46:28Z

@AMHermansen thanks for this suggestion. I'm not sure we want this change though: if the state dict is saved to disk from the GPU, the model will need to be on the GPU when the state dict is loaded again, or an error will be thrown. See https://pytorch.org/tutorials/recipes/recipes/save_load_across_devices.html.

Has this been an issue for you?
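
For reference, the cross-device case discussed in the linked recipe is usually handled with the `map_location` argument of `torch.load`. The snippet below is only a minimal sketch: the `torch.nn.Linear` model and the in-memory buffer are stand-ins chosen for illustration, not anything taken from graphnet.

```python
import io

import torch

# Stand-in for a state dict file on disk (illustrative assumption).
buffer = io.BytesIO()
torch.save(torch.nn.Linear(4, 2).state_dict(), buffer)
buffer.seek(0)

# Remap every saved tensor to whichever device is available at load time.
target_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
state_dict = torch.load(buffer, map_location=target_device)

model = torch.nn.Linear(4, 2).to(target_device)
load_result = model.load_state_dict(state_dict)
```

With `map_location` set, the device the tensors were saved from no longer has to exist on the machine that loads them.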

AMHermansen · 2023-12-11T15:01:43Z

The change in this PR only removes the side effect of the current save_state_dict implementation, so that the model is no longer moved to the CPU when it is called. This is done by copying the state_dict to the CPU and saving the copy. The aim is to make saving models more streamlined: my understanding from the example scripts is that save_model_config and save_state_dict are the intended way to save graphnet models. However, if you save a model this way during training, you run into problems because the model is moved away from the accelerator.
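
The idea of copying the state dict to the CPU and saving the copy could look roughly like the sketch below. The helper name `save_state_dict_on_cpu` and the `torch.nn.Linear` stand-in model are hypothetical and only illustrate the approach; this is not graphnet's actual code.

```python
import os
import tempfile

import torch


def save_state_dict_on_cpu(model: torch.nn.Module, path: str) -> None:
    """Save a CPU copy of the state dict without moving `model` itself.

    Hypothetical helper for illustration; not graphnet's implementation.
    """
    # Tensor.cpu() returns a CPU copy (or the same tensor if it is already
    # on the CPU); the model's own parameters stay on their current device.
    cpu_state_dict = {
        key: value.cpu() for key, value in model.state_dict().items()
    }
    torch.save(cpu_state_dict, path)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "state_dict.pth")
    save_state_dict_on_cpu(model, path)
    loaded = torch.load(path)

# The model has not been moved by the save call.
device_after_save = next(model.parameters()).device.type
```

Because only the copy is moved, a call like this can be made in the middle of training without pulling the model off the accelerator.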

RasmusOrsoe

@AMHermansen sorry, I glanced over this too quickly. I was under the impression that the state dict was saved on whatever device it happened to be on; upon looking at the code again I see that's not the case.

Fix save_state_dict

7601429

AMHermansen requested a review from RasmusOrsoe December 9, 2023 10:26

RasmusOrsoe approved these changes Dec 11, 2023

AMHermansen merged commit d06882b into graphnet-team:main Dec 12, 2023
12 checks passed