New activation checkpointing #343

dirkgr · 2023-10-26T20:36:13Z

The checkpoint wrapper we were using is undocumented in torch, and it breaks the way we construct parameter groups. This new way is documented and, I think, easier to understand.
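For readers unfamiliar with the difference, here is a minimal sketch. It assumes the documented approach is `torch.utils.checkpoint.checkpoint`, and that the parameter-group problem comes from module wrappers changing parameter names. Neither assumption is confirmed by the description above. `Wrapper` is a hypothetical stand-in written for this example, not torch's actual checkpoint wrapper.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class Wrapper(nn.Module):
    """Hypothetical stand-in for a helper that wraps a module to checkpoint it."""

    def __init__(self, inner: nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        return checkpoint(self.inner, x, use_reentrant=False)


block = nn.Linear(4, 4)

# Wrapping inserts an extra level into every parameter name, which can
# confuse code that builds optimizer parameter groups from those names.
plain_names = [name for name, _ in block.named_parameters()]
wrapped_names = [name for name, _ in Wrapper(block).named_parameters()]

# Calling checkpoint() directly leaves the module tree (and names) untouched.
# Activations inside `block` are recomputed during backward instead of stored.
x = torch.randn(2, 4, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```

In this sketch the wrapped module reports its parameters under an `inner.` prefix, while the direct call keeps the original `weight` and `bias` names.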

epwalsh · 2023-10-26T21:49:33Z

olmo/config.py

@@ -853,6 +854,11 @@ class TrainConfig(BaseConfig):
    Stop at a specific step.
    """

+    activation_checkpointing: bool = False


We already have this field

epwalsh · 2023-10-26T21:52:22Z

olmo/config.py

@@ -404,6 +404,12 @@ class ModelConfig(BaseConfig):
    See :data:`TrainConfig.precision` instead.
    """

+    activation_checkpointing: bool = False


When this gets set to true during training, it could cause issues later when loading the model for inference. Instead, maybe we have a method on the model like `Olmo.enable_activation_checkpointing()`? The trainer calls that when `TrainConfig.activation_checkpointing` is true, so we don't need to add a configuration option to `ModelConfig`.

I don't see why it would cause issues for inference, as the resulting checkpoint files should be 100% identical.

But I like this other design anyways.

What I mean is that it would enable activation checkpointing when the model is loaded for inference
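The design discussed in this thread could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Olmo`, `enable_activation_checkpointing()`, `ModelConfig`, and `TrainConfig.activation_checkpointing` come from the comments above, while `d_model`, `n_layers`, the `nn.Linear` blocks, and `build_for_training` are hypothetical placeholders.

```python
from dataclasses import dataclass

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


@dataclass
class ModelConfig:
    # Hypothetical fields; note there is no activation_checkpointing flag here,
    # so nothing about checkpointing is saved with the model config.
    d_model: int = 8
    n_layers: int = 2


@dataclass
class TrainConfig:
    activation_checkpointing: bool = False


class Olmo(nn.Module):
    def __init__(self, config: ModelConfig):
        super().__init__()
        # Placeholder blocks standing in for real transformer blocks.
        self.blocks = nn.ModuleList(
            [nn.Linear(config.d_model, config.d_model) for _ in range(config.n_layers)]
        )
        self.activation_checkpointing = False  # off unless someone enables it

    def enable_activation_checkpointing(self, enable: bool = True) -> None:
        self.activation_checkpointing = enable

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            if self.activation_checkpointing:
                # Recompute this block's activations during backward.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


def build_for_training(model_cfg: ModelConfig, train_cfg: TrainConfig) -> Olmo:
    """Hypothetical trainer-side helper: only the trainer flips the switch."""
    model = Olmo(model_cfg)
    if train_cfg.activation_checkpointing:
        model.enable_activation_checkpointing()
    return model


train_model = build_for_training(ModelConfig(), TrainConfig(activation_checkpointing=True))
infer_model = Olmo(ModelConfig())  # loaded without a trainer: checkpointing stays off
infer_model.load_state_dict(train_model.state_dict())
```

With this shape, a model built from `ModelConfig` alone never turns checkpointing on, and the state dict keys are the same either way, so the saved weights load into both.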

epwalsh

LGTM!

dirkgr added 2 commits October 26, 2023 13:34

Do activation checkpointing in a different way

b67e92c

Forgot the X

c8c6c68

dirkgr requested a review from epwalsh October 26, 2023 20:36

dirkgr added 2 commits October 26, 2023 14:07

Python imports are weird

1e2c7e0

Productivity through formatting

c527ff4

epwalsh requested changes Oct 26, 2023

dirkgr added 3 commits October 26, 2023 17:43

Make activation checkpointing enablable on the fly

3326f93

Merge remote-tracking branch 'origin/main' into ActivationCheckpointing

09f0e44

# Conflicts: # olmo/model.py

Makes checkpointing work with block groups

5dde774

This is untested.

epwalsh approved these changes Oct 27, 2023

I'd rather be sailing.

c366e42

dirkgr merged commit 5c64338 into main Oct 27, 2023
10 checks passed

dirkgr deleted the ActivationCheckpointing branch October 27, 2023 17:31
