
25 normalize data on gpu #39

Open · wants to merge 6 commits into main

Conversation

@sadamov (Collaborator) commented May 25, 2024

Summary

This PR introduces on_after_batch_transfer logic so that data is normalized on the GPU after transfer, rather than on the CPU beforehand.

Rationale

Normalization is faster on the GPU than on the CPU. In the current code, the data is normalized in the PyTorch dataset class's __getitem__ method, which can slow down training, especially on systems with fewer CPU cores.

Changes

  • Introduces on_after_batch_transfer in the ar_model.py script (a minimal sketch follows this list)
  • Removes everything related to "standardization" from the remaining scripts
  • Adapts the create_parameter_weights.py script to work with the new changes (no longer reloading a standardized dataset)
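
For illustration, a minimal sketch of what such a hook can look like, assuming the batch is the (init_states, target_states, forcing_features) tuple produced by WeatherDataset and that the statistics are registered as buffers. This is a simplified sketch, not the exact ar_model.py code:

```python
import pytorch_lightning as pl


class ARModelSketch(pl.LightningModule):
    """Simplified sketch of GPU-side normalization via a Lightning hook."""

    def __init__(self, data_mean, data_std, flux_mean, flux_std):
        super().__init__()
        # Buffers follow the model to whatever device the batch lives on
        self.register_buffer("data_mean", data_mean)
        self.register_buffer("data_std", data_std)
        self.register_buffer("flux_mean", flux_mean)
        self.register_buffer("flux_std", flux_std)

    def on_after_batch_transfer(self, batch, dataloader_idx):
        # Called after the batch has been moved to the GPU, so the
        # normalization below runs on the GPU instead of in __getitem__
        init_states, target_states, forcing_features = batch
        init_states = (init_states - self.data_mean) / self.data_std
        target_states = (target_states - self.data_mean) / self.data_std
        forcing_features = (forcing_features - self.flux_mean) / self.flux_std
        return init_states, target_states, forcing_features
```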

Testing

Both training and evaluation were successful. The training loss of 3.230 on the meps_example is identical to before the changes. The create_parameter_weights script was executed and successfully generated the stats.

Not-In-Scope

The normalization stats and other static features will all become zarr archives in the future, with their paths defined in the data_config.yaml file.

@sadamov sadamov added the enhancement New feature or request label May 25, 2024
@sadamov sadamov self-assigned this May 25, 2024
@sadamov sadamov linked an issue May 25, 2024 that may be closed by this pull request
@sadamov sadamov mentioned this pull request May 25, 2024
@leifdenby (Member)

This is looking good @sadamov! Can I suggest we merge #38 first, then you merge that into your branch, and then I do a review? That way we can ensure everything keeps working :)

@joeloskarsson (Collaborator) left a comment

This is a lot cleaner, and it is also good to move computations from the CPU to the GPU. There is a question mark about the forcing standardization that should be double-checked. In the future we might want to handle standardization the same way for all forcing variables, or have a more robust way to control this (i.e., which forcing should be standardized or not), but let's leave that until then.

init_states, target_states, forcing_features = batch
init_states = (init_states - self.data_mean) / self.data_std
target_states = (target_states - self.data_mean) / self.data_std
forcing_features = (forcing_features - self.flux_mean) / self.flux_std
Collaborator (inline comment on the lines above):

Now all forcing seems to be normalized with the flux statistics, but there is more forcing than just the fluxes. Note how this was only applied to the flux in WeatherDataset before.
Have you tested that this gives exactly the same tensors as before? (e.g. save the first batch to disk on main, check out this branch, save the first batch, and compare).

@sadamov (Collaborator, Author) replied Jun 9, 2024:

True, the forcings are now handled differently. I suggest implementing new logic to handle forcings in #54: the user can define combined_vars that share statistics and also define vars that should not be normalized.
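
Purely as an illustration of that idea (the key and variable names below are hypothetical placeholders; the actual design is to be settled in #54), such a configuration could look roughly like:

```python
# Hypothetical sketch only -- keys and variable names are placeholders,
# not the actual neural-lam configuration format.
forcing_normalization = {
    # groups of variables that share one set of statistics
    "combined_vars": [
        ["flux_var_a", "flux_var_b"],
    ],
    # variables that should be passed through without normalization
    "skip_normalization": ["forcing_var_c", "forcing_var_d"],
}
```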

neural_lam/models/ar_model.py (outdated review thread, resolved)
@sadamov (Collaborator, Author) commented Jun 9, 2024

I merged the latest updates from main @leifdenby and commented on your suggestions @joeloskarsson. In the following I want to show that the output tensor was not affected by this change. As you suggested, I stored the last batch for both GPU and CPU normalization with deterministic=True and a fixed seed:

[screenshot of the saved batch files not reproduced]

Then I compared the tensors like this:

[screenshot from 2024-06-07 not reproduced]
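
Since the screenshot is not reproduced here, a minimal sketch of that kind of comparison, assuming each batch was saved with torch.save as an (init_states, target_states, forcing_features) tuple (file names are illustrative):

```python
import torch

# Batches saved from the two code paths (paths are illustrative)
batch_cpu = torch.load("batch_cpu_normalized.pt")
batch_gpu = torch.load("batch_gpu_normalized.pt")

for name, t_cpu, t_gpu in zip(
    ["init_states", "target_states", "forcing_features"], batch_cpu, batch_gpu
):
    # Move the GPU-normalized tensor to CPU before comparing
    print(name, torch.allclose(t_cpu, t_gpu.cpu(), atol=1e-6))
```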

So, except for the forcing tensor, which is handled differently, the two approaches produce identical output.

@joeloskarsson (Collaborator)

How do you think we should progress with this @sadamov? If the forcing is handled differently in #54, would it make more sense to try to merge this after that? (I guess baking this change into #54 would just make it even bigger.) Or should we merge this first so that #54 can build on it and be adapted to use it?

I would not be happy to merge this without a fix so that the forcing tensors also match the previous implementation. But we could just do a quick fix for now so that standardization is only applied to the flux dimensions of the forcing tensor?
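
For reference, such a quick fix could look roughly like the sketch below, assuming the flux variables occupy a known slice of the last (feature) dimension of the forcing tensor; the slice used here is purely illustrative:

```python
import torch


def standardize_flux_only(forcing, flux_mean, flux_std, flux_slice=slice(0, 1)):
    """Standardize only the flux channels of the forcing tensor.

    flux_slice is illustrative; the real indices depend on how the
    forcing features are ordered in WeatherDataset.
    """
    forcing = forcing.clone()  # avoid modifying the input in place
    forcing[..., flux_slice] = (
        forcing[..., flux_slice] - flux_mean
    ) / flux_std
    return forcing
```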

@sadamov (Collaborator, Author) commented Jun 12, 2024

I propose merging #54 first and leaving this open until then. We now know that on_after_batch_transfer works as expected, and we can implement this PR shortly after.

@joeloskarsson (Collaborator)

As we look at this after #66, we should think about more options for rescaling the different variables. In #66 all variables (including state/forcing/static) are standardized. There are benefits to allowing for all three of:

  1. Standardize to $\mu=0, \sigma=1$
  2. Normalize to $[0,1]$
  3. Perform no rescaling

for different variables. We do, however, need a way to specify what should be used for each variable, as well as to compute the needed data statistics. Some of this might have to be done in mllam_data_prep, but at least the final computation should be in scope for this PR, as it is what the code has to do on the GPU.
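
As a rough illustration of the GPU-side part of that (the mode names, argument layout, and stats format below are assumptions, not the actual design):

```python
import torch


def rescale_variables(tensor, modes, stats):
    """Rescale each variable (last dimension) according to its own mode.

    modes: one of "standardize" | "normalize" | "none" per variable.
    stats: dict of per-variable "mean", "std", "min", "max" tensors.
    All names here are illustrative, not an existing neural-lam API.
    """
    out = tensor.clone()
    for i, mode in enumerate(modes):
        if mode == "standardize":  # mu = 0, sigma = 1
            out[..., i] = (out[..., i] - stats["mean"][i]) / stats["std"][i]
        elif mode == "normalize":  # rescale to [0, 1]
            out[..., i] = (out[..., i] - stats["min"][i]) / (
                stats["max"][i] - stats["min"][i]
            )
        elif mode != "none":  # "none" leaves the variable untouched
            raise ValueError(f"Unknown rescaling mode: {mode}")
    return out
```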

@joeloskarsson joeloskarsson added this to the v0.5.0 milestone Nov 20, 2024
Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close these issues: Normalize Data on GPU
3 participants