Add `jac_chunk_size` keyword argument to `ObjectiveFunction` to reduce memory usage of forward mode Jacobian calculation #1052

dpanici · 2024-06-13T19:54:55Z

changes most jnp.vectorize calls to instead use batched_vectorize which performs the function vectorization in smaller chunks, which reduces the memory cost of the calculation, at the expense of taking longer the smaller the chunk size is.
Add jac_chunk_size to ObjectiveFunction and _Objective to control the above chunk size for the fwd mode Jacobian calculation
- if None, the chunk size is equal to dim_x, so no chunking is done
- if an int, this is the chunk size to be used.
- if "auto" for the ObjectiveFunction, will use a heuristic for the maximum jac_chunk_size needed to fit the jacobian calculation on the available device memory, according to the formula: max_jac_chunk_size = (desc_config.get("avail_mem") / estimated_memory_usage - 0.22) / 0.85 * self.dim_x
the ObjectiveFunction jac_chunk_size is used if deriv_mode="batched", and the _Objective jac_chunk_size will be used if deriv_mode="blocked"

This works well, this is LMN18 equilibrium solve with 1.5 oversampled grid and maxiter=10 memory trace vs time on GPU, where we get 4x memory decrease with negligible runtime increase:

Also, I can do up to an LMN=20 eq ForceBalance objective with the default double grid oversampling, and with the "auto" chunk sizing, the jacobian compiles and computes without going OOM on an 80gb GPU (on master this would go OOM).

TODO

re-implement without relying on netket
change chunk_size to a better default value (something like 100 would be fine, maybe can dynamically choose based off of size of dim_x)
Add chunk_size argument to every Objective class
- I am choosing right now to not to add it as an arg to the LinearObjective classes, though technically you could
Add "chunked" as a deriv_mode to Derivative (or, just as an argument to Derivative to be used when "batched" is used) - > I don't remember what this was exactly, I think we can keep just for Objectives
change chunk_size to jacobian_chunk_size for Objective kwarg
use in constraint wrappers

TODO Later

add to singular integral calculation as well

Resolves #826

…ve jvp_scaled to use custom vectorize

codecov · 2024-06-13T20:39:45Z

Codecov Report

Attention: Patch coverage is 89.44099% with 17 lines in your changes missing coverage. Please review.

Project coverage is 92.19%. Comparing base (15d95f1) to head (3e99510).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
desc/batching.py	85.83%	17 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1052      +/-   ##
==========================================
- Coverage   95.37%   92.19%   -3.19%     
==========================================
  Files          96       96              
  Lines       23560    23560              
==========================================
- Hits        22471    21721     -750     
- Misses       1089     1839     +750

Files with missing lines	Coverage Δ
desc/continuation.py	`93.26% <100.00%> (ø)`
desc/derivatives.py	`92.85% <100.00%> (-0.43%)`	⬇️
desc/objectives/_bootstrap.py	`97.14% <ø> (ø)`
desc/objectives/_coils.py	`99.17% <ø> (ø)`
desc/objectives/_equilibrium.py	`94.53% <ø> (-0.43%)`	⬇️
desc/objectives/_free_boundary.py	`82.83% <ø> (-14.20%)`	⬇️
desc/objectives/_generic.py	`67.76% <ø> (-29.76%)`	⬇️
desc/objectives/_geometry.py	`96.93% <ø> (ø)`
desc/objectives/_omnigenity.py	`96.30% <ø> (ø)`
desc/objectives/_power_balance.py	`87.50% <ø> (-2.09%)`	⬇️
... and 7 more

... and 22 files with indirect coverage changes

github-actions · 2024-06-13T21:14:18Z

|             benchmark_name             |         dt(%)          |         dt(s)          |        t_new(s)        |        t_old(s)        | 
| -------------------------------------- | ---------------------- | ---------------------- | ---------------------- | ---------------------- |
 test_build_transform_fft_lowres         |     +3.16 +/- 3.96     | +1.69e-02 +/- 2.12e-02 |  5.53e-01 +/- 2.0e-02  |  5.36e-01 +/- 6.8e-03  |
 test_equilibrium_init_medres            |     -0.44 +/- 5.40     | -1.94e-02 +/- 2.36e-01 |  4.34e+00 +/- 1.0e-01  |  4.36e+00 +/- 2.1e-01  |
 test_equilibrium_init_highres           |     -0.74 +/- 2.41     | -4.25e-02 +/- 1.39e-01 |  5.73e+00 +/- 1.2e-01  |  5.77e+00 +/- 6.4e-02  |
 test_objective_compile_dshape_current   |     -1.42 +/- 1.53     | -5.72e-02 +/- 6.15e-02 |  3.97e+00 +/- 5.0e-02  |  4.03e+00 +/- 3.6e-02  |
 test_objective_compute_dshape_current   |     -1.97 +/- 3.73     | -7.30e-05 +/- 1.39e-04 |  3.64e-03 +/- 4.4e-05  |  3.71e-03 +/- 1.3e-04  |
 test_objective_jac_dshape_current       |     -0.67 +/- 4.78     | -2.76e-04 +/- 1.96e-03 |  4.08e-02 +/- 1.4e-03  |  4.11e-02 +/- 1.3e-03  |
 test_perturb_2                          |     +0.42 +/- 3.47     | +7.51e-02 +/- 6.14e-01 |  1.78e+01 +/- 5.1e-01  |  1.77e+01 +/- 3.5e-01  |
 test_proximal_freeb_jac                 |     -0.28 +/- 1.56     | -2.12e-02 +/- 1.17e-01 |  7.51e+00 +/- 7.8e-02  |  7.53e+00 +/- 8.8e-02  |
 test_solve_fixed_iter                   |     +0.40 +/- 57.53    | +2.00e-02 +/- 2.88e+00 |  5.03e+00 +/- 2.0e+00  |  5.01e+00 +/- 2.1e+00  |
 test_build_transform_fft_midres         |     -0.76 +/- 5.52     | -4.77e-03 +/- 3.46e-02 |  6.23e-01 +/- 1.1e-02  |  6.28e-01 +/- 3.3e-02  |
 test_build_transform_fft_highres        |     -0.34 +/- 3.28     | -3.46e-03 +/- 3.36e-02 |  1.02e+00 +/- 9.3e-03  |  1.02e+00 +/- 3.2e-02  |
 test_equilibrium_init_lowres            |     +1.50 +/- 3.83     | +5.83e-02 +/- 1.49e-01 |  3.95e+00 +/- 1.5e-01  |  3.89e+00 +/- 3.4e-02  |
 test_objective_compile_atf              |     -0.08 +/- 4.11     | -6.25e-03 +/- 3.25e-01 |  7.90e+00 +/- 2.4e-01  |  7.91e+00 +/- 2.2e-01  |
 test_objective_compute_atf              |     +2.00 +/- 2.81     | +2.10e-04 +/- 2.97e-04 |  1.07e-02 +/- 2.5e-04  |  1.05e-02 +/- 1.5e-04  |
 test_objective_jac_atf                  |     +1.18 +/- 2.10     | +2.33e-02 +/- 4.16e-02 |  2.00e+00 +/- 3.0e-02  |  1.98e+00 +/- 2.8e-02  |
 test_perturb_1                          |     +7.72 +/- 3.83     | +9.70e-01 +/- 4.81e-01 |  1.35e+01 +/- 4.2e-01  |  1.26e+01 +/- 2.3e-01  |
 test_proximal_jac_atf                   |     +1.08 +/- 0.76     | +8.87e-02 +/- 6.27e-02 |  8.29e+00 +/- 4.7e-02  |  8.20e+00 +/- 4.1e-02  |
 test_proximal_freeb_compute             |     +2.84 +/- 1.08     | +5.27e-03 +/- 2.00e-03 |  1.91e-01 +/- 1.8e-03  |  1.86e-01 +/- 9.6e-04  |

unalmis · 2024-06-14T00:30:00Z

You might already be aware but fyi: jax-ml/jax#19614

PhilipVinc · 2024-06-17T09:37:22Z

If you don't care about jax's native multi-GPU sharding support it should be easy to just vendor our implementation.
In that case, you can just vendor our netket/jax/_chunk_utils.py, netket/jax/_scanmap.py and netket/jax/_vmap_chunked.py .

The former 2 files are on purpose standalone. Only _vmap_chunked depends on other things but only if you are using sharding.

Remove all branches hitting of axis_0_is_sharded == True and config.netket_experimental_sharding, which will allow you to remove sharding_decorator, which is a mess only needed to support efficient sharding of jax arrays.

Also replace HashablePartial with functools.partial

dpanici · 2024-07-10T20:27:46Z

@dpanici jax batched vmap has been merged to master

…ESC into dp/jacobian-batched-vmap

dpanici · 2024-07-23T18:19:29Z

@dpanici make separate branch with the implementation using JAX's version, and in this PR implement the one based off of netket

dpanici · 2024-08-07T19:43:20Z

@kianorr @YigitElma

…ESC into dp/jacobian-batched-vmap

…unked implementation

…ESC into dp/jacobian-batched-vmap

CHANGELOG.md

YigitElma · 2024-09-26T02:21:42Z

desc/batching.py

+
+    Parameters
+    ----------
+    f: a function that takes elements of the leading dimension of x


This is not consistent with our docstring format but no problem. Just pointing out.

YigitElma · 2024-09-26T02:27:04Z

desc/batching.py

+    of functions that act on JAX arrays.
+
+    Parameters
+    ----------


The above docstrings are not too important but this one might be checked more, so maybe make it consistent with out doc format? Again, not too important, we can change it in a later PR.

YigitElma · 2024-09-26T03:07:27Z

desc/objectives/objective_funs.py

@@ -474,8 +553,6 @@ def jac_scaled_error(self, x, constants=None):

        if self._deriv_mode == "batched":
            J = Derivative(self.compute_scaled_error, mode="fwd")(x, constants)
-        if self._deriv_mode == "looped":


This is a completely unrelated comment but I think we should explain this jacobian, jvp, etc better in some dev guide. For example, an individual objectives jac_scaled_error doesn't use batched_vmap it is usually jax.jacfwd. On the other hand, when we wrap the objective and constraints, the jacobian is calculated by corresponding jvp_ method, not jac_ method. Even now it confuses me. What is the case we use jac_ method instead of jvp_? Good thing to clarify in long-waiting dev-guide

unalmis · 2024-09-26T20:16:57Z

desc/batching.py

+        # into vmap, we can make use of more efficient batching rules for
+        # primitives where only some arguments are batched (e.g., for
+        # lax_linalg.triangular_solve), and avoid instantiating large broadcasted
+        # arrays.


commenting for reference

Most of the documentation of our objectives has the same parameters that are inherited from the main `_Objective` class. This PR removes the repeated docstring from each objective and updates the docstring by inheriting which reduces the lines of code and also facilitates the maintenance. For example, when we add `jac_chunk_size` in #1052, we have to copy-paste the docs to every single objective which is tedious. Introduces `collect_docs` function that creates docstring for common parameters and with option to overwrite user can give a custom definition for a parameter without changing the order of the docs Resolves #879

dpanici and others added 4 commits June 12, 2024 14:26

add netket to use vmap_chunked, change the vectorize calls in objecti…

e1ffec8

…ve jvp_scaled to use custom vectorize

add chunk_size arg

5aef59c

add chunk_size to objectivefunction as well

67844a7

Merge branch 'master' into dp/jacobian-batched-vmap

60e142c

Merge branch 'master' into dp/jacobian-batched-vmap

c27bf9b

dpanici and others added 3 commits June 14, 2024 12:54

Merge branch 'master' into dp/jacobian-batched-vmap

f3b6683

set default chunk size to None to disable chunking by default:

517a519

fix default chunk size in backend

6b786d0

dpanici and others added 4 commits June 24, 2024 16:31

add a comment

135adce

add chunk_size arg to solve continuation

0366043

fix error

131ae78

Merge branch 'master' into dp/jacobian-batched-vmap

8cfcf8d

Merge branch 'dp/jacobian-batched-vmap' of github.com:PlasmaControl/D…

8f0515c

…ESC into dp/jacobian-batched-vmap

Merge branch 'master' into dp/jacobian-batched-vmap

c3ba7ac

dpanici and others added 6 commits August 16, 2024 11:13

Merge branch 'dp/jacobian-batched-vmap' of github.com:PlasmaControl/D…

ce5742f

…ESC into dp/jacobian-batched-vmap

Merge branch 'dp/jacobian-batched-vmap' of github.com:PlasmaControl/D…

1f32705

…ESC into dp/jacobian-batched-vmap

Merge branch 'master' into dp/jacobian-batched-vmap

02ba37e

remove netket dependence, vendor only needed parts from their vmap_ch…

feb006e

…unked implementation

add chunk_size arg to every Obective

eda29ee

Merge branch 'master' into dp/jacobian-batched-vmap

95e25cc

dpanici marked this pull request as ready for review August 22, 2024 20:19

dpanici added 2 commits August 22, 2024 16:22

put chunk size in a few tests

7aca7ad

Merge branch 'dp/jacobian-batched-vmap' of github.com:PlasmaControl/D…

fcdf492

…ESC into dp/jacobian-batched-vmap

dpanici mentioned this pull request Sep 23, 2024

Make a utils directory #1251

Open

dpanici and others added 7 commits September 23, 2024 21:11

Merge branch 'master' into dp/jacobian-batched-vmap

1c1fab4

remove auto option for _Objective chunk size

3e8e364

rename utils file

b0b78fe

resolve comments

be57bbc

resolve further comment

cd865d0

change warnings to errors

e0e513f

add disclaimer to ObjectiveFunction for auto chunk size on HPC CPU

a6e29d6

dpanici requested review from f0uriest and a team September 24, 2024 16:17

dpanici and others added 2 commits September 24, 2024 13:39

Merge branch 'master' into dp/jacobian-batched-vmap

6988660

Merge branch 'master' into dp/jacobian-batched-vmap

f4fbbcf

f0uriest previously approved these changes Sep 25, 2024

View reviewed changes

update changelog

9100e7c

dpanici dismissed f0uriest’s stale review via 9100e7c September 25, 2024 16:03

f0uriest previously approved these changes Sep 25, 2024

View reviewed changes

YigitElma mentioned this pull request Sep 25, 2024

Add docstring inheriter #1258

Merged

YigitElma previously approved these changes Sep 26, 2024

View reviewed changes

correct changelog

15d95f1

dpanici dismissed stale reviews from YigitElma and f0uriest via 15d95f1 September 26, 2024 12:42

dpanici requested review from YigitElma and f0uriest September 26, 2024 12:42

YigitElma approved these changes Sep 26, 2024

View reviewed changes

Merge branch 'master' into dp/jacobian-batched-vmap

3e99510

dpanici merged commit 17c2b15 into master Sep 26, 2024
22 of 24 checks passed

dpanici deleted the dp/jacobian-batched-vmap branch September 26, 2024 18:56

unalmis reviewed Sep 26, 2024

View reviewed changes

YigitElma mentioned this pull request Sep 26, 2024

Objective deriv_mode using thing.dim_x results in different Jacobian size #1276

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `jac_chunk_size` keyword argument to `ObjectiveFunction` to reduce memory usage of forward mode Jacobian calculation #1052

Add `jac_chunk_size` keyword argument to `ObjectiveFunction` to reduce memory usage of forward mode Jacobian calculation #1052

dpanici commented Jun 13, 2024 •

edited

Loading

codecov bot commented Jun 13, 2024 •

edited

Loading

github-actions bot commented Jun 13, 2024 •

edited

Loading

unalmis commented Jun 14, 2024

PhilipVinc commented Jun 17, 2024

dpanici commented Jul 10, 2024

dpanici commented Jul 23, 2024

dpanici commented Aug 7, 2024

YigitElma Sep 26, 2024

YigitElma Sep 26, 2024

YigitElma Sep 26, 2024

unalmis Sep 26, 2024

Add jac_chunk_size keyword argument to ObjectiveFunction to reduce memory usage of forward mode Jacobian calculation #1052

Add jac_chunk_size keyword argument to ObjectiveFunction to reduce memory usage of forward mode Jacobian calculation #1052

Conversation

dpanici commented Jun 13, 2024 • edited Loading

codecov bot commented Jun 13, 2024 • edited Loading

Codecov Report

github-actions bot commented Jun 13, 2024 • edited Loading

unalmis commented Jun 14, 2024

PhilipVinc commented Jun 17, 2024

dpanici commented Jul 10, 2024

dpanici commented Jul 23, 2024

dpanici commented Aug 7, 2024

YigitElma Sep 26, 2024

Choose a reason for hiding this comment

YigitElma Sep 26, 2024

Choose a reason for hiding this comment

YigitElma Sep 26, 2024

Choose a reason for hiding this comment

unalmis Sep 26, 2024

Choose a reason for hiding this comment

Add `jac_chunk_size` keyword argument to `ObjectiveFunction` to reduce memory usage of forward mode Jacobian calculation #1052

Add `jac_chunk_size` keyword argument to `ObjectiveFunction` to reduce memory usage of forward mode Jacobian calculation #1052

dpanici commented Jun 13, 2024 •

edited

Loading

codecov bot commented Jun 13, 2024 •

edited

Loading

github-actions bot commented Jun 13, 2024 •

edited

Loading