Error with batched einsum array context #967

Open · majosm opened this issue Sep 6, 2023 · 5 comments

majosm (Collaborator) commented Sep 6, 2023

When using the batched einsum array context with the prediction driver, the RHS compile produces the following error:

    loopy.diagnostic.LoopyIndexError: 'inv_metric_deriv_v_wall[iambient_dim, itopo_dim, iel_47_inner_inner + iel_47_inner_outer*4 + iel_47_outer*1280, 0]' in instruction '_pt_temp_2_store_itopo_dim_idof_74_update' accesses out-of-bounds array element (could not establish '{ [i0, i1, i2, 0] : 0 <= i0 <= 1 and 0 <= i1 <= 1 and 0 <= i2 <= 12390 }' is a subset of '{ [i0, i1, i2, i3] : i3 = 0 and 0 <= i0 <= 1 and 0 <= i1 <= 1 and 0 <= i2 <= 581 }').

When run without -O, it produces a different (but possibly related) error:

    loopy.diagnostic.LoopyError: inames _pt_temp_335_dim0 and iel_47 do not iterate over the same domain
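
For reference, here is a minimal standalone sketch (not from the driver; all names here are made up for illustration) of the kind of failure loopy reports in the first traceback: a kernel whose loop domain is larger than the declared shape of an array it reads, so the bounds check cannot establish that the accessed index set is a subset of the array's index set:

    import numpy as np
    import loopy as lp

    # "a" is declared with 10 entries, but the loop domain runs i up to 15,
    # so loopy cannot establish that the accessed index set is a subset of
    # the array's shape and raises loopy.diagnostic.LoopyIndexError.
    knl = lp.make_kernel(
        "{ [i]: 0 <= i < 16 }",
        "out[i] = 2*a[i]",
        [
            lp.GlobalArg("a", np.float64, shape=(10,)),
            lp.GlobalArg("out", np.float64, shape=(16,)),
        ],
        target=lp.CTarget())

    lp.generate_code_v2(knl)  # bounds check fails here

In the error above, the analogous mismatch is between the iteration domain (0 <= i2 <= 12390) and the array's element axis (0 <= i2 <= 581).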

The error persists with most of the physics turned off, as long as species limiting and the main isothermal boundary both remain enabled. (Note: the actual boundary condition being applied doesn't seem to matter; I've tried both isothermal and DummyBoundary.)

A reduced Y3 case can be installed and run with the instructions below. It creates an RHS DAG of about 100 nodes and runs in a few minutes.

    git clone git@github.com:illinois-ceesd/drivers_y3-prediction.git
    cd drivers_y3-prediction
    git checkout batched-einsum-error-reproducer
    ./buildMirge.sh --use-ssh
    source emirge/config/activate_env.sh
    cd smoke_test_ks
    python -m mpi4py driver.py -i run_params.yaml --lazy --log

majosm (Collaborator) commented Sep 6, 2023

Forgot to mention: when I look at the two loops mentioned in the error, their lengths correspond to the numbers of elements in the interior faces and on the isothermal boundary.

majosm (Collaborator) commented Sep 7, 2023

I applied @kaushikcfd's fix from inducer/arraycontext#217 and now I can run the full KS 2D case without errors. 🎉

Seeing that PR made me question whether I'm using the right version of the code, though. In my current subpackage config, apply_kennedy_fusion_with_batched_einsum_extension is in meshmode. Is there a different config I should be using that uses the version that's in arraycontext?

inducer (Contributor) commented Sep 7, 2023

Yay! How are compile times with this transform path?

inducer (Contributor) commented Sep 7, 2023

> Seeing that PR made me question whether I'm using the right version of the code, though. In my current subpackage config, apply_kennedy_fusion_with_batched_einsum_extension is in meshmode. Is there a different config I should be using that uses the version that's in arraycontext?

Using what's in Kaushik's meshmode branch is the correct approach. As Kaushik says here, he has just "parked" the code in meshmode while it's under review. The code is technically independent of meshmode and will ultimately land in arraycontext; that's why the PRs live there.

majosm (Collaborator) commented Sep 7, 2023

> Yay! How are compile times with this transform path?

Let's say there's room for improvement, heh.

Fusion contractor:

    Run time :                                   894 sec.

Batched einsum:

    Run time :                                   3851 sec.

Unfortunately, it looks like the timestep time is also quite a bit slower at the moment.

Fusion contractor:

 Performance:
    walltime: 0.264042 s
    visualization time:      0 s
    garbage collection time:      0 s
    log walltime: 4.29102e-05 s
 Memory:
    python memory: 2580.69 Mb
    gpu memory: 1481.69 Mb
    memory hwm: 2751.62 Mb
    mempool total: 971.216 Mb
    mempool active: 147.447 Mb

Batched einsum:

 Performance:
    walltime: 1.5777 s
    visualization time:      0 s
    garbage collection time:      0 s
    log walltime: 4.19766e-05 s
 Memory:
    python memory: 1733.69 Mb
    gpu memory: 1073.62 Mb
    memory hwm: 2947.81 Mb
    mempool total: 676.924 Mb
    mempool active:  194.55 Mb
