
WIP: Unify divb analyze routines for equations with magnetic field #1712

Draft: amrueda wants to merge 4 commits into main

Conversation

amrueda (Contributor)

@amrueda commented Nov 7, 2023

No description provided.

@amrueda marked this pull request as draft November 7, 2023 11:33

github-actions bot commented Nov 7, 2023

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed, or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.


codecov bot commented Nov 7, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison: base (276dc3c) 74.61% vs. head (9e8dd98) 82.99%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1712      +/-   ##
==========================================
+ Coverage   74.61%   82.99%   +8.38%     
==========================================
  Files         431      431              
  Lines       34600    34654      +54     
==========================================
+ Hits        25815    28761    +2946     
+ Misses       8785     5893    -2892     
Flag Coverage Δ
unittests 82.99% <100.00%> (+8.38%) ⬆️

Flags with carried forward coverage won't be shown.

Files Coverage Δ
src/callbacks_step/analysis_dg2d.jl 95.57% <100.00%> (-0.36%) ⬇️
src/equations/ideal_glm_mhd_2d.jl 98.80% <100.00%> (+2.75%) ⬆️
src/equations/ideal_glm_mhd_multicomponent_2d.jl 100.00% <100.00%> (ø)

... and 107 files with indirect coverage changes


@sloede (Member) left a comment

Some small suggestions, but otherwise this looks like a good approach for unifying these analysis quantities! You should, however, also ping Andrew or Hendrik once you're ready for a final look.

src/equations/ideal_glm_mhd_multicomponent_2d.jl (outdated, resolved)
Comment on lines +255 to +256
B1_kj, _, _ = magnetic_field(view(u, :, k, j, element), equations)
_, B2_ik, _ = magnetic_field(view(u, :, i, k, element), equations)
@sloede (Member)

Suggested change
B1_kj, _, _ = magnetic_field(view(u, :, k, j, element), equations)
_, B2_ik, _ = magnetic_field(view(u, :, i, k, element), equations)
B1_kj, _, _ = magnetic_field(get_node_vars(u, equations, dg, k, j, element), equations)
_, B2_ik, _ = magnetic_field(get_node_vars(u, equations, dg, i, k, element), equations)

To avoid the ugly view? If this works, it should also be applied to the other places in this PR

amrueda (Contributor, Author)

It works for a short simulation, but it seems that it creates a lot of allocations. I tested with the following:

julia> using Trixi
julia> using BenchmarkTools
julia> include("../examples/tree_2d_dgsem/elixir_mhd_ec.jl")
julia> @benchmark Trixi.analyze($Val(:l2_divb), $ode.u0, $ode.u0, 0.0, $mesh, $equations, $solver, $semi.cache)

I got the following for the implementation with view on Rocinante:

BenchmarkTools.Trial: 4272 samples with 1 evaluation.
 Range (min … max):  661.995 μs … 9.138 ms  ┊ GC (min … max):  0.00% … 88.29%
 Time  (median):     898.936 μs             ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.164 ms ± 1.153 ms  ┊ GC (mean ± σ):  20.53% ± 17.92%

  █▆▆▆▅▅▃▁                                                    ▁
  █████████▅▃▅▅▃▁▁▁▁▁▆▃▇▁▁▁▁▁▆▇▇▇▆▁▁▄▄▃▃▃▃▄▅▁▄▄▃▃▄▄▅▅▅▇▄▆▆▆▆▆ █
  662 μs       Histogram: log(frequency) by time      7.53 ms <

 Memory estimate: 2.13 MiB, allocs estimate: 28685.

But the implementation with get_node_vars crashes with the following (I removed many lines in the middle of the error message for readability):

[644396] signal (11.1): Segmentation fault
in expression starting at REPL[4]:1
getindex at ./essentials.jl:14 [inlined]
#309 at /mnt/hd1/scratch/aruedara/20231107_divB/Trixi.jl/src/solvers/dg.jl:542 [inlined]
macro expansion at ./ntuple.jl:72 [inlined]
ntuple at ./ntuple.jl:69 [inlined]
get_node_vars at /mnt/hd1/scratch/aruedara/20231107_divB/Trixi.jl/src/solvers/dg.jl:542 [inlined]
.
.
.
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at julia (unknown line)
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 83483411 (Pool: 83454153; Big: 29258); GC: 106
Segmentation fault (core dumped)

Is this test OK? Should I keep the implementation with view?

Member

No, I think there is an underlying issue that I just do not understand.

Can you try to separate the get_node_vars call from the magnetic_field call? That is, to use something like

Suggested change
B1_kj, _, _ = magnetic_field(view(u, :, k, j, element), equations)
_, B2_ik, _ = magnetic_field(view(u, :, i, k, element), equations)
u_local_kj = get_node_vars(u, equations, dg, k, j, element)
u_local_ik = get_node_vars(u, equations, dg, i, k, element)
B1_kj, _, _ = magnetic_field(u_local_kj, equations)
_, B2_ik, _ = magnetic_field(u_local_ik, equations)

such that we know where exactly the issue comes from?

amrueda (Contributor, Author)

That works 🙂 :

BenchmarkTools.Trial: 3875 samples with 1 evaluation.
 Range (min … max):  685.359 μs … 9.795 ms  ┊ GC (min … max):  0.00% … 83.22%
 Time  (median):     992.257 μs             ┊ GC (median):     0.00%
 Time  (mean ± σ):     1.284 ms ± 1.303 ms  ┊ GC (mean ± σ):  20.88% ± 17.77%

  █▅▆▆▆▅▄▂▁                                                   ▁
  █████████▆▆▃▃▁▁▁▁▁▁▁▆▅▆▁▁▁▁▁▄▆▇▅▅▃▃▃▅▃▆▅▅▅▁▁▁▃▅▃▃▅▄▇▆▇▆▆▆▄▆ █
  685 μs       Histogram: log(frequency) by time       8.2 ms <

 Memory estimate: 2.13 MiB, allocs estimate: 28685.

Is this preferred to the view implementation?

@sloede (Member)

Is this preferred to the view implementation?

Yes. We introduced the get_node_vars function and friends specifically for the purpose of not having to use views (which used to allocate) and also to hide memory layout details from the application.
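As an aside, the idea can be illustrated with a minimal, self-contained sketch (this is not Trixi's actual implementation; the helper name and array sizes here are invented for the demo):

```julia
# Toy sketch of the two access patterns (not Trixi's real code).
u = rand(9, 4, 4, 2)  # (variable, i, j, element); sizes invented for the demo

# view-based access: a SubArray that aliases the parent array
u_view = view(u, :, 2, 3, 1)

# get_node_vars-style access: copy the nvars entries at one node into an
# immutable tuple, which hides the memory layout and avoids handing out
# references into u
get_node_vars_sketch(u, nvars, i, j, element) =
    ntuple(v -> u[v, i, j, element], nvars)

u_local = get_node_vars_sketch(u, 9, 2, 3, 1)
```

Both give the same nine values; the tuple version just never exposes a reference into `u`.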

@ranocha out of curiosity, do you have an explanation why my original formulation caused a segfault?

Member

You called it with u_ode as argument, but the function is written to accept u? But I would have expected that both versions lead to the same problem.

amrueda (Contributor, Author)

You called it with u_ode as argument but the function is written to accept u?

No, I called it exactly as detailed above, with ode.u0 for both u and du (du is not used in the function).

I'm getting a weird behavior... Yesterday, when I tried the first code suggestion of Michael, I got the error message twice in a row. Today, I decided to give it a go again, and the benchmark is working fine (now 12 times in a row). Maybe the problem yesterday was that someone else was running a simulation on Rocinante(?).

Member

That's what I'm saying: ode.u0 is a vector, we accept the wrapped array.

Member

Do you run it with any --check-bounds flags?

amrueda (Contributor, Author)

Ahh, ok! I didn't realize that... No, I ran it with --check-bounds=no. Without that flag it crashes with

BoundsError: attempt to access 36864-element Vector{Float64} at index [1, 2, 1, 1]
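For context (plain Julia, independent of Trixi): a Vector may be indexed with extra trailing indices only when they are all equal to 1, which is why the four-index access blows up on the flat vector:

```julia
# Trailing-index rule on a flat vector (standalone illustration).
v = collect(1.0:10.0)

x = v[3, 1, 1, 1]  # OK: trailing singleton indices are always allowed

err = try
    v[1, 2, 1, 1]  # BoundsError when bounds checking is active
catch e
    e
end
# With --check-bounds=no this check is elided, so the computed linear index
# can point outside the vector's memory, which fits the segfault above.
```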

I corrected my testing procedure and now I'm using:

julia> using Trixi
julia> using BenchmarkTools
julia> include("../examples/tree_2d_dgsem/elixir_mhd_ec.jl")
julia> u = Trixi.wrap_array(ode.u0, mesh, equations, solver, semi.cache)
julia> @benchmark Trixi.analyze($Val(:l2_divb), $u, $u, 0.0, $mesh, $equations, $solver, $semi.cache)
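Conceptually (a simplification: Trixi.wrap_array uses mesh, equations, and solver metadata and may wrap more efficiently than a plain reshape; the sizes below are invented), the wrapping reinterprets the flat ODE vector in the multi-dimensional node layout that the analyze routines expect:

```julia
# Hypothetical sizes; Trixi derives the real ones from equations, mesh, solver.
nvars, nnodes, nelements = 9, 4, 4
u_ode = rand(nvars * nnodes * nnodes * nelements)

# Same underlying data, viewed as (variable, i, j, element):
u = reshape(u_ode, nvars, nnodes, nnodes, nelements)
```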

This seems to work fine using the first suggestion by @sloede (checking bounds):

BenchmarkTools.Trial: 5838 samples with 1 evaluation.
 Range (min … max):  700.557 μs …   3.900 ms  ┊ GC (min … max):  0.00% … 69.72%
 Time  (median):     724.738 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   853.904 μs ± 585.713 μs  ┊ GC (mean ± σ):  14.73% ± 16.32%

  █▂                                                          ▃ ▁
  ██▃▃▁▁▁▁▆▄▅▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇█ █
  701 μs        Histogram: log(frequency) by time       3.55 ms <

 Memory estimate: 2.31 MiB, allocs estimate: 32785.

For reference, this is the original implementation in Trixi:

BenchmarkTools.Trial: 5729 samples with 1 evaluation.
 Range (min … max):  711.988 μs …   3.850 ms  ┊ GC (min … max):  0.00% … 71.50%
 Time  (median):     738.548 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   870.079 μs ± 592.862 μs  ┊ GC (mean ± σ):  14.63% ± 16.28%

  █▃                                                          ▃ ▁
  ██▄▃▁▁▆▆▄▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄█ █
  712 μs        Histogram: log(frequency) by time        3.6 ms <

 Memory estimate: 2.31 MiB, allocs estimate: 32785.

So it seems that it works fine with get_node_vars.
