Allow choosing specific regions to measure via instrumentation API #2983

Merged: pramodk merged 3 commits into master from pramodk/measurement-api-enh on Jul 15, 2024

@pramodk (Member) commented Jul 14, 2024

  • We have Caliper-based instrumentation that lets us measure different phases such as state updates, the solver, current updates, spike exchange, etc. We also record the calls into each MOD file, e.g. state-hh, cur-hh.
  • If a model has a large number of MOD files but very few channel instances, this per-MOD-file measurement can add significant overhead.
  • For example, with LIKWID we see:
# without profiling
./x86_64/special -python test.py
NEURON RUN with 1 threads took 0.649262

# with likwid profiling
likwid-perfctr -C 0 -m -g FLOPS_DP x86_64/special -python test.py
NEURON RUN with 1 threads took 10.734261

i.e. 0.65 s vs 10.73 s, about 16.5 times slower with profiling enabled.
  • In such cases, we want to instrument only specific regions, for example the main psolve region or a specific MOD file block like state-hh. This is now possible with the environment variable NRN_PROFILE_REGIONS:
export NRN_PROFILE_REGIONS=psolve,state-hh
likwid-perfctr -C 0 -m -g FLOPS_DP x86_64/special -python test.py

....
NEURON RUN with 1 threads took 0.836292
...

Region psolve, Group 1: FLOPS_DP
+-------------------+------------+
|    Region Info    | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   0.835792 |
|     call count    |          1 |
+-------------------+------------+

+------------------------------------------+---------+------------+
|                   Event                  | Counter | HWThread 0 |
+------------------------------------------+---------+------------+
|             INSTR_RETIRED_ANY            |  FIXC0  | 5336280000 |
|           CPU_CLK_UNHALTED_CORE          |  FIXC1  | 1987773000 |
|           CPU_CLK_UNHALTED_REF           |  FIXC2  | 1523973000 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE |   PMC0  |     489202 |
|    FP_ARITH_INST_RETIRED_SCALAR_DOUBLE   |   PMC1  |  729407200 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE |   PMC2  |          0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE |   PMC3  |      -     |
+------------------------------------------+---------+------------+
...
Region state-Nav1_6, Group 1: FLOPS_DP
+-------------------+------------+
|    Region Info    | HWThread 0 |
+-------------------+------------+
| RDTSC Runtime [s] |   0.167490 |
|     call count    |        400 |
+-------------------+------------+

+------------------------------------------+---------+------------+
|                   Event                  | Counter | HWThread 0 |
+------------------------------------------+---------+------------+
|             INSTR_RETIRED_ANY            |  FIXC0  | 1341566000 |
|           CPU_CLK_UNHALTED_CORE          |  FIXC1  |  505899600 |
|           CPU_CLK_UNHALTED_REF           |  FIXC2  |  387861900 |
| FP_ARITH_INST_RETIRED_128B_PACKED_DOUBLE |   PMC0  |          0 |
|    FP_ARITH_INST_RETIRED_SCALAR_DOUBLE   |   PMC1  |  224568600 |
| FP_ARITH_INST_RETIRED_256B_PACKED_DOUBLE |   PMC2  |          0 |
| FP_ARITH_INST_RETIRED_512B_PACKED_DOUBLE |   PMC3  |      -     |
+------------------------------------------+---------+------------+

i.e. only the specified regions are measured, and the run takes 0.84 s instead of 10.73 s.
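
The selection behaviour described above can be illustrated with a minimal Python sketch. This is not the code from this PR (the actual change is in `src/coreneuron/utils/profile/profiler_interface.h` and may differ); the helper names `parse_profile_regions` and `region_enabled` are hypothetical, and treating an unset or empty variable as "measure every region" is an assumption.

```python
import os


def parse_profile_regions(value):
    """Split a comma-separated region list into a set of names, ignoring blanks."""
    if not value:
        return set()
    return {name.strip() for name in value.split(",") if name.strip()}


def region_enabled(region, enabled):
    """Return True if `region` should be measured.

    Assumption: an empty set (variable unset or empty) means all regions
    are measured, as in the unfiltered runs.
    """
    return not enabled or region in enabled


# Read the variable once at start-up rather than on every region entry.
enabled = parse_profile_regions(os.environ.get("NRN_PROFILE_REGIONS", ""))

for region in ("psolve", "state-hh", "cur-hh"):
    status = "measured" if region_enabled(region, enabled) else "skipped"
    print(f"{region}: {status}")
```

With `NRN_PROFILE_REGIONS=psolve,state-hh` exported, this sketch would report `cur-hh` as skipped. Skipped regions never reach the profiler's start/stop calls, which is where the saving comes from.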

NOTE: I will document this, including the LIKWID usage, in the developer docs in a separate PR.
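
To put the numbers reported above in perspective, the following sketch computes the slowdown relative to the unprofiled run and a double-precision FLOP rate per region. It assumes the usual weighting of 1, 2 and 4 floating-point operations per scalar, 128-bit packed and 256-bit packed instruction; the 512-bit counter is reported as unavailable (`-`) and is left out. The function names are illustrative only.

```python
# Wall-clock times printed by NEURON in the runs above (seconds).
t_plain = 0.649262     # no profiling
t_full = 10.734261     # LIKWID, all regions instrumented
t_selected = 0.836292  # LIKWID, NRN_PROFILE_REGIONS=psolve,state-hh


def slowdown(t, baseline):
    """Ratio of a profiled run time to the unprofiled baseline."""
    return t / baseline


def dp_mflops(scalar, packed128, packed256, runtime_s):
    """Double-precision MFLOP/s from FP_ARITH_INST_RETIRED counters.

    Assumption: 1, 2 and 4 operations per scalar, 128-bit and 256-bit
    instruction respectively.
    """
    flops = scalar + 2 * packed128 + 4 * packed256
    return 1e-6 * flops / runtime_s


print(f"full instrumentation: {slowdown(t_full, t_plain):.1f}x")
print(f"selected regions:     {slowdown(t_selected, t_plain):.2f}x")
print(f"psolve:       {dp_mflops(729407200, 489202, 0, 0.835792):.0f} MFLOP/s")
print(f"state-Nav1_6: {dp_mflops(224568600, 0, 0, 0.167490):.0f} MFLOP/s")
```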

@pramodk pramodk force-pushed the pramodk/measurement-api-enh branch from b6c1ac8 to cf1411d Compare July 14, 2024 17:26
@pramodk pramodk force-pushed the pramodk/measurement-api-enh branch from cf1411d to 2478ba8 Compare July 14, 2024 17:28
✔️ 2478ba8 -> Azure artifacts URL

codecov bot commented Jul 14, 2024

Codecov Report

Attention: Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.

Project coverage is 67.28%. Comparing base (2ace814) to head (6578ba1).

| Files | Patch % | Lines |
|---|---|---|
| src/coreneuron/utils/profile/profiler_interface.h | 61.11% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2983      +/-   ##
==========================================
- Coverage   67.29%   67.28%   -0.01%     
==========================================
  Files         572      572              
  Lines      104951   104967      +16     
==========================================
+ Hits        70626    70631       +5     
- Misses      34325    34336      +11     

pramodk added a commit that referenced this pull request Jul 14, 2024
- Added instructions about how to build NEURON with LIKWID
  and how to use it with low overhead (see #2983)
- Added some info about performance regression in master
sonarcloud bot commented Jul 15, 2024

✔️ 6578ba1 -> Azure artifacts URL

@pramodk pramodk enabled auto-merge (squash) July 15, 2024 07:06
@1uc (Collaborator) left a comment

Given you've just made extensive use of this, it should be good to merge.

@pramodk pramodk merged commit df21c81 into master Jul 15, 2024
38 checks passed
@pramodk pramodk deleted the pramodk/measurement-api-enh branch July 15, 2024 08:13
pramodk added a commit that referenced this pull request Jul 16, 2024
Added instructions about how to build NEURON with LIKWID
 and how to use it with low overhead (see #2983)

Co-authored-by: Luc Grosheintz <[email protected]>