[FeatureRequest] Add TMA_BACKEND, TMA_BE_MEMORY and TMA_BE_CORE counter group to likwid-perfctr #466
Comments
I understand your request, but it is tricky in detail. The problem with the TMA groups is that they might require more events than there are physical counter registers. Perf and VTune apply multiplexing by frequently rescheduling the events on the available counters. Both "drivers" run in kernel space, where they can access the counters directly. LIKWID has a different focus: it uses the physical counters as its basis, so you cannot program more events than there are counters. The TMA Level 1 ( If the TMA level can be measured with the available counters, you can create the performance groups you need yourself: https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#defining-custom-performance-groups . If a level requires multiple measurements, you can try to split the level metrics into multiple groups. It seems you want to use the MarkerAPI. There you can use Map file for TMA Level to events for a specific architecture: https://download.01.org/perfmon/TMA_Metrics.xlsx (there are also CSV variants).
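To illustrate the "split the level metrics into multiple groups" suggestion, here is a minimal Python sketch that distributes a list of events over a fixed set of counter registers, starting a new group whenever the counters run out. It is only an illustration, not part of LIKWID: the function name `split_into_groups` and the event names `EVENT_0` ... `EVENT_7` are hypothetical placeholders, and the sketch assumes every event can run on every counter, which is not true for all real events.

```python
from typing import Dict, List


def split_into_groups(events: List[str], counters: List[str]) -> List[Dict[str, str]]:
    """Assign events to counters, opening a new group when the counters are used up.

    Assumption: any event may be scheduled on any counter. Real hardware
    often restricts specific events to specific counters.
    """
    if not counters:
        raise ValueError("need at least one counter register")
    groups = []
    for start in range(0, len(events), len(counters)):
        chunk = events[start:start + len(counters)]
        # zip stops at the shorter sequence, so the last group may be partial
        groups.append(dict(zip(counters, chunk)))
    return groups


# Hypothetical placeholder events: eight events on four general-purpose counters
events = [f"EVENT_{i}" for i in range(8)]
counters = ["PMC0", "PMC1", "PMC2", "PMC3"]
groups = split_into_groups(events, counters)

for index, group in enumerate(groups):
    print(f"group {index}:")
    for counter, event in group.items():
        print(f"  {counter} {event}")
```

Each resulting group would then be measured in its own run. A metric whose events end up in different groups combines counts from separate runs, so this only gives meaningful numbers if the measured code behaves the same way in every run.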
I investigated this a bit, and it turns out that to implement TMA_BACKEND, TMA_BE_MEMORY and TMA_BE_CORE we would need about 8 counters, so it is not possible to do it now. Why is there a limit with ACCESSMODE=perf_event? Perf_event can use multiplexing to record more than four events, but you disabled it for some reason. Why?
Short story: Long story: I totally understand that there are features in the |
Is your feature request related to a problem? Please describe.
I am frustrated when doing top-down analysis with Intel's VTune. The tool is cumbersome and affects the results. Also, I cannot limit the analysis to the code I am interested in, only to whole functions. LIKWID already has a TMA counter group, but it should go further with additional groups that cover the deeper levels.
Describe the solution you'd like
I would like to be able to do the same analysis with LIKWID. To begin with, I would like two additional groups: TMA_BE_MEMORY and TMA_BE_CORE. Here is the possible output:
TMA_BE_MEMORY
TMA_BE_CORE
Additional context
You will probably need this to implement it:
This performance group measures cycles to determine percentage of time spent in
front end, back end, retiring and speculation. These metrics are published and
verified by Intel. Further information:
Webpage describing the Top-Down Method and its usage in Intel VTune:
https://software.intel.com/en-us/vtune-amplifier-help-tuning-applications-using-a-top-down-microarchitecture-analysis-method
Paper by Ahmad Yasin:
https://sites.google.com/site/analysismethods/yasin-pubs/TopDown-Yasin-ISPASS14.pdf?attredirects=0
Slides by Ahmad Yasin:
http://www.cs.technion.ac.il/~erangi/TMA_using_Linux_perf__Ahmad_Yasin.pdf
The Intel Icelake microarchitecture provides a distinct register for the Top-Down Method metrics.
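For reference, the four-way split into front end, back end, retiring and speculation can be sketched in Python using the Level 1 formulas from the Yasin paper linked above. This is a sketch under assumptions, not the code of LIKWID's TMA group: it assumes a 4-wide pipeline (four issue slots per cycle), the function name `tma_level1` and its parameter names are made up for illustration, and the counts in the example are invented numbers rather than measurements.

```python
def tma_level1(clk, uops_issued, uops_retired_slots,
               idq_uops_not_delivered, recovery_cycles, width=4):
    """Top-Down Level 1 fractions following the formulas in Yasin's ISPASS'14 paper.

    Assumed inputs (event names as used in the paper):
      clk                     CPU_CLK_UNHALTED.THREAD
      uops_issued             UOPS_ISSUED.ANY
      uops_retired_slots      UOPS_RETIRED.RETIRE_SLOTS
      idq_uops_not_delivered  IDQ_UOPS_NOT_DELIVERED.CORE
      recovery_cycles         INT_MISC.RECOVERY_CYCLES
    width is the number of issue slots per cycle (4 is an assumption).
    """
    if clk <= 0:
        raise ValueError("clk must be positive")
    slots = width * clk  # total issue slots in the measured interval
    frontend = idq_uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Back end is whatever remains once the other three categories are accounted for
    backend = 1.0 - (frontend + bad_speculation + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_speculation,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Invented example counts, for illustration only
result = tma_level1(clk=1000, uops_issued=2200, uops_retired_slots=2000,
                    idq_uops_not_delivered=800, recovery_cycles=50)
for name, fraction in result.items():
    print(f"{name}: {fraction:.1%}")
```

Level 1 needs only four programmable events plus the cycle count, which is why it fits on the available counters. The deeper back-end levels requested here add further events on top of these.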