[Feature request] Perforator --topdown #9
I'm not familiar with top-down analysis. Could you give a bit more detail about it? It's unlikely I will implement it, but I would be happy to merge a pull request.
Top-down Microarchitectural Analysis (TMA) is a performance analysis methodology developed by Ahmad Yasin at Intel. It's quite simple: you measure a very specific set of counters to determine which of four top-level categories your application falls into: front-end bound (instructions cannot be delivered to the back-end fast enough), back-end bound (the classical case, either compute-bound or memory-bound), bad speculation (your code keeps mispredicting branches, so much of the work it does is thrown away), or retiring (where we want to be: the code cannot get faster because it fully uses the capabilities of the CPU). For a more detailed description, I think the best reference is the original paper from ISPASS 2014. The implementation itself is as simple as measuring a predefined set of PMCs and computing the predefined metrics from them. Both Intel VTune and perf have incorporated TMA into their tooling in the last few years. In VTune you choose "Microarchitecture Exploration"; in perf you run perf stat --topdown.
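For concreteness, the level-1 metrics boil down to a few ratios over five raw counts. Below is a minimal Python sketch, assuming the Intel event names and formulas from the ISPASS 2014 paper, which target 4-wide Intel Core parts (roughly Sandy Bridge through Skylake). Other microarchitectures use different events and widths, so treat the event set as an assumption rather than a portable recipe. The `TopdownCounts` class, the `topdown_level1` function, and the example counter values are hypothetical illustrations, not part of perforator or perf.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TopdownCounts:
    """Raw counts for one measured region (assumed Intel Core event names)."""
    cpu_clk_unhalted_thread: int      # CPU_CLK_UNHALTED.THREAD
    idq_uops_not_delivered_core: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued_any: int              # UOPS_ISSUED.ANY
    uops_retired_retire_slots: int    # UOPS_RETIRED.RETIRE_SLOTS
    int_misc_recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def topdown_level1(c: TopdownCounts, width: int = 4) -> dict[str, float]:
    """Return the four level-1 TMA fractions.

    `width` is the number of issue slots per cycle (4 on the cores the
    paper targets; an assumption for anything else).
    """
    slots = width * c.cpu_clk_unhalted_thread  # total issue slots in the region
    if slots <= 0:
        raise ValueError("cycle count must be positive")
    # Slots where the front-end delivered no uop although the back-end could accept one.
    frontend = c.idq_uops_not_delivered_core / slots
    # Uops issued but never retired, plus slots lost while recovering from mispredicts.
    bad_spec = (c.uops_issued_any - c.uops_retired_retire_slots
                + width * c.int_misc_recovery_cycles) / slots
    # Slots that retired useful uops.
    retiring = c.uops_retired_retire_slots / slots
    # Back-end bound is the remainder, so the four fractions sum to 1.
    backend = 1.0 - (frontend + bad_spec + retiring)
    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


# Made-up counts for illustration only.
example = TopdownCounts(
    cpu_clk_unhalted_thread=1000,
    idq_uops_not_delivered_core=800,
    uops_issued_any=2200,
    uops_retired_retire_slots=2000,
    int_misc_recovery_cycles=50,
)
for name, frac in topdown_level1(example).items():
    print(f"{name:16s} {frac:6.1%}")
```

With these made-up counts, the sketch reports 20% front-end bound, 10% bad speculation, 50% retiring and 20% back-end bound. Deriving back-end bound as the remainder mirrors the paper's approach and avoids needing a separate back-end event at level 1.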
OK, seems simple enough. The PMCs are presumably ones you can already measure with perforator, so couldn't you compute the metrics manually from those results? Is the main request here to have perforator compute the metrics automatically?
The problem is that the exact counters to measure vary on a per-architecture basis. I am not sure whether perforator calls perf under the hood; if it does, the ideal implementation would be to just run it with --topdown. Otherwise, it's a bit trickier.
Do you mean on a per-microarchitecture basis? Perforator only supports x86-64 anyway. Perforator does not call perf; it directly uses the PMU API exposed by Linux, so unfortunately it's not as easy as just calling perf stat --topdown.
Yes, sorry, I did mean microarchitecture. There's even a dedicated TMA spreadsheet that details how to compute the relevant metrics for each microarchitecture, but I'm afraid it looks like a painful feature to maintain. If perforator uses the PMU directly, it will not be as easy as I had hoped.
First of all, thank you for developing this. I mostly use PAPI manually to monitor PMCs, and the main reason I don't use perf is that there's no good way (that I know of) to delimit a region of interest.
What I am trying to do now is apply a --topdown analysis to a specific region of code. I have looked through the documentation, and I don't think the tool currently supports this. Would it be simple to add this functionality?