[Feature request] Perforated --topdown #9

Open
gabriel-rodriguez opened this issue Jan 26, 2023 · 6 comments

Comments

@gabriel-rodriguez

First of all: thank you for developing this. I mostly use PAPI by hand to monitor PMCs, and the main reason I don't use perf is that there's no good way to delimit a region of interest (at least, none that I know of).

What I was trying to do now was apply a --topdown analysis to a region of code. I have looked through the documentation and I think the tool does not currently support this. Would it be simple to add this functionality?

@zyedidia
Owner

I'm not familiar with top-down analysis, could you give a bit more detail about that? It's unlikely I will implement it, but I would be happy to merge a pull request.

@gabriel-rodriguez
Author

Top-down Microarchitecture Analysis (TMA) is a performance analysis methodology developed by Ahmad Yasin at Intel. The idea is quite simple: you measure a specific set of counters to determine which of four top-level categories dominates your application. These are front-end bound (you cannot deliver instructions fast enough to the back-end), back-end bound (the classical case, either compute-bound or memory-bound), bad speculation (your code keeps mispredicting branches and most of the work you do is misspeculated), and retiring (that's where we want to be: the code cannot go faster because it fully uses the capabilities of the CPU).

For a more detailed description, I think the best reference is the original paper from ISPASS 2014.

The implementation itself is as simple as measuring a predefined set of PMCs and computing the predefined metrics. Both Intel VTune and Perf have incorporated TMA into their pipelines in the last few years. In VTune you choose "Microarchitecture exploration". In Perf you run perf stat --topdown.
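To make "measuring a predefined set of PMCs and computing the predefined metrics" concrete, here is a minimal sketch of the level-1 breakdown as given in the ISPASS 2014 paper. It assumes a 4-wide Intel core where the event names in the comments exist; other microarchitectures need different events and possibly a different width. `RegionCounts` and `topdown_level1` are hypothetical names for illustration, not part of perforator or perf, and the counts in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict

# Assumption: 4 issue slots per cycle, as in the cores described in the paper.
PIPELINE_WIDTH = 4


@dataclass
class RegionCounts:
    """Raw counter values collected over one region of interest."""
    cycles: int              # CPU_CLK_UNHALTED.THREAD
    uops_not_delivered: int  # IDQ_UOPS_NOT_DELIVERED.CORE
    uops_issued: int         # UOPS_ISSUED.ANY
    uops_retired_slots: int  # UOPS_RETIRED.RETIRE_SLOTS
    recovery_cycles: int     # INT_MISC.RECOVERY_CYCLES


def topdown_level1(c: RegionCounts) -> Dict[str, float]:
    """Return the four level-1 TMA fractions; they sum to 1 by construction."""
    slots = PIPELINE_WIDTH * c.cycles
    if slots <= 0:
        raise ValueError("cycle count must be positive")

    frontend = c.uops_not_delivered / slots
    # Issued-but-not-retired uops plus slots lost while recovering from a flush.
    bad_spec = (c.uops_issued - c.uops_retired_slots
                + PIPELINE_WIDTH * c.recovery_cycles) / slots
    retiring = c.uops_retired_slots / slots
    # Back-end bound is defined as whatever remains.
    backend = 1.0 - (frontend + bad_spec + retiring)

    return {
        "frontend_bound": frontend,
        "bad_speculation": bad_spec,
        "retiring": retiring,
        "backend_bound": backend,
    }


if __name__ == "__main__":
    # Hypothetical counts, not measured on real hardware.
    counts = RegionCounts(cycles=1000, uops_not_delivered=800,
                          uops_issued=2400, uops_retired_slots=2000,
                          recovery_cycles=50)
    for name, value in topdown_level1(counts).items():
        print(f"{name}: {value:.1%}")
```

The arithmetic is the easy part; the counts would have to come from whatever tool brackets the region, and the event list has to match the CPU.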

@zyedidia
Owner

Ok, seems simple enough. The PMCs are presumably ones that you can already measure with perforator, so you could manually compute the metrics with those results already? Is the main request here to add support for automatically computing the metrics in perforator?

@gabriel-rodriguez
Author

The problem is that the exact counters to measure vary on a per-architecture basis. I am not sure whether perforator calls perf under the hood; if it does, the ideal implementation would be to just run perf stat --topdown. Otherwise, it's a bit trickier.

@zyedidia
Owner

Do you mean on a per-microarchitecture basis? Because Perforator only supports x86-64 anyway. Perforator does not call perf; it directly uses the PMU API exposed by Linux, so unfortunately it's not as easy as just calling perf stat --topdown.

@gabriel-rodriguez
Author

Yes, sorry, I did mean microarchitecture. There's even a dedicated TMA spreadsheet that details how to compute the relevant metrics per uarch, but I'm afraid it looks like a painful feature to maintain. If perforator uses the PMU directly, it will not be as easy as I had hoped.
