Find a better way to outline external code than frame counting. #818

Open
vext01 opened this issue Aug 22, 2023 · 2 comments

@vext01
Contributor

vext01 commented Aug 22, 2023

Currently, when the trace compiler hits a call to external code, it disassembles the raw instruction stream, counting the depth of the call stack as it goes. Normal trace compilation resumes when the stack depth returns to the value it had when outlining started.

This will fail if the instruction stream does stack gymnastics (e.g. instead of a `call`, it emits `push retaddr; jmp func`).
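To make the failure mode concrete, here is a minimal sketch of frame counting over a toy instruction stream. The `Inst` enum and `outline_end_by_frame_count` are hypothetical names invented for illustration; the real decoder classifies disassembled machine instructions and is certainly more involved than this.

```rust
/// A deliberately simplified instruction kind (hypothetical; a real decoder
/// would classify disassembled machine instructions).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Call,
    Ret,
    /// Stands for `push retaddr; jmp func`: acts like a call, but is not
    /// encoded as one.
    PushJmp,
    Other,
}

/// Returns the index of the instruction at which frame counting decides that
/// outlining is over, or `None` if the counted depth never gets back to zero.
/// `stream` is assumed to start with the call into external code.
fn outline_end_by_frame_count(stream: &[Inst]) -> Option<usize> {
    let mut depth: usize = 0;
    for (i, inst) in stream.iter().enumerate() {
        match inst {
            Inst::Call => depth += 1,
            Inst::Ret => {
                // A `ret` at depth zero means the counter is already lost.
                depth = depth.checked_sub(1)?;
                if depth == 0 {
                    return Some(i);
                }
            }
            // The counter cannot see the disguised call, so the depth is
            // (wrongly) left unchanged for `PushJmp`.
            Inst::PushJmp | Inst::Other => {}
        }
    }
    None
}
```

With a stream of `Call, PushJmp, Other, Ret, Other, Ret`, the counter reaches zero at the first `Ret` (index 3), even though that `Ret` only returns from the function entered via `PushJmp`; the external call really ends at index 5.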

We need to find a better way. Ideally we'd periodically (after all control flow dispatches?) look at the value of the stack pointer to decide if we should stop outlining, but this will slow us down.
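A rough sketch of the stack pointer idea, assuming we could somehow obtain one stack pointer sample after each control flow dispatch (which is exactly the part that is hard and slow in practice). It also assumes a downward-growing stack, as on x86-64. The function name and parameters are hypothetical.

```rust
/// Returns the index of the first sample at which the stack pointer is back
/// at (or above) the value it had just before the external call, i.e. the
/// point at which outlining could stop. Returns `None` if that never happens.
///
/// Assumptions: the stack grows downwards, `sp_before_call` was recorded
/// immediately before the call (or `push retaddr; jmp func`) sequence, and
/// `samples` holds one stack pointer value per control flow dispatch seen
/// after that point.
fn outline_end_by_sp(sp_before_call: u64, samples: &[u64]) -> Option<usize> {
    samples.iter().position(|&sp| sp >= sp_before_call)
}
```

Because this looks only at the stack pointer and not at how the return address was pushed, it would not care whether the callee was entered with a `call` or with `push retaddr; jmp func`.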

Last time I looked, PT had a way to record the dynamic value of the stack pointer, but you can't control when such packets are emitted, so we could be notified too late.

(If we can solve this, I think we can remove the disassembler from the PT decoder.)

@vext01 vext01 self-assigned this Aug 22, 2023
@vext01
Contributor Author

vext01 commented Aug 22, 2023

Just reading up on this again.

The facility I mentioned above is "PEBS output to PT". By the looks of it, PEBS is a mechanism for raising events when certain performance counters overflow. On some CPUs, you can have those events written to PT packets, which record the stack pointer near the time of the overflow.

Unless I've overlooked something, this isn't sufficient for our use case.

@ltratt
Contributor

ltratt commented Sep 18, 2023

As discussed this morning offline, the problem here is when we call to unmappable code and it calls back into mappable code: we have to be sure that we've got back to mappable code without any unmappable code being "left on the callstack" as it were.

One way of tackling this is to realise that we can, in most cases, statically determine the successor blocks that can be seen after a call. For example, if we saw a function call in block B, we know that the next blocks we can see are (say) C and D. [For C and D we'd also have to work out which blocks can come before them for the next check to be correct, but I'll try and keep things simple for this comment!] If we see C or D in the stream without having seen B again, we know that we've got back to the "right point" and can continue. However, if we see B again before we've seen C or D, there is still unmappable code on the callstack and we're recursing.
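The check described above could be sketched roughly as follows. This is the simplified version only (it ignores the predecessor analysis for C and D mentioned in the brackets), and `BlockId`, `Outcome` and `classify` are hypothetical names rather than anything in the codebase.

```rust
use std::collections::HashSet;

/// Hypothetical identifier for a mappable basic block.
type BlockId = u32;

#[derive(Debug, PartialEq)]
enum Outcome {
    /// A static successor of the call block was seen at this index: we are
    /// back at the "right point" and normal tracing can continue.
    Resumed(usize),
    /// The call block itself was seen again first: unmappable code is still
    /// on the callstack and we are recursing.
    Recursing(usize),
    /// Neither was seen in the blocks observed so far.
    Undecided,
}

/// `call_block` is the block containing the external call (B), `successors`
/// are its statically determined successors (e.g. C and D), and `seen` is the
/// sequence of mappable blocks observed after the call.
fn classify(call_block: BlockId, successors: &HashSet<BlockId>, seen: &[BlockId]) -> Outcome {
    for (i, b) in seen.iter().enumerate() {
        // Successors are checked first, so a block that loops back to
        // itself is treated as "resumed" in this simplified sketch.
        if successors.contains(b) {
            return Outcome::Resumed(i);
        }
        if *b == call_block {
            return Outcome::Recursing(i);
        }
    }
    Outcome::Undecided
}
```

Other mappable blocks (for example, a callback invoked by the external code) are simply skipped over until one of the two decisive cases shows up.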

There are some challenges: for example, if a function call is the very first or very last thing in a function, we won't have predecessor/successor blocks that we can statically determine. My inclination here would be to mark such functions as `do_not_trace` and/or abort the trace if we encounter them during tracing. Such cases probably don't happen very often -- we should easily be able to measure how often and then determine whether it's worth having clever support for them or not. [Technically the predecessor/successor analysis could be whole-program, though I imagine coding that up would be... fun!]
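As a sketch of how such functions might be detected (and counted, to measure how often they occur), assuming a hypothetical per-function CFG representation where each block records whether it contains a call and its intra-function predecessors and successors:

```rust
/// Hypothetical, simplified basic block: indices in `preds` and `succs`
/// refer to other blocks of the same function.
struct Block {
    has_call: bool,
    preds: Vec<usize>,
    succs: Vec<usize>,
}

/// True if some call-containing block has no intra-function predecessor or
/// no intra-function successor, i.e. the call is the first or last thing in
/// the function and the static check has nothing to work with.
fn needs_do_not_trace(blocks: &[Block]) -> bool {
    blocks
        .iter()
        .any(|b| b.has_call && (b.preds.is_empty() || b.succs.is_empty()))
}

/// Counts how many of the given functions would be affected.
fn count_affected(functions: &[Vec<Block>]) -> usize {
    functions.iter().filter(|f| needs_do_not_trace(f)).count()
}
```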
