Find a better way to outline external code than frame counting. #818

vext01 · 2023-08-22T09:17:18Z

Currently, when the trace compiler hits a call to external code, it disassembles the raw instruction stream, counting the depth of the call stack as it goes. Normal trace compilation resumes when we get back to the same stack depth as when we started.

This will fail if the instruction stream does stack gymnastics (e.g. instead of a call, it emits push retaddr; jmp func).
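To make the failure mode concrete, here is a minimal sketch of depth counting over a heavily simplified instruction stream. The `Inst` type and `outline_by_frame_counting` function are hypothetical names invented for illustration; they are not taken from the PT decoder, which works on real disassembly.

```rust
/// Simplified view of a disassembled instruction. Only the effect on the
/// call stack is modelled. Hypothetical type, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Inst {
    Call,
    Ret,
    /// The `push retaddr` half of a hand-rolled call.
    PushRetAddr,
    /// The `jmp func` half of a hand-rolled call.
    Jmp,
    Other,
}

/// `stream` starts at the first instruction of the external callee.
/// Returns the index of the `Ret` at which the counter believes we are
/// back at the depth we started from, or `None` if that never happens.
fn outline_by_frame_counting(stream: &[Inst]) -> Option<usize> {
    // We have just entered external code through a call.
    let mut depth: usize = 1;
    for (i, inst) in stream.iter().enumerate() {
        match inst {
            Inst::Call => depth += 1,
            Inst::Ret => {
                depth -= 1;
                if depth == 0 {
                    return Some(i);
                }
            }
            // Pushes and jumps are not recognised as creating a frame.
            _ => {}
        }
    }
    None
}
```

With a well-behaved stream such as `[Other, Call, Other, Ret, Other, Ret]` the counter stops at the final `Ret`. With `[PushRetAddr, Jmp, Other, Ret, Other, Ret]` it stops at the first `Ret`, which only returns from the hand-rolled inner call, so outlining would end while we are still inside external code.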

We need to find a better way. Ideally we'd periodically (after all control flow dispatches?) look at the value of the stack pointer to decide if we should stop outlining, but this will slow us down.
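A rough sketch of what the stack pointer check could look like, assuming we somehow had a stack pointer value after each control flow dispatch (getting those values cheaply is exactly the open problem). The function name and its inputs are hypothetical, and the comparison assumes a downward-growing stack as on x86-64.

```rust
/// `sp_at_call` is the stack pointer immediately before the call into
/// external code executed. `sps_after_dispatch` holds the stack pointer
/// observed after each subsequent control flow dispatch.
///
/// Returns the index of the first observation at which the external call
/// has returned. On a downward-growing stack, everything the callee does
/// (including a hand-rolled `push retaddr; jmp func`) keeps the stack
/// pointer strictly below `sp_at_call` until the matching return.
fn first_return_point(sp_at_call: u64, sps_after_dispatch: &[u64]) -> Option<usize> {
    sps_after_dispatch.iter().position(|&sp| sp >= sp_at_call)
}
```

Unlike depth counting, this does not care how frames were created, only where the stack pointer ends up.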

Last time I looked, PT had a way to record the dynamic value of the stack pointer, but you can't control when such packets are emitted, so we could be notified too late.

(If we can solve this, I think we can remove the disassembler from the PT decoder)

vext01 · 2023-08-22T09:50:53Z

Just reading up on this again.

The facility I mentioned above is "PEBS output to PT". By the looks of it, PEBS is a mechanism for raising events when certain performance counters overflow. On some CPUs you can have those events written to PT packets, which record the stack pointer near the time of the overflow.

Unless I've overlooked something, this isn't sufficient for our use case.

ltratt · 2023-09-18T11:42:46Z

As discussed this morning offline, the problem here is when we call to unmappable code and it calls back into mappable code: we have to be sure that we've got back to mappable code without any unmappable code being "left on the callstack" as it were.

One way of tackling this is to realise that we can in most cases statically determine the successor blocks that can be seen after a call. So for example, if we saw a function call in block B, we know that the next blocks we can see are (say) C and D. [For C and D we'd also have to work out which blocks can come before them for the next check to be correct, but I'll try and keep things simple for this comment!] If we see C or D in the stream without having seen B again, we know that we've got back to the "right point" and can continue. However, if we see B again before we've seen C or D, there is still unmappable code on the callstack and we're recursing.
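The check described above can be sketched roughly as follows. Everything here is hypothetical: blocks are plain integer IDs, the successor set is assumed to have been computed statically elsewhere, and (as in the comment) the predecessor check for C and D is left out.

```rust
/// Result of scanning the mappable blocks seen after a call in `call_block`.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// A static successor of the call was seen first, at this index:
    /// we are back at the "right point" and can continue.
    Resumed(usize),
    /// The calling block was seen again first, at this index: unmappable
    /// code is still on the callstack and we are recursing.
    Recursed(usize),
    /// Neither was seen in the blocks observed so far.
    Unknown,
}

/// `observed` is the sequence of mappable blocks seen in the stream after
/// the call. `successors` are the blocks that can statically follow the
/// call (C and D in the example above).
fn find_resume_point(call_block: u32, successors: &[u32], observed: &[u32]) -> Outcome {
    for (i, block) in observed.iter().enumerate() {
        if *block == call_block {
            return Outcome::Recursed(i);
        }
        if successors.contains(block) {
            return Outcome::Resumed(i);
        }
    }
    Outcome::Unknown
}
```

For example, with B = 1 and successors {2, 3}, the observed sequence `[7, 8, 2]` gives `Resumed(2)`, whereas `[7, 1, 2]` gives `Recursed(1)`. This sketch only detects recursion; what to do once it is detected is left open.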

There are some challenges: for example, if a function call is the very first or very last thing in a function we won't have predecessor/successor blocks that we can statically determine. My inclination here would be to mark such functions as do_not_trace and/or say that if we encounter them during tracing that we abort the trace. Such cases probably don't happen very often -- we should easily be able to measure how often and then determine whether it's worth having clever support for them or not. [Technically the predecessor/successor analysis could be whole-program, though I imagine coding that up would be... fun!]

vext01 added the soundness label Aug 22, 2023

vext01 self-assigned this Aug 22, 2023

vext01 mentioned this issue Nov 2, 2023

Zero length call fixes for our PT decoder. #891

Merged
