[PAL/Linux-SGX] Allow to dump current SGX/perf stats on a signal #1711
Comments
Can we eliminate the startup and end event noise in all cases? This would help in perf analysis of PyTorch-like applications which don't run forever, using either reset or dump.
But how can you do it? You need to know the "start point without the noise" -- how do you automatically determine this start point? I don't think it's possible without hints from the application.
Yes, the application needs to inform Gramine when it wants stats/perf records to be collected. Is there a way an application running inside Gramine can send some signal to Gramine?
One suggestion, we print enclave enter/exits data with Gramine
The app can write to a new pseudo-file under
If you mean UNIX signals (like SIGINT), then no, the app cannot send such signals to Gramine.
I don't think it's possible. The EENTER/EEXIT statistics are collected at the level of the Linux-SGX PAL, but the syscall statistics are collected at the level of LibOS. These are just different layers, and I don't see a simple way to show them together.
This issue will be completely fixed with two PRs:
Description of the feature
Currently, Gramine-SGX has two perf analysis tools: the simpler SGX stats and the advanced perf record style stats.
Both of these tools have a limitation: they start collecting stats when Gramine-SGX starts and stop collecting stats when Gramine-SGX terminates.
This limits the ability to analyze performance of long-living applications. For example, if MySQL runs under Gramine-SGX, then we may want to analyze only the stats during "hot runs", when a particular client with a particular workload connects to the MySQL server. But because of the current limitation, we will have a lot of noise because stats also contain the startup events, the termination events, and other non-relevant events (like clients that pre-populate the database).
Proposal 1: dump stats on a signal
We choose a signal that serves as a hint to Gramine to dump the currently collected statistics, e.g. SIGUSR1. For simplicity, we block this signal on all threads of the process except the main thread (so SIGUSR1 is guaranteed to always land in Thread 1).
When the signal arrives, we dump SGX stats similar to this:
gramine/pal/src/host/linux-sgx/host_ocalls.c
Lines 33 to 44 in 1f72aaf
NOTE: We will need to dump stats on all currently executing threads, and this will require finding a way to iterate through all threads and summing up their stats. Should be doable.
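A minimal sketch of the signal handling and per-thread summing described above, written in Python for brevity rather than in the C of the Linux-SGX PAL. The counter names (`eenter`, `eexit`), the `thread_stats` registry and `run_demo()` are all hypothetical and exist only for illustration; they are not Gramine structures. Note also that CPython always runs Python-level handlers in the main thread, so the explicit `pthread_sigmask` call here only mirrors what a C implementation would need to do.

```python
import os
import signal
import threading
import time

# Hypothetical per-thread counters (illustrative names, not Gramine's).
thread_stats = {}              # thread name -> {counter name -> value}
stats_lock = threading.Lock()
dumps = []                     # one process-wide snapshot per SIGUSR1


def sum_stats():
    """Sum the counters of all registered threads into one dict."""
    total = {}
    with stats_lock:
        for counters in thread_stats.values():
            for name, value in counters.items():
                total[name] = total.get(name, 0) + value
    return total


def on_sigusr1(signum, frame):
    # "Dump" here just records a snapshot; a real tool would print it.
    dumps.append(sum_stats())


def worker(name, counters, registered, stop):
    # Block SIGUSR1 in this thread so the kernel can only deliver it
    # to the main thread, as in Proposal 1.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGUSR1})
    with stats_lock:
        thread_stats[name] = dict(counters)
    registered.set()
    stop.wait()


def run_demo():
    thread_stats.clear()
    dumps.clear()
    signal.signal(signal.SIGUSR1, on_sigusr1)  # must run in the main thread
    stop = threading.Event()
    workers = []
    fake_counters = [{"eenter": 3, "eexit": 3}, {"eenter": 5, "eexit": 4}]
    for i, counters in enumerate(fake_counters):
        registered = threading.Event()
        t = threading.Thread(
            target=worker, args=(f"worker-{i}", counters, registered, stop)
        )
        t.start()
        registered.wait()      # signal is sent only after masks are set
        workers.append(t)
    os.kill(os.getpid(), signal.SIGUSR1)
    # Wait (bounded) until the handler has recorded a snapshot.
    deadline = time.monotonic() + 5
    while not dumps and time.monotonic() < deadline:
        time.sleep(0.01)
    stop.set()
    for t in workers:
        t.join()
    return dumps[-1]


if __name__ == "__main__":
    print(run_demo())
```

The lock is taken only by worker threads and by the handler, never by the main thread outside the handler, which avoids self-deadlock in this sketch. A C implementation inside the PAL would have to be stricter about what is safe to do in signal context.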
The output can be like this:
Now the MySQL example can be done like this:
Now we have two sets of stats, collected at steps 4 and 6. We subtract the step-4 stats from the step-6 stats (we can subtract only process-wide stats), and we get the statistics on SGX events during the client workload -- exactly what we wanted.
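The subtraction can be sketched as follows, assuming each dump has already been parsed into a dictionary of process-wide counters. The counter names and the numbers are made up for illustration and do not come from a real Gramine run.

```python
def diff_stats(before, after):
    """Return per-counter deltas (after - before) for cumulative counters."""
    return {name: after[name] - before.get(name, 0) for name in after}


# Hypothetical process-wide dumps taken before and after the client workload.
stats_step4 = {"eenter": 1200, "eexit": 1180, "aex": 9500}
stats_step6 = {"eenter": 4700, "eexit": 4675, "aex": 31000}

workload = diff_stats(stats_step4, stats_step6)
print(workload)
```

This only works because the counters are cumulative and never reset between the two dumps. Per-thread stats are harder to subtract, since threads may be created or destroyed between the two dumps.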
Proposal 2: reset stats on a signal
Same as Proposal 1, but set all stats to zero. This will be easier for the end user to read, but Proposal 1 (which requires the "differential" analysis of stats) seems more flexible and easier to implement.
I am in favor of Proposal 1.
What about perf record stats?
Perf record style (advanced) stats are much more complicated, see:
Here the problem is that we create a perf.data file, initialize it with some header, and add events one by one into it. So it's unclear what we can do when a SIGUSR1 signal arrives -- can we seal the current file and start a new file? This adheres to Proposal 2. I don't know how to make it work with Proposal 1...
Someone needs to learn how this can be achieved -- perf record surely allows such things, so it must be accounted for in perf internal formats.
One can start with the simpler SGX stats though, and leave the perf record stats for later implementation.
Why Gramine should implement it?
Useful for perf analysis.