[PAL/Linux-SGX] Allow to dump current SGX/perf stats on a signal #1711

dimakuv · 2024-01-10T10:05:11Z

Description of the feature

Currently, Gramine-SGX has two perf analysis tools:

Trivial stats on SGX events (EENTER, EEXIT, etc.)
Advanced stats similar to perf record

Both tools share a limitation: they start collecting stats when Gramine-SGX starts and stop collecting only when Gramine-SGX terminates.

This limits the ability to analyze the performance of long-lived applications. For example, if MySQL runs under Gramine-SGX, then we may want to analyze only the stats during "hot runs", when a particular client with a particular workload connects to the MySQL server. But because of the current limitation, the stats will be noisy: they also contain startup events, termination events, and other irrelevant events (such as clients that pre-populate the database).

Proposal 1: dump stats on a signal

We choose a signal that serves as a hint to Gramine to dump the currently collected statistics, e.g. SIGUSR1. For simplicity, we block this signal on all threads of the process except the main thread (so SIGUSR1 is guaranteed to always land in Thread 1).
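The signal routing described above can be sketched with plain POSIX calls. This is only an illustration of the idea, not Gramine code: the PAL has its own signal-handling layer, and the names `g_dump_requested`, `install_dump_handler` and `block_dump_signal_in_this_thread` are made up for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical flag: set by the handler, polled later by the main thread. */
static volatile sig_atomic_t g_dump_requested = 0;

static void on_sigusr1(int signum) {
    (void)signum;
    g_dump_requested = 1; /* only async-signal-safe work in the handler */
}

/* Install the SIGUSR1 handler; call once from the main thread.
 * Returns 0 on success, -1 on error (as sigaction() does). */
static int install_dump_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigusr1;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Block SIGUSR1 in the calling thread; call at the start of every
 * non-main thread so the signal can only be delivered to the main thread. */
static void block_dump_signal_in_this_thread(void) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int rc = pthread_sigmask(SIG_BLOCK, &set, NULL);
    assert(rc == 0); /* fails only on invalid arguments */
    (void)rc;
}
```

The handler only sets a flag; the actual dump would happen later, outside signal context, because logging functions are generally not async-signal-safe.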

When the signal arrives, we dump SGX stats similar to this:

gramine/pal/src/host/linux-sgx/host_ocalls.c

Lines 33 to 44 in 1f72aaf

    
    static long sgx_ocall_exit(void* args) {
        struct ocall_exit* ocall_exit_args = args;

        if (ocall_exit_args->exitcode != (int)((uint8_t)ocall_exit_args->exitcode)) {
            log_debug("Saturation error in exit code %d getting rounded down to %u",
                      ocall_exit_args->exitcode, (uint8_t)ocall_exit_args->exitcode);
            ocall_exit_args->exitcode = 255;
        }

        /* exit the whole process if exit_group() */
        if (ocall_exit_args->is_exitgroup) {
            update_and_print_stats(/*process_wide=*/true);

NOTE: We will need to dump stats on all currently executing threads, and this will require finding a way to iterate through all threads and summing up their stats. Should be doable.
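A minimal sketch of the summing step mentioned in the NOTE, assuming the threads are reachable through some linked list. The `struct sgx_stats` fields and the `struct thread_entry` list are hypothetical and only stand in for whatever per-thread bookkeeping the PAL actually has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread counters; field names are illustrative only. */
struct sgx_stats {
    uint64_t eenter_cnt;
    uint64_t eexit_cnt;
    uint64_t aex_cnt;
};

/* Hypothetical list node, one per currently running thread. */
struct thread_entry {
    struct sgx_stats stats;
    struct thread_entry* next;
};

/* Sum the counters of all threads in the list into `*total`.
 * An empty list (head == NULL) yields all-zero totals. */
static void sum_thread_stats(const struct thread_entry* head, struct sgx_stats* total) {
    assert(total);
    total->eenter_cnt = 0;
    total->eexit_cnt  = 0;
    total->aex_cnt    = 0;
    for (const struct thread_entry* t = head; t; t = t->next) {
        total->eenter_cnt += t->stats.eenter_cnt;
        total->eexit_cnt  += t->stats.eexit_cnt;
        total->aex_cnt    += t->stats.aex_cnt;
    }
}
```

A real implementation would also have to keep the list stable while walking it (e.g. with a lock) and read counters that other threads are still updating, which this sketch ignores.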

The output can be like this:

   static uint32_t g_user_signal_number = 0;
   log_always("----- SGX stats for process %d (on user signal %u) -----\n"
                   "  # of EENTERs:        %lu\n", ... g_user_signal_number++);
   ...

Now the MySQL example can be done like this:

1. Start MySQL in Gramine
2. Do/wait for the initialization to finish
3. Right before starting the workload client, send SIGUSR1 to Gramine
4. Gramine dumps the current stats
5. Right after the workload client finishes, send SIGUSR1 to Gramine
6. Gramine dumps the current stats
7. Finish the run

Now we have two sets of stats, collected at steps 4 and 6. We subtract the step-4 stats from the step-6 stats (only process-wide stats can be subtracted), and we get the statistics on SGX events during the client workload, which is exactly what we wanted.
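The subtraction itself is trivial once the two dumps are parsed into counters. A sketch, assuming a hypothetical `struct stats_snapshot` that holds the process-wide values from one dump (field names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical process-wide snapshot parsed from one stats dump. */
struct stats_snapshot {
    uint64_t eenter_cnt;
    uint64_t eexit_cnt;
    uint64_t aex_cnt;
};

/* Events that happened between two dumps. Counters only grow, so `after`
 * must be the later snapshot. */
static struct stats_snapshot stats_delta(const struct stats_snapshot* before,
                                         const struct stats_snapshot* after) {
    assert(after->eenter_cnt >= before->eenter_cnt);
    assert(after->eexit_cnt  >= before->eexit_cnt);
    assert(after->aex_cnt    >= before->aex_cnt);

    struct stats_snapshot delta = {
        .eenter_cnt = after->eenter_cnt - before->eenter_cnt,
        .eexit_cnt  = after->eexit_cnt  - before->eexit_cnt,
        .aex_cnt    = after->aex_cnt    - before->aex_cnt,
    };
    return delta;
}
```

For example, with made-up numbers, 100 EENTERs in the first dump and 250 in the second would mean 150 EENTERs happened during the client workload.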

Proposal 2: reset stats on a signal

Same as Proposal 1, but set all stats to zero. This will be easier for the end user to read, but Proposal 1 (which requires "differential" analysis of the stats) seems more flexible and easier to implement.

I am in favor of Proposal 1.

What about `perf record` stats?

Perf record style (advanced) stats are much more complicated, see:

Here the problem is that we create a perf.data file, initialize it with some header, and add events one by one into it. So it's unclear what we can do when a SIGUSR1 signal arrives -- can we seal the current file and start a new file? This adheres to Proposal 2. I don't know how to make it work with Proposal 1...

Someone needs to learn how this can be achieved -- perf record surely allows such things, so it must be accounted for in perf internal formats.

One can start with the simpler SGX stats though, and leave the perf record stats for later implementation.

Why Gramine should implement it?

Useful for perf analysis.


jkr0103 · 2024-01-10T11:14:17Z

> But because of the current limitation, we will have a lot of noise because stats also contain the startup events, the termination events, and other non-relevant events

Can we eliminate the startup and termination noise in all cases? This would help with perf analysis of PyTorch-like applications, which don't run forever, with either reset or dump.

dimakuv · 2024-01-10T13:31:08Z

> Can we eliminate the startup and termination noise in all cases? This would help with perf analysis of PyTorch-like applications, which don't run forever, with either reset or dump.

But how can you do it? You need to know the "start point without the noise" -- how do you automatically determine this start point? I don't think it's possible without hints from the application.

jkr0103 · 2024-01-17T10:56:32Z

Yes, the application needs to inform Gramine when it wants stats/perf records to be collected. Is there a way for an application running inside Gramine to send some signal to Gramine?

jkr0103 · 2024-01-17T10:59:50Z

One suggestion: we print the enclave enter/exit counts with the Gramine stats, but not the counts of the syscalls that caused those enclave enters/exits. We could collect a count for each syscall that caused an enclave enter/exit and print it with the stats.

dimakuv · 2024-01-17T14:36:29Z

> Yes, the application needs to inform Gramine when it wants stats/perf records to be collected. Is there a way for an application running inside Gramine to send some signal to Gramine?

The app can write to a new pseudo-file under /dev/. However, I'm a bit wary of adding more Gramine-specific APIs without (1) a good reason, and (2) a good design/naming proposal.
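To make the pseudo-file idea concrete, here is what the application side could look like. This is purely hypothetical: no such pseudo-file is proposed or named in this thread, so the path is a parameter and the "write one byte to trigger a dump" protocol is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Ask for a stats dump by writing one byte to `path`, which stands in for a
 * hypothetical Gramine pseudo-file under /dev/.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
static int request_stats_dump(const char* path) {
    assert(path);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, "1", 1);
    int rc = close(fd);
    return (written == 1 && rc == 0) ? 0 : -1;
}
```

The app would call this right before and right after its hot section, which removes the need for an external process to send signals at the right moments.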

If you mean UNIX signals (like SIGINT), then no, the app cannot send such signals to Gramine.

> One suggestion: we print the enclave enter/exit counts with the Gramine stats, but not the counts of the syscalls that caused those enclave enters/exits. We could collect a count for each syscall that caused an enclave enter/exit and print it with the stats.

I don't think it's possible. The EENTER/EEXIT statistics are collected at the level of the Linux-SGX PAL, but the syscall statistics are collected at the level of the LibOS. These are just different layers, and I don't see a simple way to show them together.

dimakuv · 2024-09-25T07:52:52Z

This issue will be completely fixed with two PRs:

jkr0103 mentioned this issue Feb 5, 2024

Flush profile data on demand using SIGUSR1 signal #1751

Merged

dimakuv mentioned this issue Sep 25, 2024

[PAL/Linux-SGX] Print SGX stats on SIGUSR1 and reset them #1996

Merged

dimakuv added feature request P: 1 labels Sep 25, 2024

dimakuv closed this as completed in #1996 Sep 26, 2024
