Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PAL/Linux-SGX] Print SGX stats on SIGUSR1 and reset them #1996

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dimakuv
Copy link
Contributor

@dimakuv dimakuv commented Sep 4, 2024

Description of the changes

This commit adds support to dump and reset SGX-related statistics interactively, using SIGUSR1 signal. This helps to collect SGX-related statistics only for a particular period, e.g. skipping the Gramine startup and application initialization time and concentrating only on the actual application processing.

The printed-out statistics are not precise, as the "stats collecting" thread may run in parallel with other threads that update the statistics. However, this imprecise implementation is simple and enough for perf analysis.

This commit also breaks compatibility: SGX statistics can only be collected and printed when Gramine is built in debug or debugoptimized mode. However, this should not affect users as SGX stats is tailored for manual debugging and profiling sessions.

Applied on top of #1995

Closes #1857

Fixes #1711

How to test this PR?

Check instructions in #1857


This change is Reviewable

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 14 files reviewed, 2 unresolved discussions, not enough approvals from maintainers (2 more required), not enough approvals from different teams (1 more required, approved so far: Intel)


pal/src/host/linux-sgx/host_exception.c line 194 at r1 (raw file):

    if (g_sgx_enable_stats) {
        PAL_HOST_TCB* tcb = pal_get_host_tcb();
        __atomic_store_n(&tcb->reset_stats, true, __ATOMIC_RELAXED);

It is possible that two SIGUSR1 signals arrive one after another (or even on different threads). This is benign because the whole collect_and_print_sgx_stats() function is protected with a global lock.


pal/src/host/linux-sgx/host_thread.c line 19 at r1 (raw file):

    unsigned int    tid;
    sgx_arch_tcs_t* tcs;
    PAL_HOST_TCB*   tcb;

This is a bit controversial: a "SGX-stats-collecting" thread (that received SIGUSR1) is able to peek into the thread local storage of other threads. This is typically considered a bad smell (one thread reads local data of another thread), but here I think it's a reasonable trade-off: most of the accesses to the thread-local SGX counters are done on EEXIT and AEX events by the thread itself, so the memory access should be as simple as possible and as fast as possible. If we would move everything in a shared array, then we would have false cache sharing and complex memory management of the shared array.

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: all files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


-- commits line 15 at r1:

Suggestion:

SGX statistics can now only be

pal/src/host/linux-sgx/host_exception.c line 285 at r1 (raw file):

    PAL_HOST_TCB* tcb = pal_get_host_tcb();
    if (__atomic_load_n(&tcb->reset_stats, __ATOMIC_RELAXED) == false)

Maybe just atomic exchange with false? Would be one line less and effectively work the same.


pal/src/host/linux-sgx/host_thread.c line 19 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

This is a bit controversial: a "SGX-stats-collecting" thread (that received SIGUSR1) is able to peek into the thread local storage of other threads. This is typically considered a bad smell (one thread reads local data of another thread), but here I think it's a reasonable trade-off: most of the accesses to the thread-local SGX counters are done on EEXIT and AEX events by the thread itself, so the memory access should be as simple as possible and as fast as possible. If we would move everything in a shared array, then we would have false cache sharing and complex memory management of the shared array.

Could you put that in a comment here? Or better, to the loop iterating over TCBs?


pal/src/host/linux-sgx/host_thread.c line 75 at r1 (raw file):

    /* there is a small window when the thread's counters may be updated in-between reading and
     * resetting these counters -- some SGX events will be lost; we ignore this as the number of
     * lost events is negligible for perf analysis purposes */

Why not atomic exchange with 0 then?


pal/src/host/linux-sgx/host_thread.c line 84 at r1 (raw file):

}

/* this function is called only on thread/process exit (never in the middle of thread exec) */

Then I'd add a suffix on_thread_exit to it (if it's important to its semantics).

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 14 files reviewed, 6 unresolved discussions, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel), "fixup! " found in commit messages' one-liners (waiting on @dimakuv and @mkow)


-- commits line 15 at r1:
Will do during final rebase


pal/src/host/linux-sgx/host_exception.c line 285 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Maybe just atomic exchange with false? Would be one line less and effectively work the same.

Done.


pal/src/host/linux-sgx/host_thread.c line 19 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Could you put that in a comment here? Or better, to the loop iterating over TCBs?

Done. Put in the loop.


pal/src/host/linux-sgx/host_thread.c line 75 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Why not atomic exchange with 0 then?

Done. True, I was being lazy.


pal/src/host/linux-sgx/host_thread.c line 84 at r1 (raw file):

Previously, mkow (Michał Kowalczyk) wrote…

Then I'd add a suffix on_thread_exit to it (if it's important to its semantics).

Done. It's not only thread exit but also process exit, so I decided to add _on_exit.

mkow
mkow previously approved these changes Sep 18, 2024
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)


pal/src/host/linux-sgx/host_thread.c line 75 at r1 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done. True, I was being lazy.

It's even shorter now :)

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 14 files at r1, 4 of 4 files at r2, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

kailun-qin
kailun-qin previously approved these changes Sep 19, 2024
Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1.
Reviewable status: all files reviewed, 2 unresolved discussions, "fixup! " found in commit messages' one-liners (waiting on @dimakuv)

@dimakuv dimakuv force-pushed the dimakuv/sgx-stats-rm-thread-local-prints branch from f66ba21 to 2ff0ab1 Compare September 24, 2024 14:15
Base automatically changed from dimakuv/sgx-stats-rm-thread-local-prints to master September 24, 2024 16:54
@dimakuv dimakuv dismissed stale reviews from kailun-qin and mkow September 24, 2024 16:54

The base branch was changed.

@dimakuv dimakuv force-pushed the dimakuv/sgx-reset-stats-on-sigusr1 branch from 664c6e8 to baffd70 Compare September 24, 2024 17:14
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)


-- commits line 15 at r1:

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Will do during final rebase

Done.

mkow
mkow previously approved these changes Sep 24, 2024
Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)


pal/src/host/linux-sgx/host_thread.c line 73 at r4 (raw file):

                                              __ATOMIC_RELAXED);
    g_async_signal_cnt += __atomic_exchange_n((uint64_t*)&tcb->async_signal_cnt, 0,
                                              __ATOMIC_RELAXED);

Not happy about the explicit cast, but otherwise Clang complains (GCC doesn't):

  ../pal/src/host/linux-sgx/host_thread.c:67:27: error: address argument to atomic operation must be a pointer to integer or pointer ('atomic_ulong *' (aka '_Atomic(unsigned long) *') invalid)
      g_eenter_cnt       += __atomic_exchange_n(&tcb->eenter_cnt, 0, __ATOMIC_RELAXED);

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @dimakuv)


pal/src/host/linux-sgx/host_thread.c line 73 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Not happy about the explicit cast, but otherwise Clang complains (GCC doesn't):

  ../pal/src/host/linux-sgx/host_thread.c:67:27: error: address argument to atomic operation must be a pointer to integer or pointer ('atomic_ulong *' (aka '_Atomic(unsigned long) *') invalid)
      g_eenter_cnt       += __atomic_exchange_n(&tcb->eenter_cnt, 0, __ATOMIC_RELAXED);

atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 7 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)


pal/src/host/linux-sgx/host_thread.c line 73 at r4 (raw file):

Not happy about the explicit cast

+1 (UB from standard), but it looks to be the easiest way

atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t

+1

This commit adds support to dump and reset SGX-related statistics
interactively, using SIGUSR1 signal. This helps to collect SGX-related
statistics only for a particular period, e.g. skipping the Gramine
startup and application initialization time and concentrating only on
the actual application processing.

The printed-out statistics are not precise, as the "stats collecting"
thread may run in parallel with other threads that update the
statistics. However, this imprecise implementation is simple and enough
for perf analysis.

This commit also breaks compatibility: SGX statistics can now only be
collected and printed when Gramine is built in debug or debugoptimized
mode. However, this should not affect users as SGX stats is tailored for
manual debugging and profiling sessions.

Co-authored-by: TejaswineeL <[email protected]>
Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
Signed-off-by: TejaswineeL <[email protected]>
@dimakuv dimakuv force-pushed the dimakuv/sgx-reset-stats-on-sigusr1 branch from 6b5a59f to 560ea8e Compare September 25, 2024 07:14
Copy link
Contributor Author

@dimakuv dimakuv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 14 files reviewed, 1 unresolved discussion, not enough approvals from maintainers (1 more required), not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin and @mkow)


pal/src/host/linux-sgx/host_thread.c line 73 at r4 (raw file):

Previously, kailun-qin (Kailun Qin) wrote…

Not happy about the explicit cast

+1 (UB from standard), but it looks to be the easiest way

atomic_ulong is _Atomic unsigned long, not _Atomic uint64_t

+1

Done.

@dimakuv
Copy link
Contributor Author

dimakuv commented Sep 25, 2024

Jenkins, retest Jenkins-20.04 please (unrelated issue, see description below)

On Jenkins-20.04, gramine-direct ppoll01 LTP test failed:

/home/jenkins/workspace/graphene-20.04/libos/test/ltp/src/testcases/kernel/syscalls/ppoll/ppoll01.c:288: TFAIL: ret: 0, exp: -1, ret_errno: SUCCESS (0), exp_errno: EINTR (4)

I can't reproduce it locally, and I have never seen this error before. Moreover, this error has nothing to do with the current PR, as the current PR does not modify LTP or gramine-direct (aka Linux PAL) in any way.

Copy link
Contributor

@kailun-qin kailun-qin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @mkow)

Copy link
Member

@mkow mkow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion, not enough approvals from different teams (1 more required, approved so far: Intel) (waiting on @kailun-qin)


pal/src/host/linux-sgx/host_thread.c line 73 at r4 (raw file):

Previously, dimakuv (Dmitrii Kuvaiskii) wrote…

Done.

Ah, wait, you should just use C11 functions for that instead: https://en.cppreference.com/w/c/atomic. Then no cast will be needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PAL/Linux-SGX] Allow to dump current SGX/perf stats on a signal
3 participants