SGX signal handling flows #1948
-
Proposal for AEX-Notify-amenable SGX signal handling flows
The previous post explained the current SGX signal handling flows. Unfortunately, these flows are not amenable to the AEX-Notify hardware feature. The issue with the current Gramine flows is that they assume a strong coupling between the untrusted-runtime context and the trusted-enclave context. In particular:
AEX-Notify breaks the strong coupling of contexts: AEX-Notify works in conjunction with the EDECCSSA instruction, which atomically switches the context from SSA 1 to SSA 0 without exiting the enclave. This implies that if the signal-handling context of the untrusted runtime were to EENTER (for the stage-1 handler), then the AEX-Notify-enabled flows inside the enclave would at some point execute EDECCSSA, execute the stage-2 handler, and continue normal execution of the enclave, all without exiting the enclave. Thus, the untrusted runtime becomes "stuck" in the signal-handling context, which is plain wrong.
Therefore, to be amenable to AEX-Notify, the signal handling flows in Gramine must first be modified: the stage-1 handler must execute in the normal context of the untrusted runtime.
Note that the described flows are not directly related to AEX-Notify. These flows also work without AEX-Notify. The point of this post is to show a preliminary step of changing the current design, so that in the next step the AEX-Notify flows can be applied on top.
Synchronous signals (SIGFPE, SIGSEGV, SIGBUS, SIGILL)
The proposed flow is as follows (using a divide-by-zero exception as the example):
Some notes:
Asynchronous signals (SIGTERM, SIGCONT)
As in the current flows in the previous post, async signals are generated by the Linux kernel as a (delayed) response to a send-signal-to-Gramine-process request from another, untrusted process such as a Bash terminal. There are two possible flows.
Enclave thread was executing and got interrupted (AEX event)
The proposed flow is as follows (using SIGTERM as the example):
Notes 1-4 from the previous section generally apply. Some additional notes:
Enclave thread exited for an OCALL
No changes are required to this flow, because it doesn't involve any AEX events.
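The coupling problem described in this post can be illustrated with a toy state model. This is only a sketch, not Gramine code: the class `ToyThread`, its methods, and the idea of tracking the untrusted runtime's contexts as a stack are all assumptions made for illustration, and the `sigreturn` step is an assumed way of leaving the signal-handling context before re-entering the enclave.

```python
from dataclasses import dataclass, field


@dataclass
class ToyThread:
    """Toy model of one enclave thread (all names are hypothetical)."""
    # Stack of untrusted-runtime contexts; the last entry is the current one.
    host_contexts: list = field(default_factory=lambda: ["normal"])
    # Index of the SSA frame the enclave currently uses.
    ssa_index: int = 0
    in_enclave: bool = True

    def aex_with_signal(self):
        """AEX happens and the kernel starts a signal handler in the host."""
        assert self.in_enclave
        self.in_enclave = False
        self.ssa_index += 1  # the next EENTER runs on the next SSA frame
        self.host_contexts.append("signal-handling")

    def sigreturn(self):
        """The host signal handler returns to the normal context."""
        assert self.host_contexts[-1] == "signal-handling"
        self.host_contexts.pop()

    def eenter(self):
        """The untrusted runtime enters the enclave (stage-1 handler)."""
        self.in_enclave = True

    def edeccssa(self):
        """Switch from SSA 1 back to SSA 0 without leaving the enclave."""
        assert self.in_enclave and self.ssa_index > 0
        self.ssa_index -= 1


def current_style_flow_with_edeccssa():
    t = ToyThread()
    t.aex_with_signal()
    t.eenter()     # entered from the signal-handling context
    t.edeccssa()   # enclave continues on SSA 0 and never exits
    return t


def proposed_flow_with_edeccssa():
    t = ToyThread()
    t.aex_with_signal()
    t.sigreturn()  # leave the signal-handling context first (assumption)
    t.eenter()     # entered from the normal context
    t.edeccssa()
    return t
```

In this model, `current_style_flow_with_edeccssa()` ends with the enclave running normally while the host side still has "signal-handling" on top of its context stack, which is the "stuck" situation; `proposed_flow_with_edeccssa()` ends with the host side back in the normal context.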
-
Proposal for AEX-Notify-enabled SGX signal handling flows
The previous post explained the AEX-Notify-amenable signal handling flows. That post can be considered a preparation for the actual AEX-Notify flows. If the signal handling flows are implemented as in the previous post, then adding the AEX-Notify flows is trivial: only the EDECCSSA instruction and its associated "context switching" must be implemented. Interestingly, the untrusted runtime logic does not need to be modified at all.
Synchronous signals (SIGFPE, SIGSEGV, SIGBUS, SIGILL)
The proposed flow is as follows (using a divide-by-zero exception as the example):
Some notes:
Asynchronous signals (SIGTERM, SIGCONT)
As in the current flows in the previous post, async signals are generated by the Linux kernel as a (delayed) response to a send-signal-to-Gramine-process request from another, untrusted process such as a Bash terminal. There are two possible flows.
Enclave thread was executing and got interrupted (AEX event)
The proposed flow is as follows (using SIGTERM as the example):
The notes are the same as above.
Enclave thread exited for an OCALL
No changes are required to this flow, because it doesn't involve any AEX events.
No signals (but AEX-Notify mitigations must execute)
AEX-Notify's main purpose is to execute the signal/exception handler even if there is no signal to be reported to Gramine or the application. In that case the handler is not supposed to perform any Gramine-specific operations; it must only apply the mitigations and then continue normal execution of the in-enclave app. These mitigations must be triggered after every AEX (even if the AEX doesn't result in an app-visible signal), for example, on page faults (#PF hardware exceptions). Thus, AEX-Notify introduces this new "no signals" flow. For this flow, we propose to inject a dummy SIGCONT signal to re-use the existing Gramine flows without any side effects (recall that SIGCONT is really just a hint to Gramine).
The proposed flow is as follows (using a page fault as the example):
Some notes:
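The "no signals" case from this post boils down to a small decision: if an AEX happened but no signal is pending, a dummy SIGCONT is substituted so that the existing flows can be re-used. A minimal sketch of that decision follows; the function `plan_after_aex` and the shape of its return value are hypothetical and only mirror the prose, not any Gramine interface.

```python
import signal


def plan_after_aex(pending_signal=None):
    """Decide which signal to feed into the existing flows after an AEX.

    `pending_signal` is the host signal that accompanied the AEX, or None
    if the AEX (e.g. a page fault) produced no app-visible signal.
    """
    dummy = pending_signal is None
    sig = signal.SIGCONT if dummy else pending_signal
    return {
        "signal": signal.Signals(sig).name,
        "dummy": dummy,
        # Per the text, mitigations must run after every AEX.
        "apply_mitigations": True,
    }
```

For example, `plan_after_aex()` yields a plan with the dummy `SIGCONT`, while `plan_after_aex(signal.SIGTERM)` passes the real signal through; in both cases the mitigations are requested.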
-
In the descriptions above, I used the term "normal context" as a counterpart to the "signal-handling context". In my new PRs, I use "regular context" instead, as this term seems less ambiguous, and I try to avoid having two synonyms for the same thing. Interestingly, I didn't find any authoritative source on whether to use "normal" or "regular".
-
I submitted the new series of PRs to add AEX-Notify support:
This series supersedes #1530 and #1531: it refactors their code and fixes bugs and data races in them.
UPDATE 22 October 2024: Rebased this series of PRs onto the latest master branch, i.e. post the v1.8 release.
-
Current SGX signal handling flows
The explanations below apply to Gramine v1.7 and (to the best of my knowledge) have been like this since v1.0.
Synchronous signals (SIGFPE, SIGSEGV, SIGBUS, SIGILL)
Synchronous signals are SIGFPE, SIGSEGV, SIGBUS and SIGILL. They are mapped to Gramine-internal signal names PAL_EVENT_ARITHMETIC_ERROR, PAL_EVENT_MEMFAULT, PAL_EVENT_ILLEGAL. See this code snippet.
These signals are generated by the Linux kernel as an immediate response to the hardware exception that happens inside the SGX enclave. The flow is as follows (using a divide-by-zero exception as the example):
Some notes:
[...] enclu instruction with RAX register already set to the ERESUME value). AEP in Gramine looks like this.
[...] handle_sync_signal() function returns, the signal handling context is over and Linux resumes normal Gramine execution (jumping to AEP).
[...] _PalExceptionHandler(). See this function here.
Asynchronous signals (SIGTERM, SIGCONT)
Asynchronous signals are SIGTERM and SIGCONT. They are mapped to Gramine-internal signal names PAL_EVENT_QUIT and PAL_EVENT_INTERRUPTED. See this code snippet.
These signals are generated by the Linux kernel as a (delayed) response to a send-signal-to-Gramine-process request from another, untrusted process such as a Bash terminal. There are two possible flows.
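The signal-to-event mappings named in this post (synchronous and asynchronous) can be written down as one lookup table. This is a sketch rather than Gramine's actual code: the pairing follows the order in which the post lists the names, and it assumes that SIGSEGV and SIGBUS share PAL_EVENT_MEMFAULT, since four synchronous signals map to three event names. The dictionary, the `SYNC_SIGNALS` set and the `classify` helper are hypothetical.

```python
import signal

# Host signal -> Gramine-internal event name (names taken from the post).
SIGNAL_TO_PAL_EVENT = {
    signal.SIGFPE: "PAL_EVENT_ARITHMETIC_ERROR",
    signal.SIGSEGV: "PAL_EVENT_MEMFAULT",
    signal.SIGBUS: "PAL_EVENT_MEMFAULT",  # assumed to share MEMFAULT
    signal.SIGILL: "PAL_EVENT_ILLEGAL",
    signal.SIGTERM: "PAL_EVENT_QUIT",
    signal.SIGCONT: "PAL_EVENT_INTERRUPTED",
}

# Signals raised as an immediate response to a hardware exception.
SYNC_SIGNALS = frozenset(
    {signal.SIGFPE, signal.SIGSEGV, signal.SIGBUS, signal.SIGILL}
)


def classify(sig):
    """Return (event name, "sync" or "async") for a supported host signal."""
    event = SIGNAL_TO_PAL_EVENT[sig]  # raises KeyError for other signals
    kind = "sync" if sig in SYNC_SIGNALS else "async"
    return event, kind
```

Keeping the synchronous set separate from the table makes the distinction between the two families of flows explicit: the event name says what happened, while the sync/async kind says which flow applies.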
Enclave thread was executing and got interrupted (AEX event)
This can happen, for example, on a timer interrupt. The enclave thread is interrupted and automatically saves its state in the SSA 0 frame. The Linux kernel decides that this is a good opportunity to inject a pending signal, e.g. SIGTERM.
Notes 1-6 from the previous section generally apply. Some additional notes:
Enclave thread exited for an OCALL
An asynchronous signal may arrive while the enclave thread has exited for an OCALL and the untrusted runtime is executing the corresponding host system call. This is especially relevant if the enclave thread exited for a blocking host-syscall operation, such as a blocking read() on a network socket.
Some notes:
[...] _PalExceptionHandler() as used in other flows, see here.
[...] _PalExceptionHandler() as the context-to-be-restored argument, see here and here.
[...] syscall instruction), the syscall return value is rewired to -EINTR and the RIP is rewired to jump over the syscall and immediately return to the enclave.
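The rewiring described in the last note can be sketched on a toy register context. Everything here is hypothetical: the `ToyContext` class, the addresses, and in particular the condition under which the rewiring applies, since the beginning of that note is not available; the sketch assumes it applies when the interrupted RIP has not yet passed the syscall instruction. The only hard fact used is that the x86-64 `syscall` instruction is two bytes long (opcode 0F 05).

```python
import errno
from dataclasses import dataclass


@dataclass
class ToyContext:
    """Toy stand-in for the interrupted register context."""
    rip: int
    rax: int


# Hypothetical code addresses inside the untrusted runtime.
OCALL_STUB_START = 0x1000
SYSCALL_INSN = 0x1010
SYSCALL_INSN_LEN = 2  # `syscall` is the 2-byte opcode 0F 05
AFTER_SYSCALL = SYSCALL_INSN + SYSCALL_INSN_LEN


def rewire_interrupted_ocall(ctx):
    """Skip the host syscall and make it look like it returned -EINTR.

    Returns True if the context was rewired, False if it was left alone.
    """
    # Assumed condition: the syscall instruction has not completed yet.
    if OCALL_STUB_START <= ctx.rip <= SYSCALL_INSN:
        ctx.rax = -errno.EINTR   # rewired syscall return value
        ctx.rip = AFTER_SYSCALL  # jump over the syscall instruction
        return True
    return False
```

With this model, a context interrupted at `SYSCALL_INSN` resumes right after the syscall with `-EINTR` in RAX, so the thread can return to the enclave immediately, while a context that is already past the syscall keeps its real return value.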