POSIX Simulator: Clear SIGRESUME (SIGUSR1) when exiting xPortStartScheduler #1224

Closed
wants to merge 8 commits into from

johnboiles
Contributor

@johnboiles commented Jan 18, 2025

Clear SIGRESUME (SIGUSR1) when exiting xPortStartScheduler

Description

When attempting to stop the scheduler via vTaskEndScheduler, an unhandled SIGUSR1 (aka SIGRESUME) is raised when the scheduler thread's original signal mask is restored with pthread_sigmask. This crashes the program.

I'm not sure why this is happening, since sigwait is supposed to consume the pending signal. It's possible there's something platform-specific about this; I'm testing on macOS. Even if other platforms behave differently, it probably can't hurt to double-check for a pending signal. Interestingly, if I modify the sigwait loop in xPortStartScheduler like this:

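    while( xSchedulerEnd != pdTRUE )
    {
        /* Wait for any signal in xSignals and log what sigwait() reports. */
        int result = sigwait( &xSignals, &iSignal );
        printf("Signal received %d result=%d\n", iSignal, result);

        /* Diagnostic: check whether SIG_RESUME is still pending afterwards. */
        sigset_t set;
        sigpending( &set );
        if( sigismember( &set, SIG_RESUME ) )
        {
            /* Consume the leftover signal with a second sigwait() call. */
            int result2 = sigwait( &xSignals, &iSignal );
            printf("Signal received %d result=%d\n", iSignal, result2);
        }
    }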

I then get this:

Signal received 0 result=0
Signal received 30 result=0

This means the first sigwait call returns an invalid signal number (0) while its return code (0) indicates success. The second call then returns 30, which is SIGUSR1 on macOS. Strange!

Test Steps

Testing with this example (full code: freertos-pr1224.zip):

#include <FreeRTOS.h>
#include <task.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

extern "C" {
    void vAssertCalled(const char *file, int line) {
        fprintf(stderr, "Assertion failed in file %s:%d\n", file, line);
        abort();
    }
}

int main() {
    TaskHandle_t task;
    xTaskCreate(
        [](void *param) {
        printf("FreeRTOS scheduler started\n");
        vTaskDelay(pdMS_TO_TICKS(1000));
        printf("Task Done, ending scheduler\n");
        vTaskEndScheduler();
        assert(false && "After scheduler ended (SHOULD NOT GET HERE)");
    }, "start", 10000, nullptr, 1, &task);
    printf("Starting FreeRTOS scheduler\n");
    vTaskStartScheduler();
    printf("FreeRTOS scheduler exited\n");
    vTaskDelete(task);
}

Without this fix, I see in lldb:

Process 9032 stopped
* thread #1, name = 'Scheduler', queue = 'com.apple.main-thread', stop reason = signal SIGUSR1
    frame #0: 0x00000001815d6620 libsystem_kernel.dylib`__pthread_sigmask + 8
libsystem_kernel.dylib`__pthread_sigmask:
->  0x1815d6620 <+8>:  b.lo   0x1815d6640    ; <+40>
    0x1815d6624 <+12>: pacibsp
    0x1815d6628 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x1815d662c <+20>: mov    x29, sp
Target 0: (freertos_posix_example) stopped.
(lldb) bt
* thread #1, name = 'Scheduler', queue = 'com.apple.main-thread', stop reason = signal SIGUSR1
  * frame #0: 0x00000001815d6620 libsystem_kernel.dylib`__pthread_sigmask + 8
    frame #1: 0x000000018160f680 libsystem_pthread.dylib`pthread_sigmask + 16
    frame #2: 0x000000010000f028 freertos_posix_example`xPortStartScheduler at port.c:322:14
    frame #3: 0x000000010000902c freertos_posix_example`vTaskStartScheduler at tasks.c:3761:18
    frame #4: 0x0000000100003ce8 freertos_posix_example`main at main.cpp:25:5
    frame #5: 0x0000000181290274 dyld`start + 2840
(lldb) f 2
frame #2: 0x000000010000f028 freertos_posix_example`xPortStartScheduler at port.c:322:14
   319 	    #endif /* __APPLE__*/
   320
   321 	    /* Restore original signal mask. */
-> 322 	    ( void ) pthread_sigmask( SIG_SETMASK, &xSchedulerOriginalSignalMask, NULL );
   323
   324 	    prvDestroyThreadKey();
   325

Checklist:

  • I have tested my changes. No regression in existing tests.
  • I have modified and/or added unit-tests to cover the code changes in this Pull Request.

@johnboiles changed the title from "Clear SIGRESUME (SIGUSR1) when exiting xPortStartScheduler" to "POSIX Simulator: Clear SIGRESUME (SIGUSR1) when exiting xPortStartScheduler" on Jan 18, 2025
@johnboiles
Contributor Author

I'll undo this change in my local fork of FreeRTOS and see if I still have issues. It might be that this was solved by #1223.

@johnboiles
Contributor Author

I still have this issue, though I don't fully understand why sigwait isn't clearing that signal. There may be a more root-cause-y solution to this, but this is all I've been able to figure out.

@johnboiles marked this pull request as ready for review January 29, 2025 21:12
@johnboiles requested a review from a team as a code owner January 29, 2025 21:12
@johnboiles force-pushed the clear-pending-signals branch from 85c76ce to 94263d2 January 29, 2025 21:42
@aggarg
Member

aggarg commented Jan 30, 2025

As you already mentioned, this seems like a hack. Do I need a Mac to reproduce it, or is it reproducible on Linux?

@johnboiles
Contributor Author

@aggarg I haven't gotten to trying on Linux yet but I can give it a shot. I'm reproducing in macOS with LLDB. I wonder if LLDB could be triggering this strange behavior somehow.

@aggarg
Member

aggarg commented Jan 31, 2025

Can you try running without the debugger?

@kar-rahul-aws
Member

I tried to reproduce the issue on Ubuntu 20.04, but it does not reproduce with or without the debugger.

Code used

#include <FreeRTOS.h>
#include <task.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void task_function(void *param) {
    ( void ) param;
    printf("FreeRTOS scheduler started\n");
    vTaskDelay(pdMS_TO_TICKS(1000));
    printf("Task Done, ending scheduler\n");
    vTaskEndScheduler();
    assert(0 && "After scheduler ended (SHOULD NOT GET HERE)");
}

int main_test() {
    TaskHandle_t task;
    xTaskCreate( task_function, "start", 10000, NULL, 1, &task );
    printf("Starting FreeRTOS scheduler\n");
    vTaskStartScheduler();
    printf("FreeRTOS scheduler exited\n");
    vTaskDelete(task);
    return 0;
}

Results after running without debugger

Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
FreeRTOS scheduler exited

Results after running with debugger

Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
FreeRTOS scheduler exited
[1] + Done                       "/usr/bin/gdb" --interpreter=mi --tty=${DbgTerm} 0<"/tmp/Microsoft-MIEngine-In-ogdc4hd5.xao" 1>"/tmp/Microsoft-MIEngine-Out-e1s1jg33.5yj"

I also tried your modified loop in xPortStartScheduler, which resulted in an invalid signal on macOS. On Ubuntu, however, the same code returns a valid signal (10, which is SIGUSR1 on Linux) and prints only once.

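    while( xSchedulerEnd != pdTRUE )
    {
        /* Wait for any signal in xSignals and log what sigwait() reports. */
        int result = sigwait( &xSignals, &iSignal );
        printf("Signal received %d result=%d\n", iSignal, result);

        /* Diagnostic: check whether SIG_RESUME is still pending afterwards. */
        sigset_t set;
        sigpending( &set );
        if( sigismember( &set, SIG_RESUME ) )
        {
            /* Consume the leftover signal with a second sigwait() call. */
            int result2 = sigwait( &xSignals, &iSignal );
            printf("Signal received %d result=%d\n", iSignal, result2);
        }
    }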

Result on Ubuntu:

Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
Signal received 10 result=0
FreeRTOS scheduler exited

@johnboiles
Contributor Author

> Can you try running without the debugger?

I can't reproduce or observe this issue without the debugger. I'm not sure whether that means the issue isn't happening, or that it is happening but isn't observable. Would you expect the program to crash when it gets the SIGUSR1 signal when restoring the signals in pthread_sigmask (port.c:311)?

> I also tried your change in xPortStartScheduler which resulted in an invalid signal on MacOS

@kar-rahul-aws so you were able to repro this on macOS?

@aggarg
Member

aggarg commented Feb 7, 2025

> Would you expect the program to crash when it gets the SIGUSR1 signal when restoring the signals in pthread_sigmask (port.c:311)?

No, I would not expect that to crash.

@bhoomrs
Member

bhoomrs commented Feb 7, 2025

Hello @johnboiles,
We were able to reproduce the error on macOS using LLDB. Your solution helped fix it, thank you.
However, during our investigation, we found that LLDB is intercepting the SIGUSR1 signal during debugging, which leads to a potential race condition.
Adding the following line to ~/.lldbinit prevented the issue:

process handle SIGUSR1 -n true -p true -s false

Once set, this configuration remains in effect until explicitly changed.
Since this resolves the issue without requiring code changes, we will be closing the PR. Thanks again!

@johnboiles
Contributor Author

Hi @bhoomrs, thanks for the suggestion! However, it looks like this does not resolve the issue; it only stops LLDB from breaking when the signal fires. The program still terminates at the pthread_sigmask line. With your ~/.lldbinit addition, the process exits with status 30, i.e. it is terminated by signal 30 (SIGUSR1 on macOS):

(lldb) pro la
Process 28916 launched: '/Users/johnboiles/Developer/repos/FreeRTOS/freertos-pr1224/build/freertos_posix_example' (arm64)
Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
Process 28916 stopped and restarted: thread 1 received signal: SIGUSR1
Process 28916 exited with status = 30 (0x0000001e) Terminated due to signal 30
(lldb)

Running outside of LLDB, by contrast, the program completes: the final "FreeRTOS scheduler exited" line is printed and the exit code is 0.

Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
FreeRTOS scheduler exited
> echo $?
0

Is this consistent with what you're seeing? I'm definitely open to a .lldbinit or similar workaround if we can find one. It is helpful to be able to debug FreeRTOS programs.

@bhoomrs
Member

bhoomrs commented Feb 10, 2025

This is the output I get while running the demo with lldb in the terminal:

➜  Posix_GCC git:(main) ✗ lldb ./build/posix_demo 
(lldb) target create "./build/posix_demo"
Current executable set to '/Users/bhoomrs/P3/FreeRTOS/FreeRTOS/Demo/Posix_GCC/build/posix_demo' (arm64).
(lldb) run
Process 59557 launched: '/Users/bhoomrs/P3/FreeRTOS/FreeRTOS/Demo/Posix_GCC/build/posix_demo' (arm64)

Trace started.
The trace will be dumped to disk if a call to configASSERT() fails.
Starting full demo
Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
FreeRTOS scheduler exited
Process 59557 exited with status = 0 (0x00000000) 

I acknowledge that this approach doesn't actually fix the underlying error; it simply ignores it, much like in GDB.
I wonder why this fix is not working for you. Can you try using the command -exec process handle SIGUSR1 -n true -p true -s false while debugging?

@johnboiles
Contributor Author

@bhoomrs I'm not sure where to put the -exec parameter. Do you mean something like this?

> lldb -o "process handle SIGUSR1 -n true -p true -s false" -- build/freertos_posix_example
NAME         PASS     STOP     NOTIFY
===========  =======  =======  =======
SIGUSR1      true     false    true
(lldb) target create "build/freertos_posix_example"
Current executable set to '/Users/johnboiles/Developer/repos/FreeRTOS/freertos-pr1224/build/freertos_posix_example' (arm64).
(lldb) process handle SIGUSR1 -n true -p true -s false
NAME         PASS     STOP     NOTIFY
===========  =======  =======  =======
SIGUSR1      true     false    true
(lldb) pro la
Process 4324 launched: '/Users/johnboiles/Developer/repos/FreeRTOS/freertos-pr1224/build/freertos_posix_example' (arm64)
Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
Process 4324 stopped and restarted: thread 1 received signal: SIGUSR1
Process 4324 exited with status = 30 (0x0000001e) Terminated due to signal 30

If I also set PASS and NOTIFY to false, it works:

> lldb -o "process handle SIGUSR1 -n false -p false -s false" -- build/freertos_posix_example
NAME         PASS     STOP     NOTIFY
===========  =======  =======  =======
SIGUSR1      true     false    true
(lldb) target create "build/freertos_posix_example"
Current executable set to '/Users/johnboiles/Developer/repos/FreeRTOS/freertos-pr1224/build/freertos_posix_example' (arm64).
(lldb) process handle SIGUSR1 -n false -p false -s false
NAME         PASS     STOP     NOTIFY
===========  =======  =======  =======
SIGUSR1      false    false    false
(lldb) pro la
Process 8510 launched: '/Users/johnboiles/Developer/repos/FreeRTOS/freertos-pr1224/build/freertos_posix_example' (arm64)
Starting FreeRTOS scheduler
FreeRTOS scheduler started
Task Done, ending scheduler
FreeRTOS scheduler exited
Process 8510 exited with status = 0 (0x00000000)

I think that's an acceptable workaround if it can be documented somewhere. That said, I'd suggest that keeping this 4-line change, with a comment, might be a good place to document it! ;) I added an #if __APPLE__ guard and updated the comment in case you want to go that direction.

@aggarg
Member

aggarg commented Feb 11, 2025

> I think that's an acceptable workaround if it can be documented somewhere.

That sounds reasonable. We can create a README here and document it.

> I added a #if __APPLE__ in there and updated the comment in case you want to go that direction.

This check seems like addressing a race rather than fixing the root cause and therefore, I do not think we should add this. The above mentioned lldb command is the correct way to inform the debugger to not interfere with SIGUSR1.

@johnboiles
Contributor Author

> This check seems like addressing a race rather than fixing the root cause and therefore, I do not think we should add this.

I can accept that now that we have a workaround. Thanks for considering it.

@bhoomrs
Member

bhoomrs commented Feb 12, 2025

@johnboiles Thank you for bringing this to our attention!

bhoomrs added a commit that referenced this pull request Feb 12, 2025
While using the macOS default LLDB debugger, a call to vTaskEndScheduler results in an unhandled SIGUSR1 (aka SIGRESUME) when restoring the scheduler thread's signals with pthread_sigmask. This crashes the program.

Added instructions in portable/ThirdParty/GCC/Posix/port.c to suppress SIGUSR1 to prevent LLDB debugger interference when exiting xPortStartScheduler

Thanks to: @johnboiles for pointing it out in #1224

4 participants