Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CONT stack overflow postmortem #9083

Merged
merged 7 commits into from
Mar 26, 2024
Merged

Conversation

mcspr
Copy link
Collaborator

@mcspr mcspr commented Feb 5, 2024

  • check in cont_run() and cont_suspend() whether a1 points after cont_t::stack[0]
  • in case a1 is broken, postmortem will still report proper context in
    proper stack boundaries. previously it used sys ctx values

additionally

  • as suggested in ESP8266 Crashing During Prints randomly #9069, change stack smashing to a single line that
    does not mention any Exceptions
  • reduce overall stack dump length when there are know garbage values i.e. cont stackguard
  • decoder.py addr search regexp should no longer skip stack lines ending with ' <'
  • update decoder.py parsing so it notices both stack smashing and alloc errors

Noticed in esp-idf cfg as CONFIG_FREERTOS_CHECK_STACKOVERFLOW_PTRVAL=y, where Freertos it is a so called 'Method 1' of overflow checking happening right before ctx switch
https://www.freertos.org/Stacks-and-stack-overflow-checking.html
(also ref. #8666)

simple example like this one usually already offsets a1 when yield() happens, so the newly added check would cause a reset

#include <Arduino.h>

void setup() {
  Serial.begin(115200);
}

void loop() {
  int buf[1048]; // oops
  yield();
  printf("%p %p %p %p\n",
    &buf[0], &buf[1], &buf[2], &buf[3]);
}

and it should generate more useful info as the result

Stack overflow detected
sp: 3fffef28 end: 3fffffd0 offset: 0000

ctx: cont
0x4010015c: ets_post at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:243
0x402017f5: esp_suspend_within_cont at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:121
(inlined by) __yield at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:193
0x40201052: loop at /home/runner/dev/stack_overflow_test/src/main.cpp:10
0x402016da: loop_task(ETSEventTag*) at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:279
0x40105245: call_user_start_local at ??:?
0x4010524b: call_user_start_local at ??:?
0x4010000d: call_user_start at ??:?
0x401000ab: app_entry_redefinable at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:391
0x40225e1e: cont_ret at cont.S.o:?
0x40202761: uart_flush at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/uart.cpp:583
0x401009d8: malloc at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/umm_malloc/umm_malloc.cpp:914
0x40202ab0: uart_init at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/uart.cpp:715
0x402029d6: uart_init at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/uart.cpp:707
0x40202c2c: uart_set_debug at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/uart.cpp:989
0x402012b0: HardwareSerial::begin(unsigned long, SerialConfig, SerialMode, unsigned char, bool) at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/HardwareSerial.cpp:56
0x402018b0: loop_wrapper() at /home/runner/.platformio/packages/framework-arduinoespressif8266@src-31d658a59f41540201fc3726a1394910/cores/esp8266/core_esp8266_main.cpp:263
0x40100d4d: cont_wrapper at ??:?
0x40201052: loop at /home/runner/dev/stack_overflow_test/src/main.cpp:10
> sed -ne 10p src/main.cpp
  printf("%p %p %p %p\n",

- check in cont_run() and cont_suspend() whether a1 is out of bounds
- in case a1 is broken, postmortem will still report proper context in
  proper stack boundaries

additionally
- as suggested in esp8266#9069, change stack smashing to a single line that
  does not mention any Exceptions
- reduce overall stack dump length when there are know garbage values i.e. cont stackguard
- decoder.py addr search regexp would no longer skip stack lines with '<'
- fix decoder.py parsing so it notices both stack smashing and alloc errors
Comment on lines 263 to 264
ets_printf_P(PSTR("\nctx: cont\n"));
stack_end = cont_stack_end;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two conditions for the same code, is it intended ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not, leftover from old impl... Changed to 'cond1 || cond2'

Copy link
Collaborator

@d-a-v d-a-v left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, can save hours of debugging

@mcspr mcspr merged commit eda4e08 into esp8266:master Mar 26, 2024
28 checks passed
@mcspr mcspr deleted the stack/cont-overflow branch March 26, 2024 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants