Memory corruption when saving network settings #2313

Open
4 tasks done
schlimmchen opened this issue Sep 28, 2024 · 18 comments
Labels
bug Something isn't working

Comments

@schlimmchen
Contributor

What happened?

I noticed a bunch of ESP reboots after exceptions and started digging. One of them I could isolate and pin down to also occur in this project: When saving the network settings, there is some kind of memory corruption.

To Reproduce Bug

Save the network settings by clicking the save button in the web UI. Repeat until you observe a random crash.

Expected Behavior

Graceful application of network settings.

Install Method

Self-Compiled

What git-hash/version of OpenDTU?

3559007

What firmware variant (PIO Environment) are you using?

generic_esp32s3_usb

Relevant log/trace output

No response

Anything else?

Example 1
Nothing received, resend whole request
TX AlarmData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 11 00 66 F8 73 EA 00 00 00 00 00 00 00 00 BB 5E 09 
Setting Hostname... done
Interrupt received
Configuring WiFi STA using existing credentials... done
Configuring WiFi STA DHCP IP... [ 27042][E][WiFiClient.cpp:275] connect(): socket error on fd 49, errno: 113, "Software caused connection abort"
done

assert failed: heap_caps_free heap_caps.c:381 (heap != NULL && "free() target pointer is outside heap areas")


Backtrace: 0x40377b32:0x3fcb8110 0x4037d1d1:0x3fcb8130 0x40383ddd:0x3fcb8150 0x40378382:0x3fcb8280 0x40383e0d:0x3fcb82a0 0x4203dfe5:0x3fcb82c0 0x4203dff5:0x3fcb82e0 0x4202bd4d:0x3fcb8300 0x4202e422:0x3fcb8320 0x42013d4e:0x3fcb8340 0x42013d66:0x3fcb8360 0x42013d71:0x3fcb8380 0x4202d706:0x3fcb83a0 0x4202f67d:0x3fcb83c0 0x4202ba12:0x3fcb83e0 0x4202ba1f:0x3fcb8400 0x420c89ea:0x3fcb8420 0x420c8d79:0x3fcb8450 0x420c908f:0x3fcb8480




ELF file SHA256: 664fedd146fc71fc
Example 2
Admin AP remaining seconds: 110 / 120
Setting Hostname... done
[114955][E][WiFiUdp.cpp:221] parsePacket(): could not receive daConfiguring WiFi STA using ta: 9
existing credentials... done
Configuring WiFi STA DHCP IP... done
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202bd48  PS      : 0x00060f30  A0      : 0x8202e425  A1      : 0x3fcb8360  
A2      : 0x3fcc3fe0  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcc21c8  
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202bd5d  A9      : 0x3fcb8340  
A10     : 0x00000018  A11     : 0x3fcc322c  A12     : 0x00000000  A13     : 0x00000001  
A14     : 0x00060920  A15     : 0x00000001  SAR     : 0x00000005  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff  


Backtrace: 0x4202bd45:0x3fcb8360 0x4202e422:0x3fcb8380 0x42013d4e:0x3fcb83a0 0x42013d66:0x3fcb83c0 0x42013d71:0x3fcb83e0 0x4202d706:0x3fcb8400 0x4202f67d:0x3fcb8420 0x4202ba12:0x3fcb8440 0x4202ba1f:0x3fcb8460 0x420c89ea:0x3fcb8480 0x420c8d79:0x3fcb84b0 0x420c908f:0x3fcb84e0




ELF file SHA256: 664fedd146fc71fc
Example 3
All missing
Nothing received, resend whole request
TX RealTimeRunData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 0B 00 66 F8 74 A5 00 00 00 00 00 00 00 00 88 84 B2 
Setting Hostname... done
[ 17184][E][WiFiUdp.cpp:221] parsePacket(): could not receive daConfiguring WiFi STA using ta: 9
existing credentials... done
Configuring WiFi STA DHCP IP... [ 17195][E][WiFiClient.cpp:275] connect(): socket error on fd 49, errno: 113, "Software caused connection abort"
done

assert failed: tcp_update_rcv_ann_wnd IDF/components/lwip/lwip/src/core/tcp.c:951 (new_rcv_ann_wnd <= 0xffff)


Backtrace: 0x40377b32:0x3fcedb80 0x4037d1d1:0x3fcedba0 0x40383ddd:0x3fcedbc0 0x4205c85e:0x3fcedcf0 0x4205c8f2:0x3fcedd10 0x420c8135:0x3fcedd30 0x420594cd:0x3fcedd50




ELF file SHA256: 664fedd146fc71fc

Please confirm the following

  • I believe this issue is a bug that affects all users of OpenDTU, not something specific to my installation.
  • I have already searched for relevant existing issues and discussions before opening this report.
  • I have updated the title field above with a concise description.
  • I have double checked that my inverter does not contain a W in the model name (like HMS-xxxW) as they are not supported.
@schlimmchen added the bug label on Sep 28, 2024
@schlimmchen
Contributor Author

schlimmchen commented Sep 28, 2024

In a quick and dirty test I made NetworkSettings execute enableAdminMode() and applyConfig() from inside its loop() (to have these functions execute synchronously to the main loop()), but it seems the issue is not fully resolved:
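A minimal sketch of what such a deferral can look like, assuming a flag that the web handler sets and that loop() consumes. The class name `DeferredNetworkSettings` and the method `requestApplyConfig()` are made up for illustration and are not the actual OpenDTU code; `applyConfig()` here is only a placeholder.

```cpp
#include <atomic>

// Hypothetical stand-in for NetworkSettings; only the deferral pattern matters.
class DeferredNetworkSettings {
public:
    // Called from the web server task: only records that a save happened.
    void requestApplyConfig() { _applyRequested.store(true); }

    // Called from the main loop(): performs the work on the main task.
    void loop() {
        // exchange() clears the flag and reports whether it was set.
        if (_applyRequested.exchange(false)) {
            applyConfig();
        }
    }

    int applyCount() const { return _applyCount; }

private:
    void applyConfig() {
        // Placeholder: the real function would reconfigure WiFi here.
        ++_applyCount;
    }

    std::atomic<bool> _applyRequested{false};
    int _applyCount = 0;
};
```

With this shape, several save requests that arrive before the next loop() iteration collapse into a single applyConfig() call.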

Log
RX Period End
All missing
Nothing received, resend whole request
TX RealTimeRunData 868.00 MHz --> 15 93 10 07 59 80 16 72 56 80 0B 00 66 F8 76 F4 00 00 00 00 00 00 00 00 D1 D4 E8 
Disconnected from MQTT.
Disconnect reason:TCP_DISCONNECTED

assert failed: tcp_sent IDF/components/lwip/lwip/src/core/tcp.c:2131 (invalid socket state for sent callback)


Backtrace: 0x40377b32:0x3fcb8300 0x4037d1d1:0x3fcb8320 0x40383ddd:0x3fcb8340 0x4205c9e2:0x3fcb8470 0x420c8d5a:0x3fcb8490 0x420c90bf:0x3fcb84c0




ELF file SHA256: d80c092b282e219f

It seems I can't reproduce the issue when I wait for the device to print that it got its IP address before hitting the "Save" button again, at least with the change I described in this comment.

Edit: I found this: #2298 (comment) which could be related.

@stefan123t
Contributor

stefan123t commented Sep 28, 2024

@schlimmchen yes, I have read issue #2298 too. Basically your fix sounds good, but judging by the example logs you provided, there seems to be an RF task running in parallel. The IRQ subroutine may be triggered there, which could “interrupt” [sic] your inline code path.
Can we disable RF comms before we save the config, or is that already part of enableAdminMode()? Maybe we have to disable the IRQ before and flush the RF buffer afterwards?
Maybe add a semaphore to enableAdminMode()?
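A rough sketch of the guard idea, assuming a simple busy flag in place of a real semaphore (on the ESP32 a FreeRTOS semaphore could play that role). `ReconfigGuard` and `tryRun()` are hypothetical names, not existing OpenDTU code. The sketch only keeps two reconfigurations from overlapping; it does nothing about the IRQ or the RF buffer.

```cpp
#include <atomic>

// Hypothetical guard: refuses to start a reconfiguration while another
// one is still running.
class ReconfigGuard {
public:
    // Runs fn and returns true, or returns false without running fn
    // when a reconfiguration is already in progress.
    template <typename Fn>
    bool tryRun(Fn&& fn) {
        // exchange() returns the previous value: true means someone else is busy.
        if (_busy.exchange(true)) {
            return false;
        }
        fn();
        _busy.store(false);
        return true;
    }

private:
    std::atomic<bool> _busy{false};
};
```

A caller would wrap the body of enableAdminMode() or applyConfig() in `tryRun()` and decide what to do when it returns false, for example retry on the next loop() iteration.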

@tbnobody
Owner

It would be helpful to show us the log output from VS Code, as it automatically integrates the exception parser and shows a proper stack trace with readable symbols.
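If only a raw log is available, the addresses can still be decoded by hand. A small sketch, assuming the usual `Backtrace: PC:SP PC:SP ...` format seen in the logs above; the helper name `extractBacktracePcs` is made up for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts the program-counter half of each "PC:SP" pair from a
// "Backtrace: ..." line as printed in the crash logs.
std::vector<std::string> extractBacktracePcs(const std::string& line) {
    std::vector<std::string> pcs;
    std::istringstream in(line);
    std::string token;
    while (in >> token) {
        const auto colon = token.find(':');
        // Skip the "Backtrace:" label and any token that is not a
        // 0x-prefixed PC:SP pair.
        if (token.rfind("0x", 0) != 0 || colon == std::string::npos) {
            continue;
        }
        pcs.push_back(token.substr(0, colon));
    }
    return pcs;
}
```

The resulting addresses can then be passed to the toolchain's addr2line together with the ELF file of the exact build that crashed, presumably something like `xtensa-esp32s3-elf-addr2line -pfiaC -e <firmware.elf> <addresses>`.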

@tbnobody
Owner

When testing the migration to Arduino core 3, I also realized that it immediately crashes in WiFi.disconnect(true, true); I didn't have time to debug this any further.

@stefan123t
Contributor

@schlimmchen is this solved by your other fix from today, which prevents allocating m = new AuthenticationMiddleware(); for every Live API request, or is this still another issue?
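For readers who have not seen that fix: the general pattern is sketched below, without claiming it matches the actual change. `Middleware`, `handleRequestLeaky()` and `handleRequestShared()` are hypothetical stand-ins; the instance counter only exists to make the difference visible.

```cpp
#include <cstddef>

// Hypothetical stand-in for AuthenticationMiddleware that counts live instances.
struct Middleware {
    static std::size_t liveInstances;
    Middleware() { ++liveInstances; }
    ~Middleware() { --liveInstances; }
};
std::size_t Middleware::liveInstances = 0;

// Problematic variant: every request allocates a new middleware, and
// nothing in the request path deletes it again.
Middleware* handleRequestLeaky() {
    return new Middleware();
}

// Alternative: all requests share one function-local static instance.
Middleware& handleRequestShared() {
    static Middleware shared;
    return shared;
}
```

In the first variant the number of live instances grows with every request unless the caller deletes them; in the second it stays at one no matter how many requests arrive.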

@tbnobody
Owner

IMHO this is a different issue, as it only occurs when pressing save.

@stefan123t
Contributor

see #2360

@schlimmchen
Contributor Author

see #2360

Yeah, maybe... Maybe not? Maybe #2360 will only mask the issue?

I know I need to send backtraces... In case I can't reproduce this with #2360, I will instead close this, regardless of whether the root cause was fixed or concealed.

@schlimmchen
Contributor Author

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202d894  PS      : 0x00060b30  A0      : 0x8202fd05  A1      : 0x3fcb8900
A2      : 0x3fcbbcd8  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcb9e54
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202d8a9  A9      : 0x3fcb88e0
A10     : 0x00000018  A11     : 0x3fcba658  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060520  A15     : 0x00000001  SAR     : 0x0000001d  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4202d891:0x3fcb8900 0x4202fd02:0x3fcb8920 0x42027fda:0x3fcb8940 0x4202800a:0x3fcb8960 0x42028015:0x3fcb8980 0x4202efe6:0x3fcb89a0 0x42030cad:0x3fcb89c0 0x4202cb17:0x3fcb89e0 0x4202cb23:0x3fcb8a00 0x420cdbb2:0x3fcb8a20 0x420cdf41:0x3fcb8a50 0x420ce257:0x3fcb8a80

ELF file SHA256: 9fc0483beb1c1f20

0x4202d891: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Nope... Got this on the first try using v24.10.15 including the write guard.

The backtrace seems to suggest that the Async TCP Server can't handle the underlying network connection breaking, or to be more specific: Once the connection breaks, handling the disconnect triggers some kind of issue.

assert failed: tlsf_free heap_tlsf.c:965 (!block_is_free(block) && "block already marked as free")

Backtrace: 0x40377dc6:0x3fcb8650 0x4037d241:0x3fcb8670 0x40383f55:0x3fcb8690 0x4038307d:0x3fcb87c0 0x40383a48:0x3fcb87e0 0x40383b76:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40377dc6: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d241: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x40383f55: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x4038307d: block_merge_prev at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:343
 (inlined by) tlsf_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:967
0x40383a48: multi_heap_free_impl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c:212
0x40383b76: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:266
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Hm, similar context. Maybe a double free?
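To illustrate what a double free in this kind of teardown could look like, and one way to make it harmless: a sketch assuming two cleanup paths (disconnect and error) that both try to release the same response. `Request`, `Response`, `onDisconnect()` and `onError()` are hypothetical stand-ins, not the ESPAsyncWebServer classes, and this is not a claim about where the actual bug is.

```cpp
#include <memory>

// Hypothetical response object that counts how often it is destroyed.
struct Response {
    explicit Response(int& destroyed) : _destroyed(destroyed) {}
    ~Response() { ++_destroyed; }
    int& _destroyed;
};

// Hypothetical request that owns its response. Both cleanup paths call
// release(); unique_ptr::reset() makes the second call a no-op instead
// of a second delete on the same pointer.
class Request {
public:
    explicit Request(int& destroyed) : _response(new Response(destroyed)) {}
    void onDisconnect() { release(); }
    void onError() { release(); }

private:
    void release() { _response.reset(); }
    std::unique_ptr<Response> _response;
};
```

With a raw pointer and a plain `delete` in both paths, the second call would free the same block again. Note that this only covers repeated calls from one task; if two tasks race on the same object, a lock is still needed.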

Guru Meditation Error: Core  0 panic'ed (LoadStoreError). Exception was unhandled.

Core  0 register dump:
PC      : 0x40383cd0  PS      : 0x00060133  A0      : 0x80383b5f  A1      : 0x3fcb87d0
A2      : 0x3fcade10  A3      : 0x00000001  A4      : 0x00000004  A5      : 0xbaad5678
A6      : 0x00060920  A7      : 0x00000001  A8      : 0x43b9e1ff  A9      : 0x00000003
A10     : 0x00000001  A11     : 0x3fcade08  A12     : 0xabba1234  A13     : 0xabba1234
A14     : 0x00060120  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x00000003
EXCVADDR: 0x43b9e1ff  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x40383ccd:0x3fcb87d0 0x40383b5c:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40383ccd: verify_allocated_region at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:116
0x40383b5c: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:258
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

@stefan123t
Contributor

stefan123t commented Oct 22, 2024

Yeah maybe same context, the plot thickens as they say:

ESPAsyncWebServer.h:766
WebResponseImpl.h:47
AsyncJson.h:58
AsyncJson.h:58
WebRequest.cpp:53 (discriminator 1)
WebServer.cpp:138 (discriminator 1)
WebRequest.cpp:204

While the webrequest seems to cope with AsyncWebServer::_handleDisconnect() during AsyncWebServerRequest::_onDisconnect()

Can we provoke this situation somehow?

I assume something like a WLAN disconnect, an application-level firewall, or a tcpdump may tell us a bit more about what happens on the network and where in the code/process this breaks.

@mathieucarbou

mathieucarbou commented Oct 22, 2024

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

I'm sorry, I won't be of great help on this at the moment (too little info)... Having a reproducible use case outside of the app would help.

Although we see AsyncTCP and ESPAsyncWS in the stack following user interaction, it does not mean that the issue is there; the normal processing of ESPAsyncWS could be impacted by other things running.

There are a lot of similar ones reported in https://github.com/espressif/esp-idf.

Questions, in the process to try isolate the cause:

  • Is it a normal esp32dev board?
  • Have you made sure that no heavy tasks are running concurrently?
  • No timers, interrupt handlers, or code running from IRAM?
  • No PSRAM?
  • No heavy task allocating and deallocating on the heap frequently?
  • Have you checked the heap?
  • How exactly does this "save" work? Can you point to the code? Is this a normal upload handler saving a JSON config on disk?
  • Also to test: if this is FS related, the same issue could happen during an OTA update then? Or are you disabling some running task during an OTA to not disrupt it?

@mathieucarbou

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4202d894  PS      : 0x00060b30  A0      : 0x8202fd05  A1      : 0x3fcb8900
A2      : 0x3fcbbcd8  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcb9e54
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8202d8a9  A9      : 0x3fcb88e0
A10     : 0x00000018  A11     : 0x3fcba658  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060520  A15     : 0x00000001  SAR     : 0x0000001d  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4202d891:0x3fcb8900 0x4202fd02:0x3fcb8920 0x42027fda:0x3fcb8940 0x4202800a:0x3fcb8960 0x42028015:0x3fcb8980 0x4202efe6:0x3fcb89a0 0x42030cad:0x3fcb89c0 0x4202cb17:0x3fcb89e0 0x4202cb23:0x3fcb8a00 0x420cdbb2:0x3fcb8a20 0x420cdf41:0x3fcb8a50 0x420ce257:0x3fcb8a80

ELF file SHA256: 9fc0483beb1c1f20

0x4202d891: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Nope... Got this on the first try using v24.10.15 including the write guard.

The backtrace seems to suggest that the Async TCP Server can't handle the underlying network connection breaking, or to be more specific: Once the connection breaks, handling the disconnect triggers some kind of issue.

assert failed: tlsf_free heap_tlsf.c:965 (!block_is_free(block) && "block already marked as free")

Backtrace: 0x40377dc6:0x3fcb8650 0x4037d241:0x3fcb8670 0x40383f55:0x3fcb8690 0x4038307d:0x3fcb87c0 0x40383a48:0x3fcb87e0 0x40383b76:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40377dc6: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d241: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x40383f55: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x4038307d: block_merge_prev at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:343
 (inlined by) tlsf_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_tlsf.c:967
0x40383a48: multi_heap_free_impl at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap.c:212
0x40383b76: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:266
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

Hm, similar context. Maybe a double free?

Guru Meditation Error: Core  0 panic'ed (LoadStoreError). Exception was unhandled.

Core  0 register dump:
PC      : 0x40383cd0  PS      : 0x00060133  A0      : 0x80383b5f  A1      : 0x3fcb87d0
A2      : 0x3fcade10  A3      : 0x00000001  A4      : 0x00000004  A5      : 0xbaad5678
A6      : 0x00060920  A7      : 0x00000001  A8      : 0x43b9e1ff  A9      : 0x00000003
A10     : 0x00000001  A11     : 0x3fcade08  A12     : 0xabba1234  A13     : 0xabba1234
A14     : 0x00060120  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x00000003
EXCVADDR: 0x43b9e1ff  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x40383ccd:0x3fcb87d0 0x40383b5c:0x3fcb8800 0x4037863d:0x3fcb8820 0x40383f85:0x3fcb8840 0x420ca799:0x3fcb8860 0x4202d8a6:0x3fcb8880 0x4202fd02:0x3fcb88a0 0x42027fda:0x3fcb88c0 0x4202800a:0x3fcb88e0 0x42028015:0x3fcb8900 0x4202efe6:0x3fcb8920 0x42030cad:0x3fcb8940 0x4202cb17:0x3fcb8960 0x4202cb23:0x3fcb8980 0x420cdbb2:0x3fcb89a0 0x420cdf41:0x3fcb89d0 0x420ce257:0x3fcb8a00

ELF file SHA256: 9fc0483beb1c1f20
0x40383ccd: verify_allocated_region at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:116
0x40383b5c: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:258
0x4037863d: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x40383f85: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x420ca799: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4202d8a6: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4202fd02: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42027fda: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x4202800a: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42028015: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4202efe6: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x42030cad: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4202cb17: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:204
0x4202cb23: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x420cdbb2: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x420cdf41: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x420ce257: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

One more for good measure. Again, similar context. Maybe @mathieucarbou is willing to take a look?

These stack traces are nearly the same:

  • lwip layer sends LWIP_TCP_ERROR event

  • which triggers AsyncTCP error handler

  • which triggers ESPAsyncWS disconnect handler, which:

    1. deletes the request (in _handleDisconnect())
    2. then deletes the client (in the onDisconnect cb)

    The failure always happens when freeing the response attached to the request:

AsyncWebServerRequest::~AsyncWebServerRequest() {
  _headers.clear();

  _pathParams.clear();

  if (_response != NULL) {
    delete _response;
  }

For that to happen, the created response pointer (in this case an AsyncJsonResponse) must already have been passed to request->send(response);, because this is the step that attaches the response pointer to the request object.

So the request->send(response); was called, then the lwip error occurred.

It is important to note that request->send(response); does not send the response on the network, because the middleware chain continues to be processed (a middleware can act on the response headers or decide to replace the response attached to a request). The response is committed to the network only once the middleware chain has finished.

This matters because code like the following is brittle: it probably relies on the response having been sent and received by the browser, which is not the case:

In WebApi_config.cpp:

    WebApi.sendJsonResponse(request, response, __FUNCTION__, __LINE__);

    Utils::removeAllFiles();
    RestartHelper.triggerRestart();

The last 2 lines will be executed first, and the response will be sent only once the middleware chain and request handler have finished.

Another one:

    WebApi.sendJsonResponse(request, response, __FUNCTION__, __LINE__);

    Battery.updateSettings();
    MqttHandleBatteryHass.forceUpdate();

    // potentially make SoC thresholds auto-discoverable
    MqttHandlePowerLimiterHass.forceUpdate();

Here, the last lines will execute BEFORE the response is sent.

Everything executed AFTER a response is attached to a request has to be written carefully so that it does not affect any pointer the response could still reference.
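To make the ordering concrete, here is a minimal sketch of it. FakeRequest, commit() and afterSend are hypothetical names made up for this illustration; they only model "attach now, write to the network later" and are not ESPAsyncWebServer API.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of the ordering only; not the real library classes.
struct FakeRequest {
  std::vector<std::string>& log;
  bool responseAttached = false;
  std::function<void()> afterSend;  // work deferred until the response is out

  // Models request->send(response): the response is only attached here.
  void send() {
    responseAttached = true;
    log.push_back("response attached");
  }

  // Called by the "server" once the handler has returned.
  void commit() {
    if (responseAttached) log.push_back("response written to network");
    if (afterSend) afterSend();
  }
};

// Runs one handler and returns the order in which things happened.
std::vector<std::string> runHandler(bool deferSideEffect) {
  std::vector<std::string> log;
  FakeRequest req{log};
  req.send();
  if (deferSideEffect) {
    req.afterSend = [&log] { log.push_back("restart triggered"); };
  } else {
    log.push_back("restart triggered");  // runs before anything reached the network
  }
  req.commit();
  return log;
}
```

With deferSideEffect == false the restart is logged before the network write, which is the situation described above. Whether and how the real library lets you hook "after the response was sent" is something to check in its documentation; the sketch only shows why the hook would be needed.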

In the case of a JSON response, in some use cases ArduinoJson won't make a copy but will just point to the existing data (but this is not the issue we saw).

In the issue we saw above, the lwip layer fails (wifi disconnect or something else) which triggers the response deletion.

  • It would be helpful to put some logs in the OpenDTU handlers to know which handler created the response that fails to be deleted

  • It would also be helpful to put some logs in the response destructor to check whether it is called twice and, if possible, from where.

  • It would be helpful to put some logs before delete is called on a response. There are only a few places where a response is deleted:

  • In void AsyncWebServerRequest::send(AsyncWebServerResponse* response) { (when a response is replaced by another one)

  • In AsyncWebServerRequest::~AsyncWebServerRequest() {

  • In these 2 places:

void AsyncWebServerRequest::_onPoll() {
  // os_printf("p\n");
  if (_response != NULL && _client != NULL && _client->canSend()) {
    if (!_response->_finished()) {
      _response->_ack(this, 0, 0);
    } else {
      AsyncWebServerResponse* r = _response;
      _response = NULL;
      delete r;

      _client->close();
    }
  }
}

void AsyncWebServerRequest::_onAck(size_t len, uint32_t time) {
  // os_printf("a:%u:%u\n", len, time);
  if (_response != NULL) {
    if (!_response->_finished()) {
      _response->_ack(this, len, time);
    } else if (_response->_finished()) {
      AsyncWebServerResponse* r = _response;
      _response = NULL;
      delete r;

      _client->close();
    }
  }
}

I suspect this might be the issue...
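The logging suggested in the bullets above could be sketched roughly like this. LiveTracker is a hypothetical helper, not part of any library; in firmware the printf calls would presumably become log_e() or ESP_LOGE, and since several tasks are involved it would need a lock, which is left out here.

```cpp
#include <cstdio>
#include <set>

// Hypothetical tracker for response pointers; single-threaded sketch only.
class LiveTracker {
 public:
  // Call right after a response is created.
  void onCreate(const void* p) { live_.insert(p); }

  // Call right before each `delete` of a response.
  // Returns false (and logs) if p is not currently alive,
  // i.e. this looks like a second delete of the same object.
  bool onDestroy(const void* p, const char* where) {
    if (live_.erase(p) == 0) {
      std::printf("suspected double delete of %p from %s\n", p, where);
      return false;
    }
    std::printf("delete %p from %s\n", p, where);
    return true;
  }

 private:
  std::set<const void*> live_;
};
```

Passing the call site as the `where` string (for example "_onAck" or "~AsyncWebServerRequest") would tell us which of the delete locations ran second. One caveat: if the heap hands out the same address again between the two deletes, the tracker will not notice.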

@mathieucarbou

mathieucarbou commented Oct 23, 2024

@schlimmchen :

In WebRequest.cpp: could you please add, just after the delete: _response = NULL;

AsyncWebServerRequest::~AsyncWebServerRequest() {
  _headers.clear();

  _pathParams.clear();

  if (_response != NULL) {
    delete _response;
    _response = NULL;
  }

or (better):

  AsyncWebServerResponse* r = _response;
  _response = NULL;
  delete r;

To be sure the response is not used by any of the callbacks above.
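As a self-contained illustration of why the second variant is preferable, here is a sketch with stand-in types (Request and Response below are hypothetical, not the real classes). It shows that while the response destructor runs, the request no longer exposes the pointer, and that a repeated release is harmless.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins, not the real ESPAsyncWebServer classes.
struct Request;

struct Response {
  Request* owner;             // back-pointer, only used for the demonstration
  bool* sawNullDuringDelete;  // where the destructor reports what it observed
  ~Response();
};

struct Request {
  Response* _response = nullptr;

  // Detach first, delete second: while ~Response() runs, _response is already null.
  void releaseResponse() {
    Response* r = std::exchange(_response, nullptr);
    delete r;  // deleting a null pointer is a no-op
  }

  ~Request() { releaseResponse(); }
};

Response::~Response() {
  // Anything re-entering the request from here sees no response attached.
  *sawNullDuringDelete = (owner->_response == nullptr);
}

// Returns true if the destructor observed a null _response.
bool detachThenDeleteHidesResponse() {
  bool sawNull = false;
  Request req;
  req._response = new Response{&req, &sawNull};
  req.releaseResponse();
  req.releaseResponse();  // second call is harmless: nothing left to delete
  assert(req._response == nullptr);
  return sawNull;
}
```

Note that this only protects against re-entrant access from the same task. If two tasks can reach the same request at the same time, both could still read the old pointer before either sets it to null, so the pattern by itself is not a synchronization mechanism.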

@schlimmchen
Contributor Author

Thanks for looking into this. I see you spent quite some of your time, thanks!

I could not yet fully understand your longer comment; I'll re-read it later. What I do understand is that one has to be careful when sending a response (actually: queuing a response for sending) and then executing more code in the same context, because that code runs before the response is actually sent. I know about the example you gave, where the ESP is restarted. I did not dare to question it, but indeed it seems that sending the response and restarting the ESP (be it with a delay or not) is something of a race. What we would actually like to do is wait for the response to be sent, then trigger the reboot.

The reason I asked whether you would want to have a look is that I suspect you are interested in making ESPAsyncWebServer robust against the issue we are looking at, even if we are using it in a questionable manner. Assuming that something can be done in the lib...

The changes you proposed unfortunately don't prevent the issue from being triggered.

[SysCORRUPT HEAP: Bad tail at 0x44862726. Expected 0xbaad5678 got 0x00000000

assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)

Backtrace: 0x403780f2:0x3fcbf7f0 0x4037d651:0x3fcbf810 0x4038441d:0x3fcbf830 0x40384035:0x3fcbf960 0x40378969:0x3fcbf980 0x4038444d:0x3fcbf9a0 0x421063e1:0x3fcbf9c0 0x4205d64e:0x3fcbf9e0 0x4205fa3e:0x3fcbfa00 0x42057a1e:0x3fcbfa20 0x42057a4e:0x3fcbfa40 0x42057a59:0x3fcbfa60 0x4205ed7a:0x3fcbfa80 0x420609e9:0x3fcbfaa0 0x4205c8bf:0x3fcbfac0 0x4205c8cb:0x3fcbfae0 0x42109af6:0x3fcbfb00 0x42109e85:0x3fcbfb30 0x4210a19b:0x3fcbfb60

ELF file SHA256: 5c3b1417a0d70e10
0x403780f2: panic_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/panic.c:408
0x4037d651: esp_system_abort at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/esp_system/esp_system.c:137
0x4038441d: __assert_func at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/assert.c:85
0x40384035: multi_heap_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/multi_heap_poisoning.c:259 (discriminator 1)
0x40378969: heap_caps_free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/heap/heap_caps.c:382
0x4038444d: free at /home/runner/work/esp32-arduino-lib-builder/esp32-arduino-lib-builder/esp-idf/components/newlib/heap.c:39
0x421063e1: operator delete(void*) at /builds/idf/crosstool-NG/.build/xtensa-esp32s3-elf/src/gcc/libstdc++-v3/libsupc++/del_op.cc:49
0x4205d64e: __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::deallocate(std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:125
 (inlined by) std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::deallocate(std::allocator<std::_List_node<AsyncWebHeader> >&, std::_List_node<AsyncWebHeader>*, unsigned int) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:462
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_put_node(std::_List_node<AsyncWebHeader>*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:454
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:81
0x4205fa3e: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42057a1e: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x42057a4e: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42057a59: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4205ed7a: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:55
0x420609e9: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4205c8bf: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:206
0x4205c8cb: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x42109af6: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x42109e85: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x4210a19b: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Core  1 register dump:
PC      : 0x4205d63c  PS      : 0x00060630  A0      : 0x8205fa45  A1      : 0x3fcbfa30
A2      : 0x3fcc1ce0  A3      : 0x00000000  A4      : 0x00000000  A5      : 0x3fcc1ff8
A6      : 0x3fcf0b24  A7      : 0x00000000  A8      : 0x8205d651  A9      : 0x3fcbfa10
A10     : 0x00000018  A11     : 0x3fcc11d4  A12     : 0x00000000  A13     : 0x00000001
A14     : 0x00060020  A15     : 0x00000001  SAR     : 0x00000015  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000  LBEG    : 0x400570e8  LEND    : 0x400570f3  LCOUNT  : 0xffffffff

Backtrace: 0x4205d639:0x3fcbfa30 0x4205fa42:0x3fcbfa50 0x42057a1e:0x3fcbfa70 0x42057a4e:0x3fcbfa90 0x42057a59:0x3fcbfab0 0x4205ed7a:0x3fcbfad0 0x420609ed:0x3fcbfaf0 0x4205c8bf:0x3fcbfb10 0x4205c8cb:0x3fcbfb30 0x42109afa:0x3fcbfb50 0x42109e89:0x3fcbfb80 0x4210a19f:0x3fcbfbb0

ELF file SHA256: 4fbad0611da5a1f1
0x4205d639: AsyncWebHeader::~AsyncWebHeader() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:137
 (inlined by) void __gnu_cxx::new_allocator<std::_List_node<AsyncWebHeader> >::destroy<AsyncWebHeader>(AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/ext/new_allocator.h:140
 (inlined by) void std::allocator_traits<std::allocator<std::_List_node<AsyncWebHeader> > >::destroy<AsyncWebHeader>(std::allocator<std::_List_node<AsyncWebHeader> >&, AsyncWebHeader*) at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/alloc_traits.h:487
 (inlined by) std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::_M_clear() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/list.tcc:77
0x4205fa42: std::__cxx11::_List_base<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~_List_base() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:507
 (inlined by) std::__cxx11::list<AsyncWebHeader, std::allocator<AsyncWebHeader> >::~list() at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/stl_list.h:835
 (inlined by) AsyncWebServerResponse::~AsyncWebServerResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/ESPAsyncWebServer.h:766
0x42057a1e: AsyncAbstractResponse::~AsyncAbstractResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebResponseImpl.h:47
0x42057a4e: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x42057a59: AsyncJsonResponse::~AsyncJsonResponse() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/AsyncJson.h:58
0x4205ed7a: AsyncWebServerRequest::~AsyncWebServerRequest() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:53 (discriminator 1)
0x420609ed: AsyncWebServer::_handleDisconnect(AsyncWebServerRequest*) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebServer.cpp:138 (discriminator 1)
0x4205c8bf: AsyncWebServerRequest::_onDisconnect() at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:207
0x4205c8cb: std::_Function_handler<void (void*, AsyncClient*), AsyncWebServerRequest::AsyncWebServerRequest(AsyncWebServer*, AsyncClient*)::{lambda(void*, AsyncClient*)#3}>::_M_invoke(std::_Any_data const&, void*&&, AsyncClient*&&) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp:41
 (inlined by) _M_invoke at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:297
0x42109afa: std::function<void (void*, AsyncClient*)>::operator()(void*, AsyncClient*) const at /home/schlimmchen/.platformio/packages/toolchain-xtensa-esp32s3/xtensa-esp32s3-elf/include/c++/8.4.0/bits/std_function.h:687
0x42109e89: AsyncClient::_error(signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:945
0x4210a19f: AsyncClient::_s_error(void*, signed char) at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:1391
 (inlined by) _handle_async_event at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:184
 (inlined by) _async_service_task at /home/schlimmchen/Documents/OpenDTU-OnBattery/.pio/libdeps/generic_esp32s3_usb/AsyncTCP/src/AsyncTCP.cpp:207

For the record: I edited the file .pio/libdeps/generic_esp32s3_usb/ESPAsyncWebServer/src/WebRequest.cpp in-place and re-compiled, which gave me:

[...]
Building in release mode
Compiling .pio/build/generic_esp32s3_usb/lib487/ESPAsyncWebServer/WebRequest.cpp.o
Archiving .pio/build/generic_esp32s3_usb/lib487/libESPAsyncWebServer.a
Indexing .pio/build/generic_esp32s3_usb/lib487/libESPAsyncWebServer.a
Linking .pio/build/generic_esp32s3_usb/firmware.elf
Retrieving maximum program size .pio/build/generic_esp32s3_usb/firmware.elf
[...]

I assume that's okay for a quick check?

@mathieucarbou

I assume that's okay for a quick check?

Yes! That will do.

In AsyncWebServer::_handleDisconnect: could you please add some log_e() or ESP_LOGE calls to log some info about the request object (method and path, for example), so that we know which request it is and which handler it came from?

Unless you already know?

@mathieucarbou

I have released mathieucarbou/ESPAsyncWebServer @ 3.3.19, which includes a bunch of cleanup around file hierarchy and virtual destructors, but I'm pretty sure it won't impact the issue you saw.

Let me know when you have more logs to pinpoint what causes the issue :-)

@stefan123t
Contributor

@mathieucarbou thanks for the pointers on where to look closely; this is my little bed-time crime story 🔍 🧐 🎩 for the evening

I am still trying to understand what exactly happens,
when you make that copy of the pointer to the _response object,
set it to NULL and then delete the reference ?

  AsyncWebServerResponse* r = _response;
  _response = NULL;
  delete r;

You do that exactly the same way in the following three locations:

But before you send it you do something else:

void AsyncWebServerRequest::send(AsyncWebServerResponse* response) {
  if (_sent)
    return;
  if (_response)
    delete _response;
  _response = response;
  if (_response == NULL) {
    _client->close(true);
    _onDisconnect();
    _sent = true;
    return;
  }
  if (!_response->_sourceValid())
    send(500);
}

I also noticed that it always complains about being unable to free the heap memory for a std::list item of type _List_base<AsyncWebHeader> _headers when trying to call the List's destructor ::~list() of the AsyncWebServerResponse.

    std::list<AsyncWebHeader> _headers;

Though I found only explicit code for clearing the memory of such a std::list<AsyncWebHeader> _headers in the AsyncWebServerRequest.

    // Remove a header from the request.
    // It will free the memory and prevent the header to be seen during request processing.
    bool removeHeader(const char* name);
    // Remove all request headers.
    void removeHeaders() { _headers.clear(); }

Could it be that we are left with a dangling pointer to this / actually no valid AsyncWebHeader list through the above NULL and delete operations ?

@mathieucarbou

mathieucarbou commented Oct 23, 2024

set it to NULL and then delete the reference ?

We are in the destructor, so the idea is to set the reference to the response to null ASAP, in case some code elsewhere could still see this pointer (this is the case for the 2 other callbacks).

Then, once this is done, we can trigger the object deletion (which can take some time), but at least during this time the response in the request will be null.

You do that exactly the same way in the following three locations:

Yes, this is to have the pointer set to null asap, then free after.

void AsyncWebServerRequest::send(AsyncWebServerResponse* response) {
  if (_sent)
    return;
  if (_response)
    delete _response;
  _response = response;
  if (_response == NULL) {
    _client->close(true);
    _onDisconnect();
    _sent = true;
    return;
  }
  if (!_response->_sourceValid())
    send(500);
}

This code is just a feature to swap the response for another one. You are not using that. A middleware could decide to change a response that was set by a handler. So if a response was already set, we delete it, then set the new one. This operation happens during the middleware chain processing (just after the handler and before the request is sent on the network). So it is not the cause of the issue here.
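As a rough sketch of that swap behaviour: the class below is hypothetical and deliberately uses `std::unique_ptr` instead of the raw `delete` in the quoted `send()`, so that the "old response is destroyed when a new one is set" step is explicit. `flush()` and `response()` are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical response type; only carries a body for the demo.
struct Response {
  std::string body;
};

class Request {
 public:
  // Replaces any previously set response. Assigning to the unique_ptr
  // destroys the old response, mirroring "delete old, then set new".
  void send(std::unique_ptr<Response> response) {
    if (_sent) return;  // once sent, later responses are ignored (and freed)
    _response = std::move(response);
  }
  void flush() { _sent = true; }  // stands in for "sent on the network"
  const Response* response() const { return _response.get(); }

 private:
  std::unique_ptr<Response> _response;
  bool _sent = false;
};

// A handler sets a response, a middleware replaces it, a late call is ignored.
std::string demoSwapResponse() {
  Request req;
  req.send(std::make_unique<Response>(Response{"from handler"}));
  req.send(std::make_unique<Response>(Response{"from middleware"}));
  req.flush();
  req.send(std::make_unique<Response>(Response{"too late"}));
  return req.response()->body;
}
```

In this sketch the response set last before `flush()` is the one that remains, and every replaced response is freed exactly once.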

I also noticed that it always complains about being unable to free the heap memory for a std::list item of type _List_base<AsyncWebHeader> _headers when trying to call the List's destructor ::~list() of the AsyncWebServerResponse.

That is exactly what I also find strange...
Deleting the response object is a long process (as you see in the call stack), so I suspect that somewhere another deletion is triggered... That's my only explanation for now because this list: std::list<AsyncWebHeader> _headers; is really a dumb list, as dumb as having std::list<String> _headers;. AsyncWebHeader is just a holder object with 2 strings inside.

Could it be that we are left with a dangling pointer to this / actually no valid AsyncWebHeader list through the above NULL and delete operations ?

Not in the case of this list: AsyncWebHeader does not need any destructor because it is a holder object of 2 strings, and the list stores objects by copying the value into a new instance held in the node (which is freed at node destruction).
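To illustrate the by-value storage, here is a small sketch. `Header` is a hypothetical holder modelled on the description above (two strings, no user-written destructor), and `std::string` stands in for the Arduino `String` type.

```cpp
#include <cassert>
#include <list>
#include <string>

// Hypothetical holder: two strings, no user-written destructor.
struct Header {
  std::string name;
  std::string value;
};

// Returns true if the list kept its own copy after the source object changed.
bool demoListOwnsCopies() {
  std::list<Header> headers;
  {
    Header h{"Content-Type", "text/plain"};
    headers.push_back(h);  // the new node holds a copy of h
    h.value = "changed";   // does not affect the stored copy
  }                        // h is destroyed here; the list is unaffected
  bool ok = headers.size() == 1 && headers.front().value == "text/plain";
  headers.clear();  // destroys each node together with the strings inside it
  return ok && headers.empty();
}
```

Because each node owns its element, there is no pointer from the list back to the original object that could dangle.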

But a linked list works by having each node point to the next one, so when the list is destroyed, every node is freed individually. This is a longer operation than releasing a single array, which is what we would get with a vector. The issue with a vector is that it requires reallocation when it grows.
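The difference in the number of frees can be observed with a counting allocator. This is only a sketch: `CountingAlloc`, `g_deallocs`, `listDeallocs()` and `vectorDeallocs()` are names made up for the example, and exact counts may vary between standard library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <vector>

static int g_deallocs = 0;  // number of deallocate() calls seen so far

// Minimal allocator that forwards to std::allocator and counts frees.
template <class T>
struct CountingAlloc {
  using value_type = T;
  CountingAlloc() = default;
  template <class U>
  CountingAlloc(const CountingAlloc<U>&) {}
  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) {
    ++g_deallocs;
    std::allocator<T>().deallocate(p, n);
  }
};
template <class T, class U>
bool operator==(const CountingAlloc<T>&, const CountingAlloc<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAlloc<T>&, const CountingAlloc<U>&) { return false; }

// Frees observed when destroying a list of n elements (at least one per node).
int listDeallocs(int n) {
  g_deallocs = 0;
  {
    std::list<int, CountingAlloc<int>> l;
    for (int i = 0; i < n; ++i) l.push_back(i);
  }  // list destroyed here, node by node
  return g_deallocs;
}

// Frees observed when destroying a vector of n elements reserved up front.
int vectorDeallocs(int n) {
  g_deallocs = 0;
  {
    std::vector<int, CountingAlloc<int>> v;
    v.reserve(static_cast<std::size_t>(n));  // one buffer, no regrowth
    for (int i = 0; i < n; ++i) v.push_back(i);
  }  // the single buffer is released here
  return g_deallocs;
}
```

Without the `reserve()` call the vector would also free its old buffer each time it grows, which is the reallocation cost mentioned above.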

@schlimmchen you should try adding some logs to see what is being executed (which request is being handled when it fails, plus a log line before each delete call on a response). Also, point to the new version :-) Then we will have more info.

It is possible that a concurrency issue occurs when the lwIP layer reports an error (after a network issue, a broken connection, etc.).
