ESP32S3: crash when subscribing with phone but not laptop #106

Open
ArdenKolodner opened this issue Jan 5, 2023 · 9 comments

@ArdenKolodner

I have esp-nimble-cpp running on an ESP32S3, built with ESP-IDF and the Arduino component. I have a characteristic that should be updating and notifying about 3 times per second. I have a MacBook and an iOS phone available for testing (I'm waiting for a coworker to test this on PC/Android). When I subscribe to the characteristic from the MacBook, it works fine. When I do so from the iPhone, the ESP32S3 crashes with the following stack trace:
`assert failed: static int NimBLECharacteristic::handleGapEvent(uint16_t, uint16_t, ble_gatt_access_ctxt*, void*) NimBLECharacteristic.cpp:277 (rc == 0)

Backtrace: [...]
0x40375cde: panic_abort at [idf path]/esp-idf/components/esp_system/panic.c:402

0x4037f9ed: esp_system_abort at [idf path]/esp-idf/components/esp_system/esp_system.c:128

0x40386359: __assert_func at [idf path]/esp-idf/components/newlib/assert.c:85

0x42025357: NimBLECharacteristic::handleGapEvent(unsigned short, unsigned short, ble_gatt_access_ctxt*, void*) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:277 (discriminator 1)

0x42036f54: ble_gatts_val_access at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gatts.c:375

0x42036fd6: ble_gatts_chr_val_access at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gatts.c:421

0x4203a69a: ble_att_svr_read at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_svr.c:398

0x4203b18a: ble_att_svr_read_handle at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_svr.c:473

0x42031fb3: ble_gattc_notify_custom at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gattc.c:4169

0x42025224: NimBLECharacteristic::notify(unsigned char const*, unsigned int, bool) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:513

0x4202523d: NimBLECharacteristic::notify(bool) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:420

[my function calls inside loop()]

0x420069f9: loopTask(void*) at [idf path]/esp-idf/components/arduino/cores/esp32/main.cpp:50

0x40382c6d: vPortTaskWrapper at [idf path]/esp-idf/components/freertos/port/xtensa/port.c:131`

The crash happens with 2 different apps on the phone (LightBlue and the app I'm developing) so it seems to be something on the ESP side. Does anyone know how to fix this, or at least what's causing it? (A bit of poking around makes it seem like the ESP can't find the connection info for the phone, but I have no idea what would cause that.)

@ArdenKolodner
Author

ArdenKolodner commented Jan 6, 2023

Update: the problem seems to occur somewhere in ble_gap_conn_find in ble_gap.c. At first, the connection works fine for a second or so. During this time, every call to ble_gap_conn_find looks for handle 1, which is the first one it finds. Eventually, though, it gets a call to find handle 65535 (0xffff) in a list of handles from 1 to 7 (I have 7 connections active), which fails and returns null, causing the error. It looks as if the handle has been corrupted or deleted. This happens in a call to ble_att_svr_read_handle, which can replace the conn_handle with BLE_HS_CONN_HANDLE_NONE = 0xffff under certain circumstances. Continuing to poke around.
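
For anyone following along, the failing lookup can be modelled with a minimal sketch. This is not NimBLE's code: `Conn`, `find_conn`, `make_active_conns` and `kConnHandleNone` are hypothetical stand-ins, and only the handle values (1 to 7, and 0xffff) come from the observations above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-in for NimBLE's BLE_HS_CONN_HANDLE_NONE (0xffff, as quoted above).
constexpr std::uint16_t kConnHandleNone = 0xffff;

// Hypothetical, simplified connection record; not NimBLE's real struct.
struct Conn {
    std::uint16_t handle;
};

// Linear search over the active connections. Returns nullptr when no
// connection has the requested handle.
const Conn* find_conn(const std::vector<Conn>& conns, std::uint16_t handle) {
    for (const Conn& c : conns) {
        if (c.handle == handle) {
            return &c;
        }
    }
    return nullptr;
}

// Builds the seven connections described above (handles 1 to 7).
std::vector<Conn> make_active_conns() {
    std::vector<Conn> conns;
    for (std::uint16_t h = 1; h <= 7; ++h) {
        conns.push_back(Conn{h});
    }
    return conns;
}

// Usage check: a real handle is found, the "none" marker is not.
void run_example() {
    const std::vector<Conn> conns = make_active_conns();
    assert(find_conn(conns, 1) != nullptr);
    assert(find_conn(conns, kConnHandleNone) == nullptr);
}
```

In this model the lookup for 0xffff can never succeed, whatever the state of the real connections. That would be consistent with the handle being substituted on purpose rather than corrupted, though the sketch alone does not prove it.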

Update 2: Although several notifications successfully take place and several packets of data are transmitted before something goes wrong, the error happens on the first Gap event after subscription, which is a Read event (BLE_GATT_ACCESS_OP_READ_CHR). Seems like this event doesn't fire for the first few packets at all? I don't really know how Gap events work, so I can't shed much light there.

@ArdenKolodner
Author

Update 3: Here's a full stack trace from immediately before the call to ble_gap_conn_find that fails (causing the assert failure):

`Backtrace: [...]
0x42025347: NimBLECharacteristic::handleGapEvent(unsigned short, unsigned short, ble_gatt_access_ctxt*, void*) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:276

0x42037114: ble_gatts_val_access at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gatts.c:375

0x42037196: ble_gatts_chr_val_access at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gatts.c:421

0x4203a85a: ble_att_svr_read at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_svr.c:398

0x4203b34a: ble_att_svr_read_handle at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_att_svr.c:473

0x42032173: ble_gattc_notify_custom at [idf path]/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_gattc.c:4169

0x42025234: NimBLECharacteristic::notify(unsigned char const*, unsigned int, bool) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:512

0x4202524d: NimBLECharacteristic::notify(bool) at [project]/components/esp-nimble-cpp-1.4.1/src/NimBLECharacteristic.cpp:419

[my function calls]

0x42006a11: loopTask(void*) at [idf path]/esp-idf/components/arduino/cores/esp32/main.cpp:50

0x40382c6d: vPortTaskWrapper at [idf path]/esp-idf/components/freertos/port/xtensa/port.c:131`

@ArdenKolodner
Author

ArdenKolodner commented Jan 6, 2023

Update 4: in ble_gattc_notify_custom, argument txom is non-null until the very last call (the one that causes an error), where it's null. It never seems to be null when I use the MacBook. That's potentially the problem! I'll see if I can figure out what's making it become null eventually.

Update 5: ble_hs_mbuf_from_flat returns NULL when os_mbuf_copyinto returns error code 1 from os_mbuf_append, which is error OS_ENOMEM. It's running out of memory, but why would this happen only with a certain client device?

Update 6: I tried increasing the MSYS_1 block count in menuconfig from 12 to 120, which didn't fix anything. Neither did increasing the HCI buffer counts or the ACL buffer count.
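
The allocation failure in Update 5 can be illustrated with a rough model of a fixed-size block pool. Everything here is an assumption made for illustration: `BlockPool`, `payloads_until_exhausted`, the 256-byte block size and the 20-byte payload are hypothetical. Only the block counts (12 and 120) and the error value 1 for OS_ENOMEM come from the updates above.

```cpp
#include <cassert>
#include <cstddef>

// Named after NimBLE's OS_OK and OS_ENOMEM; the value 1 for OS_ENOMEM is
// the error code reported in Update 5.
constexpr int kOsOk = 0;
constexpr int kOsEnomem = 1;

// Hypothetical fixed-capacity pool standing in for the MSYS_1 mbuf pool.
class BlockPool {
public:
    BlockPool(std::size_t block_count, std::size_t block_size)
        : free_blocks_(block_count), block_size_(block_size) {
        assert(block_size > 0);
    }

    // Reserves enough blocks for `len` bytes (at least one block). Fails
    // with kOsEnomem, without reserving anything, when too few are free.
    int reserve(std::size_t len) {
        const std::size_t needed =
            (len == 0) ? 1 : (len + block_size_ - 1) / block_size_;
        if (needed > free_blocks_) {
            return kOsEnomem;
        }
        free_blocks_ -= needed;
        return kOsOk;
    }

private:
    std::size_t free_blocks_;
    std::size_t block_size_;
};

// Counts how many payloads of `len` bytes fit before the pool reports
// kOsEnomem, assuming no block is ever returned to the pool.
std::size_t payloads_until_exhausted(BlockPool pool, std::size_t len) {
    std::size_t sent = 0;
    while (pool.reserve(len) == kOsOk) {
        ++sent;
    }
    return sent;
}

// Usage check with the two block counts tried in Update 6.
void run_example() {
    assert(payloads_until_exhausted(BlockPool(12, 256), 20) == 12);
    assert(payloads_until_exhausted(BlockPool(120, 256), 20) == 120);
}
```

If buffers were never returned to the pool, a larger block count would only postpone the failure, not prevent it. That would fit with Update 6, but nothing above confirms that this is what actually happens with the phone.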

@ArdenKolodner
Author

Finally found something that makes it work: in ble_gattc.c, line 4169, replacing "BLE_HS_CONN_HANDLE_NONE" with "conn_handle" fixes the crash (essentially, instead of using the "no conn handle" marker, it just uses the conn handle that it already has). I have no idea why that fixes things, and I would really like to find a solution that doesn't require editing a very obscure file independently on each computer we use for development. If anyone knows how to fix this, please let me know!
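
One possible reading of why the edit helps, pieced together from the updates and stack traces above: when the payload buffer is NULL, ble_gattc_notify_custom appears to read the attribute value itself through ble_att_svr_read_handle, and the read callback then tries to look up whichever handle it was given. The sketch below models only that chain. `notify_model`, `read_callback_finds_conn`, `NotifyResult` and `kConnHandleNone` are hypothetical names, not NimBLE or esp-nimble-cpp APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Stand-in for NimBLE's BLE_HS_CONN_HANDLE_NONE (0xffff, as quoted above).
constexpr std::uint16_t kConnHandleNone = 0xffff;

// Outcome of one notify attempt in this model.
enum class NotifyResult { SentWithPayload, SentAfterLocalRead, LookupFailed };

// Stand-in for the read callback: it looks up the connection it was given,
// as handleGapEvent does just before the assert.
bool read_callback_finds_conn(const std::set<std::uint16_t>& active,
                              std::uint16_t handle) {
    return active.count(handle) != 0;
}

// Models one notify call. `have_payload` is false when building the payload
// buffer failed (Update 5). `pass_real_handle` selects the edited behaviour.
NotifyResult notify_model(const std::set<std::uint16_t>& active,
                          std::uint16_t conn_handle,
                          bool have_payload,
                          bool pass_real_handle) {
    if (have_payload) {
        // The caller supplied the data, so no local read takes place.
        return NotifyResult::SentWithPayload;
    }
    // Fallback: read the value locally, which triggers the read callback.
    const std::uint16_t read_handle =
        pass_real_handle ? conn_handle : kConnHandleNone;
    return read_callback_finds_conn(active, read_handle)
               ? NotifyResult::SentAfterLocalRead
               : NotifyResult::LookupFailed;
}

// Usage check: the three cases discussed in the thread.
void run_example() {
    const std::set<std::uint16_t> active{1, 2, 3, 4, 5, 6, 7};
    assert(notify_model(active, 1, true, false) == NotifyResult::SentWithPayload);
    assert(notify_model(active, 1, false, false) == NotifyResult::LookupFailed);
    assert(notify_model(active, 1, false, true) == NotifyResult::SentAfterLocalRead);
}
```

Under these assumptions the edit does not cure the allocation failure. It only lets the fallback read reach a callback that can find its connection, so the underlying out-of-memory condition from Update 5 would still be worth chasing.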

@h2zero
Owner

h2zero commented Jan 7, 2023

Hello, good work digging into this. It seems to be an odd issue for sure. What version of IDF are you using?

@ArdenKolodner
Author

Thank you! I'm on ESP-IDF 4.4.3 (it says "v4.4.3-dirty" when I run idf.py --version, not sure if that matters). The version of this repo is 1.4.1 and Arduino is ESP Arduino component version 2.0.5.

@ArdenKolodner
Author

@h2zero Any update on this?

@h2zero
Owner

h2zero commented Jan 19, 2023

Sorry, I haven't had time to dig into this much, but as your investigation above shows, there seems to be an issue somewhere in the BLE stack, not in this library. One thing that should probably be done is to remove the assert in the BLE_GATT_ACCESS_OP_READ_CHR event handler and return gracefully instead.
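
A minimal sketch of what returning gracefully could look like, written against stand-ins rather than the library's actual handler. `handle_read`, `kSuccess` and `kAttErrUnlikely` are hypothetical names. The value 0x0e is the ATT "Unlikely Error" code, and choosing it here is an assumption, not something decided in this thread.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Stand-ins for return codes. Returning the ATT "Unlikely Error" code
// (0x0e) for an unknown connection is an assumption made for this sketch.
constexpr int kSuccess = 0;
constexpr int kAttErrUnlikely = 0x0e;

// Hypothetical read handler. `active` holds the known connection handles
// and `on_read_calls` counts how often the user's onRead callback would run.
int handle_read(const std::set<std::uint16_t>& active,
                std::uint16_t conn_handle,
                int& on_read_calls) {
    if (active.count(conn_handle) == 0) {
        // Unknown connection: report an error to the caller instead of
        // asserting, so the firmware keeps running.
        return kAttErrUnlikely;
    }
    ++on_read_calls;  // stands in for invoking onRead
    return kSuccess;
}

// Usage check: a known handle runs the callback, 0xffff is rejected quietly.
void run_example() {
    const std::set<std::uint16_t> active{1, 2, 3};
    int calls = 0;
    assert(handle_read(active, 1, calls) == kSuccess);
    assert(handle_read(active, 0xffff, calls) == kAttErrUnlikely);
    assert(calls == 1);
}
```

Whether the handler should return an error or skip the callback and still serve the stored value is a design choice this sketch does not settle. Returning an error would avoid the crash, but the affected notification would presumably still not be sent.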

@ArdenKolodner
Author

Okay, I've created an issue with Espressif; hopefully they can fix it. Thank you!
