After 30 hours - crash where to start #10146
Whether or not this is of much help ... I ended up going to the PSRAM variant for two dataloggers that kept crashing on ESP32-WROOM REV1 chips. I found out where the problems were by using try/except clauses everywhere, adding a software watchdog, and saving the errors to a log file. The software watchdog also recorded the time of each fault in that log file, so I could identify what actually went wrong. Good luck!
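A minimal sketch of that approach, assuming a writable filesystem. The names `fault.log`, `log_fault`, `SoftWatchdog` and `guarded` are hypothetical, not taken from the dataloggers mentioned above:

```python
import time

LOG_PATH = "fault.log"  # hypothetical file name on the device filesystem


def log_fault(where, exc, path=LOG_PATH):
    """Append one line per fault: timestamp, location, exception type, message."""
    line = "{} {} {}: {}\n".format(time.time(), where, type(exc).__name__, exc)
    with open(path, "a") as f:
        f.write(line)
    return line


class SoftWatchdog:
    """Tracks the last stage that reported in; expired() is True once
    feed() has not been called for more than timeout_s seconds."""

    def __init__(self, timeout_s, clock=time.time):
        self.timeout_s = timeout_s
        self.clock = clock
        self.stage = "start"
        self.last_fed = clock()

    def feed(self, stage):
        # Call at the start of each step so a stall can be attributed to it.
        self.stage = stage
        self.last_fed = self.clock()

    def expired(self):
        return self.clock() - self.last_fed > self.timeout_s


def guarded(where, func, *args):
    """Run func(*args); on any exception, log it and return None."""
    try:
        return func(*args)
    except Exception as exc:
        log_fault(where, exc)
        return None
```

How `expired()` gets polled is left open here. One option (an assumption, not something stated above) is a periodic `machine.Timer` callback that writes the stalled `stage` with `log_fault` and then resets the board.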
Unfortunately I have no experience of Bluetooth, but I have some of radio communication. Getting radio links working is easy; getting them working reliably is less so.

I would start by trying to prove whether the BT code is the problem. Run one or more devices long term with the BT code simulated. While that test is ongoing, try a unit with BT enabled and see if you can provoke the failure. Possible causes are interference from other equipment, weak signal levels, or both. Lacking specialist RF test gear, you can produce weak signals by increasing the distance between the device under test (DUT) and the machine being scanned. In extremis, put the DUT in a microwave (an excellent Faraday cage) and gradually shut the door. Electrical interference could be generated by another ESP using WiFi heavily, a radio running in the same band such as an NRF24L01, or some nasty random source like electrical sparks (take care how you create these).

If you can find a way to provoke the failure, identifying the cause in code should be quicker. You might also be able to reduce the problem to a simple test case which could be discussed.

Of course you may find that the units with simulated BT do fail, in which case you have a conventional (difficult) debugging task. Only in that case would I worry about issues like RAM fragmentation. Memory failures do not normally result in a crash: I'd expect an exception to be thrown.
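One way to run with the BT code simulated is to put the scan behind a single function and swap in a stub that returns fixed data. This is only a sketch: `SIMULATE_BT`, `fake_scan`, `real_scan` and `run_loop` are hypothetical names, and the (address, RSSI) shape of the two scan items is an assumption.

```python
SIMULATE_BT = True  # hypothetical switch: True = no radio activity at all


def fake_scan():
    """Stand-in for the BLE scan: always returns the same two items."""
    return [("aa:bb:cc:dd:ee:ff", -60), ("11:22:33:44:55:66", -75)]


def real_scan():
    """Placeholder: the project's actual BLE scan would be called here."""
    raise NotImplementedError("real BLE scan not wired in")


def scan():
    return fake_scan() if SIMULATE_BT else real_scan()


def run_loop(iterations, readings):
    """Run the scan repeatedly, storing values in an existing dict."""
    for _ in range(iterations):
        for addr, rssi in scan():
            readings[addr] = rssi
    return readings
```

If a unit running `run_loop` with `SIMULATE_BT = True` survives well past 1,800 iterations while a unit with the real scan does not, that points at the BT path rather than the rest of the loop.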
After 1,800 loops of 60 s each = 30 hr (this almost went under the wire),
or 1,800 loops @ 3.4 sec = ~1 hr 42 min,
the ESP32 crashes.
INFO:
MicroPython v1.17-805-g7b1d10d69-dirty on 2022-04-05;
4MB/OTA/SPIram module with ESP32
LVGL v8.1.0-dev
Partition.find(Partition.TYPE_APP)
Partition type=0, subtype=16, address=65536, size=1835008, label=ota_0, encrypted=0
Partition type=0, subtype=17, address=1900544, size=1835008, label=ota_1, encrypted=0
Partition.find(Partition.TYPE_DATA)
Partition type=1, subtype=2, address=36864, size=16384, label=nvs, encrypted=0
Partition type=1, subtype=0, address=53248, size=8192, label=otadata, encrypted=0
Partition type=1, subtype=1, address=61440, size=4096, label=phy_init, encrypted=0
Partition type=1, subtype=129, address=3735552, size=264448, label=vfs, encrypted=0
Memory hardly moves after each loop.
mem_alloc: 99056
mem_free: 3999184
The errors are different, but the error below came up more often...
esp-idf/components/freertos/queue.c:705 (xQueueCreateCountingSemaphore)- assert failed!
The first thing to be said is... your memory is fragmented.
I have gc.collect() after every function I call and more. 62 in a 2200 line script.
The loop is a very simple scan for BLE devices,
return a buffered list of only two items,
and plug the values into pre allocated dictionaries.
Hardly any strings created.
Is there a way to monitor this "fragmentation"?
How does one go about finding the culprit?
My feeling is the Bluetooth scan is the problem as this was a problem in the past.
I see posts about requests and Bluetooth having issues. I use requests, but not during the 1,800-iteration loop.
I also send websockets to a webpage hosted from the esp32 on every loop.
Also where is the heap located in the partition table?
This is a large project that is halted and any H$LP would be appreciated.
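On the question of monitoring fragmentation, here is a rough sketch of what could be logged once per loop. `gc.mem_free()`, `gc.mem_alloc()` and `micropython.mem_info(1)` are standard MicroPython calls. `esp32.idf_heap_info()` is assumed to be present in this firmware build. The helper names and the bytearray probe are illustrative only. The assert quoted above is in FreeRTOS queue creation, which allocates from the ESP-IDF heap rather than the MicroPython heap, so it may be worth watching both.

```python
import gc

try:
    import micropython  # MicroPython only
except ImportError:
    micropython = None

try:
    import esp32  # ESP32 port only
except ImportError:
    esp32 = None


def heap_snapshot():
    """Return (free, allocated) bytes of the MicroPython heap, or None."""
    gc.collect()
    if hasattr(gc, "mem_free"):
        return gc.mem_free(), gc.mem_alloc()
    return None


def largest_block(limit=4 * 1024 * 1024, step=1024):
    """Binary-search (to within `step` bytes) the largest bytearray that can
    be allocated right now. A value far below mem_free suggests the free
    space is split into small pieces."""
    lo, hi = 0, limit
    while hi - lo > step:
        mid = (lo + hi) // 2
        try:
            buf = bytearray(mid)
            del buf
            lo = mid
        except MemoryError:
            hi = mid
    gc.collect()
    return lo


def dump_heap_map():
    """Print the block-level heap map where micropython.mem_info exists."""
    if micropython is not None:
        micropython.mem_info(1)


def idf_heap():
    """Per-region ESP-IDF heap figures (assumes esp32.idf_heap_info exists)."""
    if esp32 is None or not hasattr(esp32, "idf_heap_info"):
        return None
    return esp32.idf_heap_info(esp32.HEAP_DATA)
```

Writing `heap_snapshot()`, `largest_block()` and `idf_heap()` to a file every N loops would show whether any of the figures drifts in the hours before the crash, even when `mem_free` itself hardly moves.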