To follow up on my own question, I've found a lot of further information in the MicroPython docs that I hadn't fully seen on my first pass. A combination of Is there any way to convince the ESP32 network stack to pre-allocate the 1536 bytes it seems to need periodically? It would be great to have it allocate a little more RAM up front so that it has the resources it needs and the deadlock situation can be avoided.
-
Hi MicroPython fans,
Firstly, I'm completely new to MicroPython (but not Python), and semi-new to programming on small RAM-limited boards like the ESP32, so it's entirely possible there's something obvious I'm doing wrong :) I've searched the old forum plus this discussion area and not turned up anything similar. The closest I've seen are some network issues where the ESP32 role is the other way round, with it acting as the TCP server rather than the client, but those don't appear to address this issue. Apologies if this has been covered previously and I didn't recognise it.
I've recently embarked on a project to move some scripts running on a Raspberry Pi at home to an ESP32. They connect to a couple of Bluetooth devices and send the values received to an MQTT server. I figured the ESP32 might be perfect for this, and it seemed like a good opportunity to give MicroPython a go instead of writing in C/C++. It's highly likely I've a lot more to learn about how best to manage memory carefully on MicroPython for these small boards.
Long story short, what I'm finding is that if I write too many messages to the MQTT server (basically packets to the network TCP connection) when the ESP32 has little RAM available, typically around 4KB reported by gc.mem_free() in my reproducible test, then the network stack doesn't successfully receive the ACK packet sent in response to the data, and the sync/asyncio task writing to the stream blocks. In my larger application the issue seems to kick in with around 10KB reported by gc.mem_free(), the difference probably being related to fragmentation in the heap. While the ACK is being ignored, the ESP32 network stack continues the usual re-send of the last packet it hasn't received an acknowledgement for, but it never sees the ACK that is then resent, so it continues to be blocked.
I'm guessing it's something to do with the network stack needing to allocate some block of memory, which it can't, in order to receive the response. When it's in this state, it also no longer responds to ARP requests, so the server side eventually loses the MAC address of the ESP32 and doesn't know where to send the ACK.
To test this further, with a rather crude fudge in my test scenario, freeing up some memory does seem to allow the network stack to jump back into life, albeit with a lost packet in my test. That packet loss could be the result of the exception I needed to cause to unblock the stream write so I could then free up some RAM, so the TCP connection probably became out of sync. The connection did then start processing packets again, enough to issue a connection reset, which seems to show that the problem is probably related to a lack of heap space.
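One way to test the heap-space guess is to log the ESP-IDF heap alongside gc.mem_free() just before each write. The sketch below assumes the esp32.idf_heap_info() call from the MicroPython ESP32 port, which returns one (total, free, largest free block, minimum free seen) tuple per heap region. The helper names summarize_regions and idf_heap_summary are made up for illustration, and whether the network stack's failing allocation actually comes from these regions is an assumption, not something confirmed here.

```python
try:
    import esp32  # only available on the MicroPython ESP32 port
except ImportError:
    esp32 = None  # lets the helper below be exercised off-device


def summarize_regions(regions):
    """Return (total_free_bytes, largest_free_block) across heap regions.

    Each region is a 4-tuple:
    (total bytes, free bytes, largest free block, minimum free seen).
    """
    total_free = 0
    largest = 0
    for _total, free, largest_block, _min_free in regions:
        total_free += free
        if largest_block > largest:
            largest = largest_block
    return total_free, largest


def idf_heap_summary():
    """Summarize the ESP-IDF data heap, or return None when not on an ESP32."""
    if esp32 is None:
        return None
    return summarize_regions(esp32.idf_heap_info(esp32.HEAP_DATA))
```

If the largest free block drops below the size of the buffer the stack needs at the moment the ACKs stop being processed, that would support the fragmentation theory; if it doesn't, the cause probably lies elsewhere.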
Bottom line currently: I can't see any easy way to spot when this situation occurs, which makes it difficult to take any action. In fact, I've observed in my larger application that many more bytes of data are successfully written to the stream in Python but never actually leave the ESP32 network stack, presumably because it is still waiting to receive the ACK for an earlier packet, and possibly because heap allocation/fragmentation somehow allows more writes to be buffered. I've also not managed to see any error messages from the ESP32 about the lack of heap space within the network stack when it's failing to receive the ACK. Are there any suggestions on how to debug this further?
This behaviour feels like it's related to the natural network back-off process that ensures downstream resources aren't overwhelmed. However, I would have hoped the network stack would arrange things so that writing a packet out to the network could always successfully receive a response, even if it then blocks further writes. As it is, it seems to cause itself a deadlock. Is there something I'm missing that would let me easily detect this situation, or avoid it up front? How can I ensure there's enough RAM available for the network stack to successfully receive responses to the packets it's sending? Like I said above, it's highly likely I'm just missing some standard approach for reliably ensuring this situation can never occur. Any recommendations?
In my scenario, as everything is flowing towards the MQTT server, I would need to somehow detect that this is occurring and then back off if there are too many messages to send. I could restrict the number of readings in flight at any point in time; the tricky part is knowing how much free space (presumably relating to the largest allocatable heap block) I need to leave for the network stack to work reliably. Currently I've been restricting this on the input side when I see memory errors while recording new values from the sensors, but it's too late by that point. Are there some typical or recommended techniques used for this sort of thing that I might be missing?
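For the asyncio case, one possible way to at least notice a stuck write is to put a timeout around the stream's drain() call. This is only a sketch: drain_or_timeout is a hypothetical helper, the 10 second default is an arbitrary guess, and because writes can apparently be buffered without ever leaving the stack, a drain() that returns in time does not prove the data was delivered.

```python
import asyncio


async def drain_or_timeout(writer, timeout_s=10):
    """Return True if writer.drain() completed in time, False if it looks stuck.

    `writer` is any object with an awaitable drain() method, such as an
    asyncio StreamWriter. The timeout value is an assumption to tune.
    """
    try:
        await asyncio.wait_for(writer.drain(), timeout_s)
        return True
    except asyncio.TimeoutError:
        # The write did not complete; the caller can free memory,
        # close the connection, or reconnect.
        return False
```

A False result could then trigger whatever recovery is appropriate, for example dropping queued readings, running gc.collect(), and re-opening the MQTT connection.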
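The idea of restricting readings in flight could be sketched as a small bounded queue that refuses new messages when either the queue is full or free heap falls under a floor. Everything here is an assumption for illustration: the BoundedOutbox class, the limit of 8 messages, and the 16KB floor are not measured values, and gc.mem_free() reports total free heap rather than the largest contiguous block, so the floor would need tuning on the real board.

```python
import gc

MIN_FREE_BYTES = 16 * 1024  # assumed safety floor, tune on the real board
MAX_QUEUED = 8              # assumed cap on messages waiting to be sent


def free_heap():
    """Return free GC heap bytes, or None where gc.mem_free() is unavailable."""
    gc.collect()
    mem_free = getattr(gc, "mem_free", None)  # MicroPython-only function
    return mem_free() if mem_free else None


class BoundedOutbox:
    """Queue of outgoing messages that applies back-pressure to producers."""

    def __init__(self, max_items=MAX_QUEUED, min_free=MIN_FREE_BYTES,
                 free_fn=free_heap):
        self._items = []
        self._max_items = max_items
        self._min_free = min_free
        self._free_fn = free_fn

    def offer(self, msg):
        """Queue msg and return True, or return False if the caller should back off."""
        if len(self._items) >= self._max_items:
            return False
        free = self._free_fn()
        if free is not None and free < self._min_free:
            return False
        self._items.append(msg)
        return True

    def take(self):
        """Remove and return the oldest queued message, or None if empty."""
        return self._items.pop(0) if self._items else None
```

The sensor side would call offer() and skip or aggregate a reading when it returns False, while the MQTT task calls take() and publishes, so memory pressure is handled before the network write rather than after a MemoryError.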
In case it's useful, below is my current way of reproducing the issue. It's an overly simplified example purely to replicate the issue. Tested on the latest 1.24.0 released version for ESP32C3:
esptool.py --chip esp32c3 --port /dev/tty.usbmodem14201 erase_flash
curl -L https://micropython.org/resources/firmware/ESP32_GENERIC_C3-20241025-v1.24.0.bin > mp-esp32c3-1.24.0.bin
esptool.py --chip esp32c3 --port /dev/tty.usbmodem14201 --baud 460800 write_flash -z 0x0 mp-esp32c3-1.24.0.bin
curl -L https://raw.githubusercontent.com/micropython/micropython-lib/refs/heads/master/micropython/umqtt.simple/umqtt/simple.py > umqtt-simple.py
mpremote fs mkdir :lib
mpremote fs mkdir :lib/umqtt
mpremote fs cp umqtt-simple.py :lib/umqtt/simple.py
Create the following script in test_net.py:
Run:
mpremote run test_net.py
gives the following output:
where it then blocks/hangs after sending some number of messages to the MQTT topic, just 1 in the above example.
Monitoring the network on the mqtt server interface shows:
where you can see ack 61 being acknowledged, but it looks like that never makes it back through the ESP32 network stack. You can also see the ARP requests being sent to the ESP32 and the ESP32 not sending responses.
Any help or direction on what I should do/try next would be very helpful. Is this intended behaviour? Am I doing something stupid? Is this a bug? Is there some tweak I should make to the MicroPython build to enable something?
Many thanks in advance for any help, direction, or pointers anyone can give.
Pete