-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maybe bug fix: TASKS_START might be invoked head of time #9
base: master
Are you sure you want to change the base?
Conversation
When I started the development of this firmware, it was configured to send every packets received, and it worked pretty well. Maybe this issue resides in the UART management code, combined with your modification in btlejack (using select() instead of burning the CPU) ? Can this issue be caused by a UART buffer overwritten while RADIO_IRQHandler() is called ? |
The following commit exhibits the bug well: nviennot@f9368d1 When I run with btlejack with the modified firmware to show the bug, from the master branch (no select code) here is what I get:
We should only see the |
Yes, definitely looks like rx_buffer is overwritten. I will have a deeper look at this issue, thanks ! |
A quick update on this one. I had some similar issues when I was trying to implement a new feature a few days ago, and malformed packets were sent as CONNECT_REQ packets while they were in fact not CONNECT_REQ packets and this caused the python client to fail on an assertion. I spent some time investigating this issue and you were right: on some occasions TASKS_START is enabled BEFORE sending buffer to UART, and if a packet is received right after the radio has been enabled and put in RX mode and before a call to notifySomething() is made then the RX buffer is overwritten and bad data is sent to the host computer. I fixed a similar bug that appeared when dealing with CONNECT_REQ packets, but I will definitely rewrite the radio module in order to avoid this and investigate the main code to find similar issues as well. It may takes time to fix all of this, but I guess it will improve the stability of the tool. |
I think I found what happens. The data you got (05 00 and 09 00) while testing with a forced RX buffer (all bytes set to 0x11) are empty PDUs. These PDUs are sent by the master to the slave at the beginning of each connection event. I think the timer in charge of hopping from one channel to another collides with the radio IRQ handler. I mean, the timer causes the radio to listen on another channel and it enables the TASKS_START at the end of the sync_hop_channel() function. If a packet is then received, it will overwrite the RX buffer and a call to notifyNordicTapBlePacket() will obviously send this new packet rather than the one that triggered the call to the radio IRQ handler. And in this specific case, this packet would be an empty PDU because it is the first packet sent by the master on the channel. A possible fix could be to try to avoid sending packets to UART when it is too late considering the connection event, or to postpone them to the next connection event. It will be delayed, but then we should have enough time to send two packets instead of one for instance. Disabling the timer while we are sending a packet to UART may lead to packets loss and connection de-synchronization, so postponing is maybe the best option. It may require some time to find the correct way to implement this and to test it thoroughly. But I'm pretty confident that is the reason why you capture some "corrupted" packets, and it may explain some instability I experienced while sniffing connections (connection lost sometimes with weird packets captured). |
Ah! |
I don't have full confidence in this patch. I have not tested all functionalities, but it seems to work pretty well.
Problem:
I was trying to capture all packets including zero length packets which are normally not sent. So I modified the firmware, and I noticed that I was getting a lot of corrupt packets.
An easy way to manifest the corruption bug is to invoke the call to pLink->notifyNordicTapBlePacket() in main.cpp:838 twice. Normally, one should see the same packets twice coming when sniffing new connections with
btlejack -c any -v
. Oddly, we see many packets that are corrupted.In turns out, the radio DMA controller overwrites the rx_buffer as we are copying it in
notifyNordicTapBlePacket()
. This should not happen as we do not trigger aNRF_RADIO->TASKS_START = 1
until after the UART has copied the bytes.So it turned out, after
while (NRF_RADIO->EVENTS_READY == 0);
inradio_set_sniff()
, we setNRF_RADIO->TASKS_START = 1
, without reseting theNRF_RADIO->EVENTS_READY
flag to 0. Eventually, the END event triggers and we land in the interrupt routineRADIO_IRQHandler()
. But because we didn't put theNRF_RADIO->EVENTS_READY
flag to 0 earlier, we test it to 1 and so we setTASKS_START = 1
, which is problematic because now the receiver is allowed to overwrite therx_buffer
while we are using it.Normally, this should only corrupt the first packet. However, I saw endless corrupted packets. So I am not confident that this is the right explanation. And so I'm not confident that this is the right fix.
The IRQ handler is set to only trigger on END events, not READY events. So we can only get into the IRQ handler by having been ready in the first place. I am quite confused. Maybe someone can shed some light on this.