Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sleep vs. yield in CrazyradioThread #37

Open
whoenig opened this issue Aug 2, 2023 · 9 comments
Open

Sleep vs. yield in CrazyradioThread #37

whoenig opened this issue Aug 2, 2023 · 9 comments

Comments

@whoenig
Copy link
Contributor

whoenig commented Aug 2, 2023

We had a number of instances where it seems much better to use a short sleep rather than yield in https://github.com/bitcraze/crazyflie-link-cpp/blob/master/src/CrazyradioThread.cpp#L131-L132

I couldn't find an issue on the C++ side, i.e., the yield does not seem to cause a deadlock. However, it will issue USB requests much more rapidly.

@ataffanel For both Crazyradio PA and Crazyradio2 firmwares, is there a potential race condition between the vendor requests (to configure the radio with a different address etc.) and the send/receive endpoints?

In other words, if I send a sequence of:
libusb_control_transfer (e.g. setAddress 1)
libusb_bulk_transfer (e.g., sendPacket)
libusb_control_transfer (e.g. setAddress 2)

is it possible that the bulk_transfer uses the configuration from the second libusb_control_transfer rather than the first?

@williamleong
Copy link

Hi @whoenig, regarding the yield not causing a deadlock, may I ask about the specifications of your computer, especially the number of cores/logical processors? In my case, I am using the i7-11850H with 8 cores and 16 logical processors, hence it is unlikely that yield will actually block or cause a context switch.

I recommend against using yield as it is not deterministic. My prior experience with yield also results in deadlocks with a similar scenario with mutexes and a producer/consumer queue and I ended up using a lock-free queue for better results.

@whoenig
Copy link
Contributor Author

whoenig commented Aug 2, 2023

I used a Laptop with i7-1165G7 (4 physical cores). I did have some logging enabled that showed me that there was no deadlock in this case, that's why I suspect that there might be a race condition directly on the hardware side w.r.t. the USB protocol.

@williamleong
Copy link

Thanks, looks like there's indeed a problem at the lower level and even if there was no deadlock we could still face the same issue.

@williamleong
Copy link

Did some logging as well and can verify that the threads are not deadlocking on my side also.

@ataffanel
Copy link
Member

For both Crazyradio PA and Crazyradio2 firmwares, is there a potential race condition between the vendor requests (to configure the radio with a different address etc.) and the send/receive endpoints?

In other words, if I send a sequence of:
libusb_control_transfer (e.g. setAddress 1)
libusb_bulk_transfer (e.g., sendPacket)
libusb_control_transfer (e.g. setAddress 2)

is it possible that the bulk_transfer uses the configuration from the second libusb_control_transfer rather than the first?

It is only possibleif the control transfer is initiated before getting an answer from the radio (in case of a unicast packet), so if the software waits for the ack to come back from the radio before setting a new address, there should not be any race. Broadcast is much more problematic in that case though since there is no way for the software to know that the packet has actually been handled.

I am looking at modifying the firmware to order things following the USB time or arrival: to make sure transfers are handled once at a time and in order.

@ataffanel
Copy link
Member

I added a mutex so that this race condition should not happen anymore: setting the address will block on the PC side until the previous packet has been actually sent.

I attach the new firmware built by github CI: crazyradio2-CRPA-emulation.zip. Please tell me if you see any change with it, if so I can make a new release of the firmware (the uf2 file should be drag-and-dropped in the bootloader. Connect the Crazyradio 2.0 with the button pressed to launch the bootloader).

@whoenig
Copy link
Contributor Author

whoenig commented Aug 24, 2023

This magically resolves all those timeout issues/frozen Crazyradio2 mentioned in #35, i.e., one can use yield() rather than sleep() in the CrazyradioThread. I also reviewed the code and it looks good to me. I didn't do any flight tests, though.

Do you know if we have a similar issue in Crazyradio PA? Because we still see strange behavior of our drones sometimes that I believe might be caused by communication bugs.

@ataffanel
Copy link
Member

Great that this fixes the problem, I will make a new release of the firmware so that this is what people will run by default from now-on.

As for Crazyradio PA, I just looked and I do not think a similar bug is possible: setup request are handled in the main loop after handling packet transfer and broadcast packet seems to wait for the packet to be sent before letting the loop continue. So things should be handled in sequence.

@whoenig
Copy link
Contributor Author

whoenig commented Aug 28, 2023

Awesome thanks!

For CR2: could you also increment the version number, so that we have an easy way of checking that this is fixed? Otherwise I fear that we'll be spammed with issues regarding LIBUSB errors.

CRPA: Thanks for looking into it. My guess is that the issues we are seeing are related to bitcraze/crazyflie-firmware#703, i.e. we do have still packets that have been ack'ed, but are not being processed in the firmware.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants