Add SETUP_EP0_OUT_BUF(), to be used for OUT control transfers. #18
When processing OUT control transfers, the EP0 buffer is armed by writing EP0BCL while SDPAUTO is high. At this stage it is not necessary to write HSNAK, because that bit affects the status stage, not the data stage.

Beyond being unnecessary, writing HSNAK early introduces a race condition: the host sees the status stage complete before we have fully processed the contents of EP0BUF, and so it thinks it can send more control transfers. If the host actually does send more control transfers, EP0BUF might be overwritten before we have finished processing it, and this can cause data corruption.

The clean way to handle this is to force the host to wait in the status stage until we have fully processed EP0BUF, by NAK-ing the status stage. This does not mean that the packets in the data stage will be NAK'd, because that is controlled by whether the buffer is armed or not.

Firmware-wise, the new recommended way to handle OUT control transfers is:

```c
for (each_expected_packet) {
    SETUP_EP0_OUT_BUF();
    handle_packet(EP0BUF, EP0BCL); // EP0BCL will have the size of the received packet
}
ACK_EP0();
```

That is: only call `ACK_EP0()`, which clears HSNAK by writing 1 to it, after we know it's safe to overwrite EP0BUF.

Note: a simple way to reproduce the issue when not following this procedure is to run the following in a terminal (replacing DEVICE_VID/DEVICE_PID with the correct values):

```bash
while true; do lsusb -v -d "${DEVICE_VID}:${DEVICE_PID}" > /dev/null; done
```

While this loop is running, we expect most applications doing control OUT transfers with the old method to become unusable.

Technically speaking, even a single run of `lsusb` could cause trouble if it happens at the wrong time, and I think this could be triggered in other situations as well; this is just the easiest way to reproduce the problem.
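To make the ordering argument concrete, here is a toy host-side model of the race. It is only a sketch under simplifying assumptions: `struct ep0_model`, `host_send()`, `process_old()` and `process_new()` are hypothetical names invented for illustration and are not part of libfx2 or the FX2 hardware, and the model treats "HSNAK set" as simply "the host cannot start another control transfer yet".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EP0_SIZE 64

/* Hypothetical model of the EP0 state; NOT the libfx2 API. */
struct ep0_model {
    uint8_t buf[EP0_SIZE]; /* stands in for EP0BUF */
    uint8_t len;           /* stands in for EP0BCL */
    bool    hsnak;         /* true while the status stage is being NAK'd */
};

/* The host attempts a new control OUT transfer.
 * Returns true if it was accepted, i.e. the buffer was overwritten. */
bool host_send(struct ep0_model *ep, const uint8_t *data, uint8_t len) {
    assert(len <= EP0_SIZE);
    if (ep->hsnak)
        return false;      /* previous status stage still pending: host waits */
    memcpy(ep->buf, data, len);
    ep->len = len;
    ep->hsnak = true;      /* a new transfer starts with the status stage NAK'd */
    return true;
}

/* Old ordering: acknowledge first, then process. The host's next
 * transfer can land in the buffer before we have read it. */
uint8_t process_old(struct ep0_model *ep, const uint8_t *next, uint8_t next_len) {
    ep->hsnak = false;                   /* early ACK */
    (void)host_send(ep, next, next_len); /* host races in and is accepted */
    return ep->buf[0];                   /* reads the NEW transfer's data */
}

/* New ordering: process first, acknowledge last. */
uint8_t process_new(struct ep0_model *ep, const uint8_t *next, uint8_t next_len) {
    (void)host_send(ep, next, next_len); /* host is held off by the NAK */
    uint8_t first = ep->buf[0];          /* still the original data */
    ep->hsnak = false;                   /* ACK only after processing */
    return first;
}
```

In this model, `process_old()` returns the first byte of the second transfer, while `process_new()` returns the first byte of the transfer it was actually asked to handle.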
More PRs will follow to fix this for glasgow and for some examples in libfx2.
Thanks so much for handling this, I'm really happy that you are looking into it! Right now I'm feeling too sick to review the code but I'll do it as soon as I can. I've known about what I think is this exact issue for a while but never found the time to handle it.
Before this change, the EP0BUF buffer used by control OUT transfers could be overwritten by new control transfers before the firmware had finished processing the previous control transfer.

The easiest way to illustrate this problem is to run the following in a different terminal:

```bash
while true; do lsusb -v -d 20b7:9db1 > /dev/null; done
```

While this is running, glasgow is completely unusable. Presumably even a single lsusb run could cause corruption if it happens to be issued at the wrong time.

With this change glasgow is now usable, even if the above loop is running.

Please see whitequark/libfx2#18 for more details.
I hope you will be better soon!
I'm disabled; it does not really get better. But I may be able to work out some time for my OSS projects, since I want to work on them still.
Thank you, the analysis and the fix look all correct to me. Could you please add a pair of macros For deprecation, I propose making it rely on a data symbol called something like
Also, I would strongly prefer to upgrade all of the examples as well, since in practice half of the people who use the library at all will just copy&paste example code.