Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STM32 Controller does not abort receive request after timeout #165

Open
JonTBeckett opened this issue May 24, 2024 · 4 comments
Open

STM32 Controller does not abort receive request after timeout #165

JonTBeckett opened this issue May 24, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@JonTBeckett
Copy link

JonTBeckett commented May 24, 2024

Describe the bug
I see that #160 is bringing the STM32 controllers into this repo. It appears the issue is in that portion primarily. If this portion isn't maintained by USBX, feel free to direct me to the appropriate maintainers.

  • What target device are you using? STM32U585
  • Which version of Eclipse ThreadX? 6.1.10
  • What toolchain and environment? Keil MDK v5.38 with the ARM Clang v6.20.1

Given

  1. A timeout value has been set for read operations via ux_device_class_cdc_acm_ioctl()
  2. At least one call to ux_device_class_cdc_acm_read() has timed out

When the host sends data to the device while the device is NOT pending in a call to ux_device_class_cdc_acm_read(),
Then the data sent by the host is never delivered to the device application even on subsequent calls to ux_device_class_cdc_acm_read()

It appears that the STM32 controller is not aborting the transfer request in the USB peripheral HW, causing an unexpected interrupt. Eventually HAL_PCD_DataOutStageCallback() in the ux_dcd_stm32_callback.c file is called in the interrupt handler. This function sets the size of the incoming data in the appropriate UX_SLAVE_TRANSFER structure and then the appropriate semaphore is “put”. The issue comes from the fact that on the subsequent call to ux_device_class_cdc_acm_read(), the size in the UX_SLAVE_TRANSFER structure is set to 0, clearing any record of data already in the buffer.

Using TraceX, I confirmed that the interrupt happens before the call to ux_device_class_cdc_acm_read() by observing that there were no threads pending on the semaphore when the semaphore is “put”.

Locally, I was able to mitigate this issue in two ways:

  1. Setting the transfer request size to 0 after completing the read instead of before (still setting to 0 before transmit requests).
    1. This obviously doesn’t directly solve the issue as there is no guarantee that the buffer provided to the USB peripheral HW when the receive request was setup is still valid when the interrupt occurs.
  2. Checking the return value of the semaphore “get” operation in _ux_dcd_stm32_transfer_request() (found in ux_dcd_stm32_transfer_request.c) and calling _ux_dcd_stm32_transfer_abort() on appropriate errors (namely when TX_NO_INSTANCE is returned.
    1. My quick patch implementation of this still has a race condition between the interrupt and the aborting of the transfer request though so it is just a proof of concept at this point.

To Reproduce
I don't have a demo project to share but I believe that any project that uses the CDC ACM Device class on STM32 will demonstrate the issue.
The following steps provide a good starting point for forcing this issue to occur.

  1. DEVICE: Set the timeout value for reads to a low value (100ms is what I used) via ux_device_class_cdc_acm_ioctl().
  2. DEVICE: Perform at least one read via ux_device_class_cdc_acm_read() that times out BEFORE sending any data from the host to the device.
  3. DEVICE: Wait a relatively long time before performing another read (I used 15 seconds for this time period to force the issue)
  4. HOST: Send any data to the device (ensuring it happens after the first timeout and during the length of time described in step 3)
  5. Note that the device application never receives the data even on subsequent calls to ux_device_class_cdc_acm_read()

Expected behavior
The USB peripheral HW transfer request should be aborted so that the interrupt does not occur outside of reads.

Impact
This lowers my team's confidence in USBX. It is not a show stopper as the odds of it happening during normal operations are fairly low as long as the processing time between calls to ux_device_class_cdc_acm_read() is small. We will be doing further stress testing to see how often we hit it in general operating conditions and may have to move away from USBX if it occurs frequently enough.

@JonTBeckett JonTBeckett added the bug Something isn't working label May 24, 2024
@JonTBeckett
Copy link
Author

I see that the related PR has been closed without merging because the files will be delivered in a dedicated repo. Is there any info on which repo and where that will be?

@cosmikwolf
Copy link

cosmikwolf commented Sep 30, 2024

I am also hitting this problem, implementing a simple HID device. HID signals will always stop sending at some point, with tx_semaphore_get() returning TX_NO_INSTANCE

my setup is:

@JonTBeckett I would love to see your patch, I have not yet figured out how to implement a fix in my scenario. I could not find a related PR to this issue when I searched.

@cosmikwolf
Copy link

Also @JonTBeckett I think that this could may be maintained by ST, and they have this repo which seems to have the latest version of the _ux_dcd_stm32 drivers: https://github.com/STMicroelectronics/stm32_mw_usbx.git

@JonTBeckett
Copy link
Author

@cosmikwolf Thanks for finding the ST repo! I'll open this same bug under there when I get a chance.

As for the patch, I'll have to dig it up. It's been a while since I tinked with and the patch is buried in a stash in one of my local repos. Hopefully I can find it again and attach it for you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants