Fix esp32 select race conditions. #774

Closed

@balazsracz balazsracz commented Feb 3, 2024

  • Fixes a race condition that caused the ESP32's select to not wake up even though an executable was posted.
  • Fixes another race condition where variables were not locked when accessed from an ISR (relevant on ESP32 only).
  • Optimizes away an unnecessary round of select wakeup.

===

  • Fixes race condition in ESP32 select-wakeup implementation.

A correct implementation of a selectable fd driver has to check in vfs_select_start()
whether the fd is readable. This check was missing from the prior implementation.
We had an application-level check that the queue was non-empty, but there is a
window of time between the application doing this check and the ESP32's select()
implementation reaching the call to vfs_select_start(). By the design of the ESP32's
select mechanism, the wakeup was only effective if it came after vfs_select_start(),
because the wakeup semaphore is only provided in vfs_select_start().

Since the esp32's select() is very slow, this was actually a pretty big gap.
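
The lost wakeup can be illustrated with a small model. This is a minimal sketch, not the OpenMRN or ESP-IDF code: `SelectWakeupModel`, `select_start()` and `select_end()` are hypothetical names standing in for the driver's select hooks, and a plain boolean replaces the wakeup semaphore.

```cpp
#include <mutex>

// Hypothetical, simplified model of a select-wakeup driver. It is not the
// OpenMRN or ESP-IDF implementation; a bool stands in for the semaphore.
class SelectWakeupModel
{
public:
    // Requests a wakeup. May be called before or during a select.
    void wakeup()
    {
        std::lock_guard<std::mutex> guard(lock_);
        pendingWakeup_ = true;
        if (inSelect_)
        {
            // The semaphore is only available once select has started.
            signalled_ = true;
        }
    }

    // Models vfs_select_start(). Returns true if select must return at once.
    bool select_start()
    {
        std::lock_guard<std::mutex> guard(lock_);
        inSelect_ = true;
        if (pendingWakeup_)
        {
            // The readability check: without it, a wakeup that arrived
            // before this point would be lost.
            signalled_ = true;
        }
        return signalled_;
    }

    // Models the end of a select call: clears the per-select state.
    void select_end()
    {
        std::lock_guard<std::mutex> guard(lock_);
        inSelect_ = false;
        signalled_ = false;
        pendingWakeup_ = false;
    }

private:
    std::mutex lock_;
    bool pendingWakeup_ = false;
    bool inSelect_ = false;
    bool signalled_ = false;
};
```

In this model, a `wakeup()` issued before `select_start()` makes `select_start()` return true, so the select does not go to sleep with work already posted.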

The OpenMRN Device::select implementation does not suffer from this race
condition, because the event group bits can be set at any time, even if
Device::select is still in the setup phase. Added a comment to this effect.
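
Why sticky bits avoid the race can be shown in a few lines. This is a minimal sketch, assuming a hypothetical `StickyBits` class in place of the real event group; it is not the Device::select code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for an event group; not the OpenMRN Device::select
// code. Bits stay set until a waiter consumes them.
class StickyBits
{
public:
    // Sets bits. Safe to call at any time, even before anyone waits.
    void set(std::uint32_t bits)
    {
        bits_.fetch_or(bits);
    }

    // Returns the bits in mask that were set, and clears them.
    std::uint32_t take(std::uint32_t mask)
    {
        return bits_.fetch_and(~mask) & mask;
    }

private:
    std::atomic<std::uint32_t> bits_{0};
};
```

A bit set while the waiter is still in its setup phase is not lost; the waiter sees it on its first `take()`.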

  • Fixes another (smaller) race condition in ESP32's select wakeup.

The wakeup_from_isr routine consults the pendingWakeup_ and inSelect_ variables.
These variables need to be locked, because a multi-core ESP32 could run an ISR
on one core and other code on a different core.

Moves the atomic lock from esp_wakeup_from_isr into OSSelectWakeup::wakeup_from_isr.
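
The locking pattern looks roughly as follows. This is a minimal sketch under stated assumptions: `SpinLock` and `WakeupState` are hypothetical, the spinlock stands in for whatever ISR-safe lock the platform provides, and the real OSSelectWakeup logic is more involved.

```cpp
#include <atomic>

// Hypothetical spinlock standing in for the platform's ISR-safe lock
// (the description above calls it an atomic lock).
class SpinLock
{
public:
    void lock()
    {
        while (flag_.test_and_set(std::memory_order_acquire))
        {
        }
    }

    void unlock()
    {
        flag_.clear(std::memory_order_release);
    }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Simplified model of the wakeup state; not the actual OSSelectWakeup class.
class WakeupState
{
public:
    // Called by the thread entering or leaving select.
    void set_in_select(bool value)
    {
        lock_.lock();
        inSelect_ = value;
        lock_.unlock();
    }

    // Returns true if a sleeping select would need to be signalled. Both
    // variables are accessed under the lock, so code running on another
    // core cannot observe or change them halfway through.
    bool wakeup_from_isr()
    {
        lock_.lock();
        pendingWakeup_ = true;
        bool needSignal = inSelect_;
        lock_.unlock();
        return needSignal;
    }

    bool pending()
    {
        lock_.lock();
        bool value = pendingWakeup_;
        lock_.unlock();
        return value;
    }

private:
    SpinLock lock_;
    bool pendingWakeup_ = false;
    bool inSelect_ = false;
};
```

Taking the lock inside the wakeup routine itself, rather than in a helper it calls, means the reads of both variables are covered by the same critical section.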

  • Optimizes unnecessary select iterations.

When the application already knows about the executables in the queue, we don't
need select to terminate with EINTR. We either ran the executable and the queue
is empty (in which case we want select to sleep), or we know the queue is not
empty and thus will run select with a timeout of 0.
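
The reasoning above can be sketched as a small single-threaded model. `ExecutorModel` and its members are hypothetical and omit all locking; the sketch only shows how the timeout follows from the queue state once the stale wakeup has been cleared.

```cpp
#include <deque>

// Hypothetical, single-threaded model of the executor loop; not the OpenMRN
// code, and it omits the locking a real implementation needs.
class ExecutorModel
{
public:
    // Posts an executable and requests a wakeup.
    void post(int executable)
    {
        queue_.push_back(executable);
        pendingWakeup_ = true;
    }

    // Returns the timeout to pass to select.
    long long prepare_select(long long idleTimeoutMsec)
    {
        // Clear the wakeup first, then look at the queue. Anything posted
        // before the clear is seen by the queue check; anything posted
        // later sets the flag again.
        pendingWakeup_ = false;
        return queue_.empty() ? idleTimeoutMsec : 0;
    }

    // Runs one executable if there is one. Returns false on an empty queue.
    bool run_one()
    {
        if (queue_.empty())
        {
            return false;
        }
        queue_.pop_front();
        return true;
    }

    bool pending_wakeup() const
    {
        return pendingWakeup_;
    }

private:
    std::deque<int> queue_;
    bool pendingWakeup_ = false;
};
```

With a non-empty queue the model returns a timeout of 0, so select does not need to be interrupted with EINTR to get the loop running again.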

* bracz-tmp-compile-fix-merge:
  Fix compiler warnings in openmrn when using new GCC's. (#772)
  Fix comment.
  Upintegrate changes from the OpenMRNIDF repository (#771)
  Fix comments.
  Adds support for DCC extended accessories  (#769)
  Fix incorrect consumer identified message being emitted by dcc accy producer. (#768)
  Avoids rendering hidden segments. (#767)
  Adds trailing zero to the cdi XML file written to the filesystem. (#777)
  Fix target subdirectory name (#775)
Base automatically changed from bracz-stat-max to bracz-tmp-compile-fix-merge February 5, 2024 03:12
Updates the latency test of hub_test:
- refactors the stats printing code into the Stats class
- adds max statistic to the class
- adds the latency consumer object that can be included in a product to verify the application level latency.

===

* Adds a debug print function to the stats object.
This will be usable for printing a summary of the stats for a developer printout like a log statement.

* Adds maximum to the statistics object.

* Adds a consumer object for event based testing.

* Adds hook to the latency test consumer. This allows testing latency
of arbitrary internal node processing.

* Adds documentation comment to latency test consumer.

* Fix comment.

* Fixes data type of max.
* master:
  Latency test with maximum stats and custom process evaluation (#773)
* bracz-stat-max:
  Latency test with maximum stats and custom process evaluation (#773)
  Fixes data type of max.
  Fix comment.
  Adds documentation comment to latency test consumer.
  Fix compiler warnings in openmrn when using new GCC's. (#772)
  Fix comment.
  Upintegrate changes from the OpenMRNIDF repository (#771)
  Fix comments.
  Adds support for DCC extended accessories  (#769)
  Fix incorrect consumer identified message being emitted by dcc accy producer. (#768)
  Avoids rendering hidden segments. (#767)
  Adds trailing zero to the cdi XML file written to the filesystem. (#777)
  Fix target subdirectory name (#775)
@balazsracz balazsracz deleted the branch bracz-tmp-compile-fix-merge February 5, 2024 03:18
@balazsracz balazsracz closed this Feb 5, 2024
balazsracz added a commit that referenced this pull request Feb 5, 2024
- Fixes a race condition that caused the ESP32's select to not wake up even though an executable was posted.
- Fixes another race condition where variables were not locked when accessed from an ISR (relevant on ESP32 only).
- Optimizes away an unnecessary round of select wakeup.

Note: the review is in #774, which was irreversibly closed due to an incorrect sequence of operations I did on GitHub.
@balazsracz balazsracz deleted the bracz-select-race-condition branch February 5, 2024 03:24