Skip to content

Commit

Permalink
Fix esp32 select race conditions. (#780)
Browse files Browse the repository at this point in the history
- Fixes race condition that caused ESP32's select to not wake up even though an executable was posted.
- Fixes another race condition where variables are not locked when accessing from ISR (relevant on ESP32 only).
- Optimizes an unnecessary round of select wakeup. 

Note: review is in #774 which was irreversibly closed due to an incorrect sequence of operations I did on github.

===

* Fixes race condition in ESP32 select-wakeup implementation.

A correct implementation of a selectable fd driver has to check in vfs_select_start()
whether the fd is readable. This check was missing from the prior implementation.
We had an application level check of the queue being non-empty, but there is a
window of time between the application doing this check, and the select()
implementation of the esp32 getting to calling vfs_select_start(). The wakeup
implementation was only effective if it came after vfs_select_start, by the design
of esp's select mechanism, since the wakeup semaphore only comes in vfs_select_start.

Since the esp32's select() is very slow, this was actually a pretty big gap.

The OpenMRN Device::select implementation does not suffer from this race
condition, because the event group bits can be set at any time, even if
Device::select is still in the setup phase. Added a comment to this effect.

* Fixes another (smaller) race condition in ESP32's select wakeup.

The wakeup_from_isr routine consults the pendingWakeup_ and inSelect_ variables.
These variables need to be locked, because multi-core ESP32's could run an isr
on one core and othercode on a different core.

Moves the atomic lock from esp_wakeup_from_isr into OSSelectWakeup::wakeup_from_isr.

* Optimizes unnecessary select iterations.

When the application already knows about the executables in the queue, we don't
need select to terminate with EINTR. We either ran the executable and the queue
is empty (in which case we want select to sleep), or we know the queue is not
empty and thus will run select with a timeout of 0.
  • Loading branch information
balazsracz committed Feb 5, 2024
1 parent b946516 commit 157139e
Show file tree
Hide file tree
Showing 3 changed files with 25 additions and 2 deletions.
8 changes: 7 additions & 1 deletion src/executor/Executor.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -386,7 +386,13 @@ void ExecutorBase::wait_with_select(long long wait_length)
fd_set fd_r(selectRead_);
fd_set fd_w(selectWrite_);
fd_set fd_x(selectExcept_);
if (!empty()) {
// We will check the queue for any prior wakeups after this call. If we
// already processed the executables, the wakeup is not necessary. Without
// this clear, there would always be two select() iterations happening when
// we are done with work and can go to sleep.
selectHelper_.clear_wakeup();
if (!empty())
{
wait_length = 0;
}
long long max_sleep = MSEC_TO_NSEC(config_executor_max_sleep_msec());
Expand Down
15 changes: 14 additions & 1 deletion src/os/OSSelectWakeup.cxx
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,11 @@ int OSSelectWakeup::select(int nfds, fd_set *readfds,
else
{
#if OPENMRN_FEATURE_DEVICE_SELECT
// It is important that we do this select_clear() in the same
// critical section as checking pendingWakeup above. This ensures
// tht all wakeups before this clear cause sleep to be zero, and
// all wakeups after this clear will cause the event group bit to
// be set.
Device::select_clear();
#endif
}
Expand Down Expand Up @@ -189,6 +194,15 @@ void OSSelectWakeup::esp_start_select(fd_set *readfds, fd_set *writefds,
exceptFds_ = exceptfds;
exceptFdsOrig_ = *exceptfds;
FD_ZERO(exceptFds_);
if (pendingWakeup_)
{
// There is a race condition between the Executor deciding to run
// select, and the internal implementation of select() calling this
// function. Since we only get the semaphone in this call, the wakeup
// functions are noops if they hit during this window. If there was a
// missed wakeup, we repeat it.
esp_wakeup();
}
}

/// This function marks the stored semaphore as invalid which indicates no
Expand Down Expand Up @@ -226,7 +240,6 @@ void OSSelectWakeup::esp_wakeup()
/// call from within an ISR context.
void OSSelectWakeup::esp_wakeup_from_isr()
{
AtomicHolder h(this);
BaseType_t woken = pdFALSE;

// If our VFS FD is not set in the except fd_set we can exit early.
Expand Down
4 changes: 4 additions & 0 deletions src/os/OSSelectWakeup.hxx
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,10 @@ public:
#if OPENMRN_FEATURE_RTOS_FROM_ISR
void wakeup_from_isr()
{
#if defined(ESP_PLATFORM)
// On multi-core ESP32s we need to lock objects even in ISRs.
AtomicHolder h(this);
#endif
pendingWakeup_ = true;
if (inSelect_)
{
Expand Down

0 comments on commit 157139e

Please sign in to comment.