Fix bmalloc hang with RT thread priorities #1408

Open · wants to merge 1 commit into base: wpe-2.38

Conversation


@filipe-norte-red filipe-norte-red commented Sep 24, 2024

Description:

When real time (RT) thread priorities are used for some of the gstreamer pipeline elements, we may run into a situation where several RT threads start spinning during a mutex acquisition process, leading to a system hang as most other threads won't be able to run.

Sequence of events leading up to the hang:

  1. A web process thread (non-RT priority) acquires the mutex lock for the heap, is then involuntarily descheduled, and does not run again
  2. vqueue:src (RT priority) enters the lockSlowCase and starts spinning in the while loop
  3. multiqueue0:src (instance 1, RT priority) enters the lockSlowCase and starts spinning in the while loop
  4. aqueue:src (RT priority) enters the lockSlowCase and starts spinning in the while loop
  5. multiqueue0:src (instance 2, RT priority) enters the lockSlowCase and starts spinning in the while loop

Once stage 5 is hit, the box is hung, as the only things that can run on a CPU core are:

  1. one of the above RT threads (aqueue, vqueue, or multiqueue)
  2. any other RT thread with a priority equal to or greater than that of the above RT threads
  3. any h/w irq

The use of usleep() allows the low-priority thread to run and release the mutex lock, avoiding the hang.

Author of issue analysis and fix proposal: Steven Webster

Proposed fix / analysis summary:

The proposed fix is to replace the sched_yield() call with a usleep() call. This will guarantee that the calling thread will deschedule for the specified time period, allowing the low priority thread to run and release the mutex lock, avoiding the hang.
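
For illustration, a minimal sketch of the change under discussion is shown below; the function signature and lock representation are simplified assumptions, not the actual bmalloc source:

```cpp
// Simplified sketch of a lockSlowCase()-style spin loop (illustrative only,
// not the real bmalloc implementation).
#include <atomic>
#include <unistd.h> // usleep

void lockSlowCase(std::atomic_flag& lockBit)
{
    // Keep retrying until the lock is observed free and acquired.
    while (lockBit.test_and_set(std::memory_order_acquire)) {
        // Previously: sched_yield(). With RT waiters and a non-RT holder on
        // the same core, an RT waiter is put straight back on the CPU and
        // the holder never gets to run.
        usleep(150); // removes the waiter from the runqueue for ~150us,
                     // letting the non-RT lock holder run and unlock
    }
}
```

Unlike sched_yield(), which lets an RT waiter be rescheduled immediately when no higher-priority task is runnable, usleep() takes the waiter off the runqueue for the full interval, so the lower-priority lock holder can make progress.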

This fix also has the benefit of reducing the CPU usage of the threads that enter the tight while() loop in lockSlowCase() and spin waiting for the mutex to be released.

An example of how much CPU runtime can be saved can be seen by comparing the kernelshark screenshots. The table below shows the actual thread execution time as a percentage of overall runtime:

| Description   | Runtime (us) | Execution time (us) | # loop iterations | % execution time of runtime |
|---------------|--------------|---------------------|-------------------|-----------------------------|
| sched_yield() | 4538         | 4538                | 1128              | 100                         |
| usleep(125)   | 4230         | 172                 | 27                | 4.089                       |
| usleep(150)   | 5830         | 223                 | 27                | 3.818                       |

The choice of the usleep value is a tradeoff between lowering the % execution time of runtime and having the usleep time exceed the time lockSlowCase() would normally run for.

Two values of usleep were measured:

  1. 125us – the overhead (the additional time over 125us that the syscall takes due to setup/latency, etc.) for this value is 22us, or 17.5% of the usleep time. The measured % of calls to lockSlowCase() where the runtime is < 125us is 26.6%
  2. 150us – the overhead for this value is 21us, or 14% of the usleep time. The measured % of calls to lockSlowCase() where the runtime is < 150us is 28.3%

NOTE: the % of calls to lockSlowCase() that were shorter than either of the usleep values was measured over a 30 min period, so the value could move up/down as the measurement period is increased.
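
As a point of reference, below is a minimal sketch of one way the usleep() overhead (the extra time beyond the requested interval) could be measured; this is an assumption about methodology, the figures above were obtained from kernelshark traces:

```cpp
// Measures the average usleep() overhead: observed sleep minus requested sleep.
#include <cstdio>
#include <time.h>
#include <unistd.h>

int main()
{
    constexpr useconds_t requestedUs = 150; // candidate sleep value
    constexpr int iterations = 1000;
    long long totalUs = 0;

    for (int i = 0; i < iterations; ++i) {
        timespec start {}, end {};
        clock_gettime(CLOCK_MONOTONIC, &start);
        usleep(requestedUs);
        clock_gettime(CLOCK_MONOTONIC, &end);
        totalUs += (end.tv_sec - start.tv_sec) * 1000000LL
                 + (end.tv_nsec - start.tv_nsec) / 1000;
    }

    printf("average overhead: %lld us\n", totalUs / iterations - requestedUs);
    return 0;
}
```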

The recommended usleep time is 150us, as this gives a lower overhead ratio and a lower execution-to-runtime ratio, for only a small increase in the number of times a thread may block for longer than it would originally have run.

Reproduction

To reproduce this issue, the easiest way is to patch gstreamer to set RT priority for created threads. The attached patch can be used for this purpose: gstreamer-priority.zip

To enable RT thread priorities, define the environment variable "FJN_T". To disable RT thread priorities, remove the "FJN_T" environment variable.
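
For context, the sketch below shows the kind of call such a patch would make on a streaming thread; the SCHED_RR policy and the priority value are assumptions, only the FJN_T gate comes from the description above:

```cpp
// Illustrative only; the attached gstreamer-priority.zip patch is the actual
// reproduction aid.
#include <cstdlib>
#include <pthread.h>
#include <sched.h>

static void maybeSetRealtimePriority()
{
    if (!std::getenv("FJN_T")) // RT priorities only when FJN_T is defined
        return;

    sched_param param {};
    param.sched_priority = 10; // example value, not taken from the patch
    // Give the calling (streaming) thread a real-time round-robin policy.
    pthread_setschedparam(pthread_self(), SCHED_RR, &param);
}
```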

The attached index.zip contains an HTML page that plays videos in a loop. Serve this file on a web server and open a browser instance at the corresponding URL with the above-mentioned env var defined. The issue should be reproducible in 10-30 min.

Internal Reference: LLAMA-15112

@filipe-norte-red filipe-norte-red marked this pull request as ready for review September 24, 2024 11:17
@filipe-norte-red filipe-norte-red force-pushed the wpe-2.38-fix-bmalloc-hang-with-rt-thread-priorities branch from 9ac2855 to a170a0a on September 25, 2024 11:49
@eocanha eocanha added the upstream (Related to an upstream bug (or should be at some point)) label on Oct 25, 2024