
Commit f382df4

[DOC] WIP concurrency guide
1 parent c995faa commit f382df4

File tree

1 file changed

+106
-0
lines changed

doc/concurrency_guide.md

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
# Concurrency Guide
2+
3+
This is a guide to thinking about concurrency in the native CRuby source code, whether you're
4+
contributing to Ruby by writing C or Rust. This doesn't touch on native extensions, only the core
5+
language. It will go over:
6+
7+
* How to use the VM lock, and what you can and can't do when you've acquired this lock.
8+
* What you can and can't do when you've acquired other native locks.
9+
* The difference between the VM lock and the GVL.
10+
* How to write code that is ractor safe.
11+
* What a VM barrier is and when to use it.
12+
* The lock hierarchy of some important locks.
13+
* How ruby interrupt handling works.
14+
* What happens when IO is performed through ruby.
15+
* The timer thread and what it's responsible for.
16+
17+
18+
## The VM Lock
19+
20+
There's only one VM lock, and it's for critical sections that can only be entered by one ractor at a time.
21+
Without ractors, the VM lock is useless. It does not stop all ractors from running, as ractors can run
22+
without trying to acquire this lock. If you're updating global data shared between ractors and aren't using
23+
atomics, you need to use this lock. When you take the VM lock, there are things you can and can't do during
24+
your (hopefully brief) critical section:
25+
26+
You can (as long as no other locks were acquired before the VM lock):
27+
28+
* Create ruby objects, call `ruby_xmalloc`, etc.
29+
30+
* Raise exceptions or use `EC_JUMP_TAG`. The lock will automatically be unlocked depending on how far up
31+
the call stack you locked it, and how far you're jumping to.
32+
33+
* Check ruby interrupts. Since raising exceptions can pop ruby frames and popping ruby frames checks interrupts,
34+
checking them directly is allowed too.
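
The automatic unlocking described above can be pictured with a toy model. This is only a sketch, not CRuby's implementation: `model_lock`, `model_unlock`, `struct model_tag` and `run_protected` are hypothetical names, the "lock" is just a recursion depth counter, and `setjmp`/`longjmp` stand in for a tag push and `EC_JUMP_TAG`. The idea it illustrates is that the lock depth is recorded when the jump target is set up, and anything locked deeper than that is released when the jump lands.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical model: a recursive lock represented only by its depth. */
static int lock_depth = 0;

static void model_lock(void)   { lock_depth++; }
static void model_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Hypothetical jump target that remembers the lock depth at setup time. */
struct model_tag {
    jmp_buf buf;
    int saved_depth;
};

/* Takes the lock, then "raises" without ever unlocking it. */
static void raise_while_locked(struct model_tag *tag)
{
    model_lock();
    longjmp(tag->buf, 1);
}

/* Returns 1 if the jump was caught, 0 if the body returned normally. */
int run_protected(void)
{
    struct model_tag tag;
    tag.saved_depth = lock_depth;        /* recorded before the jump target is armed */

    if (setjmp(tag.buf) == 0) {
        raise_while_locked(&tag);
        return 0;                        /* not reached in this sketch */
    }

    /* Landing site: release only what was locked below this frame. */
    while (lock_depth > tag.saved_depth) {
        model_unlock();
    }
    return 1;
}
```

If the caller already held the lock before calling `run_protected`, that outer hold survives the jump, which is the "how far up the call stack you locked it" part of the rule.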
35+
36+
You can't:
37+
38+
* Context switch to another ruby thread or ractor. This is important, as many things can cause ruby-level context switches including:
39+
40+
* Calling any ruby method through, for example, `rb_funcall`. If you execute ruby code, a context switch could happen.
41+
This also applies to ruby methods defined in C, as they can be redefined in Ruby. Functions that call ruby methods, such as
42+
`rb_obj_respond_to` are also disallowed.
43+
44+
* Calling `rb_nogvl`.
45+
46+
* Entering any blocking operation managed by ruby. This will context switch to another ruby thread using `rb_nogvl` or
47+
something equivalent.
48+
49+
* Calling a ruby-level mechanism that can context switch, like `rb_mutex_lock`.
50+
51+
Internally, the VM lock is the `vm->ractor.sync.lock`.
52+
53+
## Other Locks
54+
55+
All native locks that aren't the VM lock share a stricter set of rules for what's allowed during the critical section. By native locks, we mean
56+
anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (global), the thread
57+
scheduling lock (local to each ractor) and the ractor lock (local to each ractor).
58+
59+
You can:
60+
61+
* Allocate memory through non-ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use
62+
ruby allocation through the use of macros!
63+
64+
* Use `ccan` lists, as they don't allocate.
65+
66+
* Do the usual things like set variables or struct fields, manipulate linked lists, etc.
67+
68+
You can't:
69+
70+
* Raise exceptions or use `EC_JUMP_TAG` if it jumps out of the critical section.
71+
72+
* Context switch. See the `VM Lock` section for more info.
73+
74+
* Check interrupts. Doing so can raise exceptions or context switch.
75+
76+
* Allocate memory through ruby. This includes creating ruby objects or using `ruby_xmalloc` or `st_insert`. The reason this
77+
is disallowed is that if the allocation triggers a GC, all other ruby threads must join a VM barrier as soon as possible
78+
(when they next check interrupts or acquire the VM lock). This is so that no other ractors are running during GC. If a ruby thread
79+
is waiting (blocked) on this same native lock, it can't join the barrier and a deadlock occurs.
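
A common way to stay within these rules is to do any allocation before taking the native lock, so the critical section only links pointers. The sketch below is a standalone illustration under stated assumptions, not CRuby code: `pthread_mutex_lock` stands in for `rb_native_mutex_lock`, `malloc` stands in for an allocation that would not be safe under the lock, and `struct node`, `list_lock`, `list_head` and `push_value` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical singly linked list protected by a native lock. */
struct node {
    int value;
    struct node *next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *list_head = NULL;

/* Returns 0 on success, -1 if the allocation failed. */
int push_value(int value)
{
    /* Allocate and initialize before locking, while it is still safe
     * to do work that might block or trigger other activity. */
    struct node *n = malloc(sizeof(*n));
    if (n == NULL) {
        return -1;
    }
    n->value = value;

    /* Critical section: only field and pointer updates. */
    pthread_mutex_lock(&list_lock);
    n->next = list_head;
    list_head = n;
    pthread_mutex_unlock(&list_lock);

    return 0;
}
```

Keeping the critical section down to pointer updates also means there is nothing inside it that could raise, check interrupts, or context switch.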
80+
81+
## Difference Between VM Lock and GVL
82+
83+
The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks.
84+
It is "acquired" when a ruby thread is about to run or is running. Since many ruby threads can run at the same time if they're in different ractors,
85+
there are many GVLs (1 per `SNT`, or shared native thread, + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock".
86+
87+
## How To Write Ractor-Safe Code
88+
89+
Before ractors, only one ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. Context switches happen
90+
often and need to be taken into account when writing code. Also, threads without the GVL run too, like the timer thread. Sometimes these threads need
91+
to coordinate with ruby threads, and this coordination often needs locks or atomics.
92+
93+
When you add ractors to the mix, it gets more complicated. Take the `fstring` table, for example. It's a global set of strings that each ractor can update
94+
concurrently, and it's used heavily. A lockless solution is preferred to using the VM lock in this case, as taking the VM Lock would cause too many OS context
95+
switches. A lockless solution is also preferable for dealing with call cache tables on classes. These are also updated often, and updates can come from multiple ractors
96+
concurrently. Here, an RCU (Read-Copy-Update) solution is used. What was previously an `st_table` is now a ruby object, and the old and new tables are switched
97+
atomically.
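
The copy-then-swap idea can be sketched in standalone C11. This is a simplified illustration, not how CRuby implements its call cache tables: `struct table`, `current_table` and `table_append` are hypothetical names, the table is a small fixed-size array, and the old copy is deliberately never freed because a reader might still be using it (in CRuby the table is a ruby object, so reclaiming it is the GC's job).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_CAPACITY 8

/* Hypothetical immutable-once-published table. */
struct table {
    size_t len;
    int entries[TABLE_CAPACITY];
};

/* Readers load this pointer atomically and never take a lock. */
static _Atomic(struct table *) current_table;

/* Returns 0 on success, -1 on allocation failure or a full table. */
int table_append(int value)
{
    for (;;) {
        struct table *old = atomic_load(&current_table);

        /* Copy: build a private table that no reader can see yet. */
        struct table *copy = malloc(sizeof(*copy));
        if (copy == NULL) {
            return -1;
        }
        if (old != NULL) {
            *copy = *old;
        } else {
            memset(copy, 0, sizeof(*copy));
        }
        if (copy->len == TABLE_CAPACITY) {
            free(copy);
            return -1;
        }

        /* Update: modify only the private copy. */
        copy->entries[copy->len++] = value;

        /* Publish: swap the pointer atomically. If another writer got
         * there first, discard the copy and retry against the new table. */
        if (atomic_compare_exchange_strong(&current_table, &old, copy)) {
            /* The old table is intentionally not freed in this sketch. */
            return 0;
        }
        free(copy);
    }
}
```

A reader that loaded the pointer before the swap keeps seeing a complete, unchanged old table, which is what lets readers proceed without locking.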
98+
99+
## VM Barriers
100+
101+
Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running GC, for instance.
102+
A VM barrier is designed for this use case. It's not used often as taking the barrier slows ractor performance down a lot, but it's useful to
103+
know about and is sometimes the only solution.
104+
105+
## Lock Hierarchy
106+