Commit 65432be

[DOC] WIP concurrency guide
1 parent c995faa commit 65432be

1 file changed

+99
-0
lines changed

doc/concurrency_guide.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
# Concurrency Guide

This is a guide to thinking about concurrency in the native CRuby source code, whether you're
contributing to Ruby in C or in Rust. It doesn't touch on native extensions, only the core
language. It will go over:

* How to use the VM lock, and what you can and can't do when you've acquired this lock.
* What you can and can't do when you've acquired other native locks.
* The difference between the VM lock and the GVL.
* How to write code that is ractor-safe.
* What a VM barrier is and when to use it.
* The lock hierarchy of some important locks.
* How Ruby interrupt handling works.
* What happens when IO is performed through Ruby.
* The timer thread and what it's responsible for.

## The VM Lock

There's only one VM lock, and it's for critical sections that can only be entered by one ractor at a time.
Without ractors, the VM lock serves no purpose. Holding it does not stop other ractors from running, as ractors can run
without trying to acquire this lock. If you're updating data shared between ractors and aren't using
atomics, you need to use this lock. When you take the VM lock, there are things you can and can't do during
your critical section:

You can (as long as no other lock was acquired before the VM lock):

* Create Ruby objects, call `ruby_xmalloc`, etc.

You can't:

* Context switch to another Ruby thread or ractor. This is important, as many things can cause Ruby-level context switches, including:

    * Calling any Ruby method through, for example, `rb_funcall`. If you execute Ruby code, a context switch could happen.
      This also applies to Ruby methods defined in C, as they can be redefined in Ruby. Things that call Ruby methods, such as
      `rb_obj_respond_to`, are also disallowed.

    * Calling `rb_raise`. This calls `initialize` on the new exception object, which is a Ruby method call. With the VM lock
      held, nothing you call should be able to raise an exception. `NoMemoryError` is allowed, however.

    * Calling `rb_nogvl` or a Ruby-level mechanism that can context switch, like `rb_mutex_lock`.

    * Entering any blocking operation managed by Ruby. This context switches to another Ruby thread through `rb_nogvl` or
      something equivalent.

Internally, the VM lock is the `vm->ractor.sync.lock`.
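
The rule "shared data without atomics needs the lock" can be illustrated with a standalone model. This is a minimal sketch using plain pthreads, not CRuby code: `model_vm_lock`, `model_ractor_main` and `run_model_ractors` are hypothetical names, each modelled "ractor" is just a native thread, and the real lock and its entry points live in the CRuby source.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the VM lock: one process-wide mutex. */
static pthread_mutex_t model_vm_lock = PTHREAD_MUTEX_INITIALIZER;

/* Data shared by every modelled ractor; it is not atomic, so it needs the lock. */
static long shared_counter = 0;

/* Body of one modelled ractor: bump the shared counter `iterations` times. */
static void *model_ractor_main(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&model_vm_lock);
        /* Critical section: short, and nothing in here blocks or switches. */
        shared_counter++;
        pthread_mutex_unlock(&model_vm_lock);
    }
    return NULL;
}

/* Run up to 16 modelled ractors and return the final counter value. */
long run_model_ractors(int n, long iterations)
{
    pthread_t threads[16];
    if (n > 16) n = 16;
    shared_counter = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&threads[i], NULL, model_ractor_main, &iterations);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

With the lock in place, `run_model_ractors(4, 10000)` returns exactly `40000`. Removing the lock calls makes the increments race, and updates may be lost.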

## Other Locks

All native locks other than the VM lock share a stricter set of rules for what's allowed during the critical section. By native locks, we mean
anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (global), the thread
scheduling lock (local to each ractor) and the ractor lock (local to each ractor).

You can:

* Allocate memory through non-Ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use
  Ruby allocation through the use of macros!

* Use `ccan` lists, as they don't allocate.

* Do the usual things like set variables or struct fields, manipulate linked lists, etc.

You can't:

* Allocate Ruby-managed memory. This includes creating Ruby objects or using `ruby_xmalloc` or `st_insert`. This
  is disallowed because if the allocation triggers a GC, all other Ruby threads must join a VM barrier as soon as possible
  (when they next check interrupts or acquire the VM lock), so that no other ractors are running during GC. If a Ruby thread
  is waiting (blocked) on this same native lock, it can't join the barrier, and a deadlock occurs because the barrier never completes.

* Raise exceptions or use `EC_JUMP_TAG` if it jumps out of the critical section.

* Context switch. See the `VM Lock` section for more info.
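
One way to follow the allocation rule is to allocate before taking the native lock and only link the result in while holding it. The sketch below is a standalone model under stated assumptions: `model_gc_alloc` is a hypothetical stand-in for any allocation that could trigger GC, and it simply refuses to run while the modelled native lock is held. None of these names exist in CRuby.

```c
#include <pthread.h>
#include <stdlib.h>

struct waiter {
    int id;
    struct waiter *next;
};

/* Hypothetical native lock protecting the list below. */
static pthread_mutex_t model_native_lock = PTHREAD_MUTEX_INITIALIZER;
static struct waiter *waiters_head = NULL;

/* Per-thread count of modelled native locks currently held. */
static _Thread_local int native_locks_held = 0;

static void model_lock(void)
{
    pthread_mutex_lock(&model_native_lock);
    native_locks_held++;
}

static void model_unlock(void)
{
    native_locks_held--;
    pthread_mutex_unlock(&model_native_lock);
}

/* Stand-in for a GC-triggering allocation: fails if a native lock is held. */
void *model_gc_alloc(size_t size)
{
    if (native_locks_held > 0) return NULL;
    return malloc(size);
}

/* Correct order: allocate first, then lock and publish. Returns 0 on success. */
int add_waiter(int id)
{
    struct waiter *w = model_gc_alloc(sizeof(*w));
    if (w == NULL) return -1;
    w->id = id;
    model_lock();
    w->next = waiters_head; /* only pointer manipulation under the lock */
    waiters_head = w;
    model_unlock();
    return 0;
}

/* Wrong order: allocating inside the critical section is rejected. */
int add_waiter_wrong(int id)
{
    int result = -1;
    model_lock();
    struct waiter *w = model_gc_alloc(sizeof(*w));
    if (w != NULL) {
        w->id = id;
        w->next = waiters_head;
        waiters_head = w;
        result = 0;
    }
    model_unlock();
    return result;
}

/* Count list entries while holding the lock. */
int count_waiters(void)
{
    int n = 0;
    model_lock();
    for (struct waiter *w = waiters_head; w != NULL; w = w->next) n++;
    model_unlock();
    return n;
}
```

In this model `add_waiter` succeeds and `add_waiter_wrong` always fails, which mirrors the rule rather than the real failure mode: in CRuby the wrong order would not fail cleanly, it would risk the deadlock described above.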

## Difference Between VM Lock and GVL

The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks.
It is "acquired" when a Ruby thread is about to run or is running. Since many Ruby threads can run at the same time if they're in different ractors,
there are many GVLs (1 per `SNT` + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock".

## How To Write Ractor-Safe Code

Before ractors, only one Ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. Context switches happen
often and need to be taken into account when writing code. Also, threads without the GVL run too, like the timer thread. Sometimes these threads need
to coordinate with Ruby threads, and this coordination often needs locks or atomics.

When you add ractors to the mix, it gets more complicated. Take the `fstring` table, for example. It's a global set of strings that each ractor can update
concurrently, and it's used heavily. A lockless solution is preferred to using the VM Lock in this case, as taking the VM Lock would cause too many OS context
switches. A lockless solution is also preferable for dealing with call cache tables on classes. These are also updated often, and the updates can come from multiple ractors
concurrently. Here, an RCU (Read-Copy-Update) solution is used. What was previously an `st_table` is now a Ruby object, and the old and new tables are switched
atomically.
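
The copy-then-swap idea can be sketched with C11 atomics. This is an illustrative model only, not the CRuby implementation: `struct cc_table`, `table_lookup` and `table_insert` are hypothetical names, the table is a tiny fixed-size array, and old tables are deliberately never freed. In CRuby the table is a Ruby object, so reclaiming old versions is presumably left to the GC.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_CAPA 8

/* Hypothetical immutable-once-published table. */
struct cc_table {
    int len;
    int keys[TABLE_CAPA];
    int vals[TABLE_CAPA];
};

/* Pointer to the currently published table; swapped atomically by writers. */
static _Atomic(struct cc_table *) current_table = NULL;

/* Reader: a single atomic load, no lock. Returns 1 and sets *val_out if found. */
int table_lookup(int key, int *val_out)
{
    struct cc_table *t = atomic_load_explicit(&current_table, memory_order_acquire);
    if (t == NULL) return 0;
    for (int i = 0; i < t->len; i++) {
        if (t->keys[i] == key) {
            *val_out = t->vals[i];
            return 1;
        }
    }
    return 0;
}

/* Writer: copy the current table, update the copy, publish it with a CAS.
 * Returns 1 on success, 0 if the table is full or allocation fails. */
int table_insert(int key, int val)
{
    for (;;) {
        struct cc_table *old = atomic_load_explicit(&current_table, memory_order_acquire);
        struct cc_table *fresh = calloc(1, sizeof(*fresh));
        if (fresh == NULL) return 0;
        if (old != NULL) memcpy(fresh, old, sizeof(*fresh));

        /* Overwrite an existing key, or append a new one. */
        int i;
        for (i = 0; i < fresh->len; i++) {
            if (fresh->keys[i] == key) break;
        }
        if (i == fresh->len) {
            if (fresh->len == TABLE_CAPA) {
                free(fresh);
                return 0;
            }
            fresh->keys[i] = key;
            fresh->len++;
        }
        fresh->vals[i] = val;

        if (atomic_compare_exchange_strong_explicit(&current_table, &old, fresh,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire)) {
            /* `old` is not freed here: a reader may still be scanning it. */
            return 1;
        }
        /* Another writer published first. `fresh` was never visible, so free and retry. */
        free(fresh);
    }
}
```

Readers never block and never see a half-updated table, at the cost of one copy per write. That trade-off suits tables that are read far more often than they are written.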

## VM Barriers

Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running GC, for instance.
A VM barrier is designed for this use case. It's not used often, as taking the barrier slows ractor performance down considerably, but it's useful to
know about and is sometimes the only solution.
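
The general shape of a stop-the-world barrier can be modelled with a mutex and a condition variable. The sketch below is a rough pthread model, not how CRuby implements its barrier: every name in it is hypothetical, workers stand in for ractors, and the "safepoint" is simply the top of the worker loop, standing in for an interrupt check or a VM lock acquisition.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define N_WORKERS 3

static pthread_mutex_t barrier_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  barrier_cond  = PTHREAD_COND_INITIALIZER;
static bool barrier_requested = false; /* protected by barrier_mutex */
static bool shutting_down = false;     /* protected by barrier_mutex */
static int  running = 0;               /* workers that have not exited */
static int  parked = 0;                /* workers stopped at the barrier */
static int  exclusive_runs = 0;        /* times the exclusive work ran */
static atomic_bool world_stopped = false;
static atomic_int  violations = 0;     /* work observed during a barrier */

static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&barrier_mutex);
    while (!shutting_down) {
        /* Safepoint: join the barrier if one was requested. */
        if (barrier_requested) {
            parked++;
            pthread_cond_broadcast(&barrier_cond);
            while (barrier_requested)
                pthread_cond_wait(&barrier_cond, &barrier_mutex);
            parked--;
        }
        pthread_mutex_unlock(&barrier_mutex);
        /* Modelled ractor work, done without holding the lock. */
        if (atomic_load(&world_stopped)) atomic_fetch_add(&violations, 1);
        sched_yield();
        pthread_mutex_lock(&barrier_mutex);
    }
    running--;
    pthread_mutex_unlock(&barrier_mutex);
    return NULL;
}

/* Stop every worker, run `fn` exclusively, then let the workers continue. */
static void with_barrier(void (*fn)(void))
{
    pthread_mutex_lock(&barrier_mutex);
    barrier_requested = true;
    while (parked < running)
        pthread_cond_wait(&barrier_cond, &barrier_mutex);
    atomic_store(&world_stopped, true);
    fn();
    atomic_store(&world_stopped, false);
    barrier_requested = false;
    pthread_cond_broadcast(&barrier_cond);
    pthread_mutex_unlock(&barrier_mutex);
}

static void exclusive_work(void) { exclusive_runs++; }

/* Returns how many times the exclusive work ran; reports violations seen. */
int run_barrier_demo(int rounds, int *violations_out)
{
    pthread_t threads[N_WORKERS];
    running = N_WORKERS;
    parked = 0;
    shutting_down = false;
    exclusive_runs = 0;
    atomic_store(&violations, 0);
    for (int i = 0; i < N_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker_main, NULL);
    for (int r = 0; r < rounds; r++)
        with_barrier(exclusive_work);
    pthread_mutex_lock(&barrier_mutex);
    shutting_down = true;
    pthread_mutex_unlock(&barrier_mutex);
    for (int i = 0; i < N_WORKERS; i++)
        pthread_join(threads[i], NULL);
    *violations_out = atomic_load(&violations);
    return exclusive_runs;
}
```

The model also shows why the barrier is expensive: the requester cannot proceed until the slowest worker reaches its next safepoint, and every worker sits idle for the whole exclusive section.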
## Lock Hierarchy