|
| 1 | +# Concurrency Guide |
| 2 | + |
| 3 | +This is a guide to thinking about concurrency in the native cruby source code, whether you're |
| 4 | +contributing to Ruby by writing C or Rust. It doesn't cover native extensions, only the core |
| 5 | +language. It will go over: |
| 6 | + |
| 7 | +* How to use the VM lock, and what you can and can't do when you've acquired this lock. |
| 8 | +* What you can and can't do when you've acquired other native locks. |
| 9 | +* The difference between the VM lock and the GVL. |
| 10 | +* How to write code that is ractor safe. |
| 11 | +* What a VM barrier is and when to use it. |
| 12 | +* The lock hierarchy of some important locks. |
| 13 | +* How ruby interrupt handling works. |
| 14 | +* What happens when IO is performed through ruby. |
| 15 | +* The timer thread and what it's responsible for. |
| 16 | + |
| 17 | + |
| 18 | +## The VM Lock |
| 19 | + |
| 20 | +There's only one VM lock and it's for critical sections that can only be entered by one ractor at a time. |
| 21 | +Without ractors, the VM lock is useless. It does not stop all ractors from running, as ractors can run |
| 22 | +without trying to acquire this lock. If you're updating global (shared) data between ractors and aren't using |
| 23 | +atomics, you need to use this lock. When you take the VM lock, there are things you can and can't do during |
| 24 | +your (hopefully brief) critical section: |
| 25 | + |
| 26 | +You can (as long as no other locks were taken before the VM lock): |
| 27 | + |
| 28 | +* Create ruby objects, call `ruby_xmalloc`, etc. |
| 29 | + |
| 30 | +* Raise exceptions or use `EC_JUMP_TAG`. The lock is unlocked automatically when the jump unwinds past the point |
| 31 | +where you took it; whether that happens depends on how far up the call stack you locked it and how far you're jumping. |
| 32 | + |
| 33 | +* Check ruby interrupts. Raising exceptions can pop ruby frames, and popping ruby frames checks interrupts, |
| 34 | +so checking them directly is allowed too. |
| 35 | + |
| 36 | +You can't: |
| 37 | + |
| 38 | +* Context switch to another ruby thread or ractor. This is important, as many things can cause ruby-level context switches including: |
| 39 | + |
| 40 | + * Calling any ruby method through, for example, `rb_funcall`. If you execute ruby code, a context switch could happen. |
| 41 | + This also applies to ruby methods defined in C, as they can be redefined in Ruby. Things that call ruby methods such as |
| 42 | + `rb_obj_respond_to` are also disallowed. |
| 43 | + |
| 44 | + * Calling `rb_nogvl`. |
| 45 | + |
| 46 | + * Entering any blocking operation managed by ruby. This will context switch to another ruby thread using `rb_nogvl` or |
| 47 | + something equivalent. |
| 48 | + |
| 49 | + * Calling a ruby-level mechanism that can context switch, like `rb_mutex_lock`. |
| 50 | + |
| 51 | +Internally, the VM lock is the `vm->ractor.sync.lock`. |
| 52 | + |
| 53 | +## Other Locks |
| 54 | + |
| 55 | +All native locks that aren't the VM lock share a stricter set of rules for what's allowed during the critical section. By native locks, we mean |
| 56 | +anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (global), the thread |
| 57 | +scheduling lock (local to each ractor) and the ractor lock (local to each ractor). |
| 58 | + |
| 59 | +You can: |
| 60 | + |
| 61 | +* Allocate memory through non-ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use |
| 62 | +ruby allocation through the use of macros! |
| 63 | + |
| 64 | +* Use `ccan` lists, as they don't allocate. |
| 65 | + |
| 66 | +* Do the usual things like set variables or struct fields, manipulate linked lists, etc. |
| 67 | + |
| 68 | +You can't: |
| 69 | + |
| 70 | +* Raise exceptions or use `EC_JUMP_TAG` if it jumps out of the critical section. |
| 71 | + |
| 72 | +* Context switch. See the `VM Lock` section for more info. |
| 73 | + |
| 74 | +* Check interrupts. Doing so can raise exceptions or context switch. |
| 75 | + |
| 76 | +* Allocate memory through ruby. This includes creating ruby objects or using `ruby_xmalloc` or `st_insert`. This is |
| 77 | +disallowed because if the allocation triggers a GC, all other ruby threads must join a VM barrier as soon as possible |
| 78 | +(when they next check interrupts or acquire the VM lock), so that no other ractors are running during GC. If a ruby thread |
| 79 | +is waiting (blocked) on this same native lock, it can't join the barrier and a deadlock occurs. |
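The rules above suggest a common shape for code that takes a native lock: do any allocation before locking, and keep the critical section down to pointer and field updates. Here is a minimal, self-contained sketch of that shape. It is not code from the cruby source: `struct waiter`, `list_lock`, `waiter_register` and `waiter_count` are hypothetical names, and a plain `pthread_mutex_t` stands in for a lock taken with `rb_native_mutex_lock`.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical node type, for illustration only. */
struct waiter {
    int id;
    struct waiter *next;
};

/* Stand-in for a native lock; not a real cruby lock. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct waiter *waiters = NULL;

/* Returns 0 on success, -1 if allocation failed. */
static int
waiter_register(int id)
{
    /* Allocate BEFORE taking the lock, using raw malloc rather than
     * ruby allocation, so nothing here can trigger a GC. */
    struct waiter *w = malloc(sizeof(*w));
    if (w == NULL) return -1; /* plain error return, no exception */
    w->id = id;

    pthread_mutex_lock(&list_lock);
    /* Critical section: only field and pointer updates. No allocation,
     * no interrupt checks, nothing that could jump out or block. */
    w->next = waiters;
    waiters = w;
    pthread_mutex_unlock(&list_lock);
    return 0;
}

/* Counts registered waiters while holding the lock. */
static int
waiter_count(void)
{
    int n = 0;
    pthread_mutex_lock(&list_lock);
    for (struct waiter *w = waiters; w != NULL; w = w->next) n++;
    pthread_mutex_unlock(&list_lock);
    return n;
}
```

Errors are reported with a return value and handled by the caller after the lock is released, which keeps any exception raising outside the critical section.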
| 80 | + |
| 81 | +## Difference Between VM Lock and GVL |
| 82 | + |
| 83 | +The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks. |
| 84 | +It is "acquired" when a ruby thread is about to run or is running. Since many ruby threads can run at the same time if they're in different ractors, |
| 85 | +there are many GVLs (1 per `SNT` + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock". |
| 86 | + |
| 87 | +## How To Write Ractor-Safe Code |
| 88 | + |
| 89 | +Before ractors, only one ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. Context switches happen |
| 90 | +often and need to be taken into account when writing code. Also, threads without the GVL run too, like the timer thread. Sometimes these threads need |
| 91 | +to coordinate with ruby threads, and this coordination often needs locks or atomics. |
| 92 | + |
| 93 | +When you add ractors to the mix, it gets more complicated. Take the `fstring` table, for example. It's a global set of strings that each ractor can update |
| 94 | +concurrently, and it's used heavily. A lockless solution is preferred to using the VM lock in this case, as taking the VM Lock would cause too many OS context |
| 95 | +switches. A lockless solution is also preferable for dealing with call cache tables on classes. These are also updated often, possibly from multiple ractors |
| 96 | +concurrently. Here, an RCU (Read-Copy-Update) solution is used. What was previously an `st_table` is now a ruby object, and the old and new tables are switched |
| 97 | +atomically. |
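The copy-then-swap idea behind RCU can be sketched in a few lines of standalone C11. This is an illustration of the general technique only, not the cruby implementation: `struct table`, `current_table`, `table_len` and `table_append` are hypothetical names, and the sketch uses a raw heap snapshot where the real code uses a ruby object.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot type: never modified once published. */
struct table {
    size_t len;
    int entries[];
};

static _Atomic(struct table *) current_table = NULL;

/* Readers load the pointer once and only read through that snapshot. */
static size_t
table_len(void)
{
    struct table *t = atomic_load_explicit(&current_table, memory_order_acquire);
    return t ? t->len : 0;
}

/* Writers copy the current snapshot, update the copy, then publish it
 * with a compare-and-swap. Returns 0 on success, -1 on allocation failure. */
static int
table_append(int value)
{
    for (;;) {
        struct table *old = atomic_load_explicit(&current_table, memory_order_acquire);
        size_t old_len = old ? old->len : 0;

        struct table *fresh = malloc(sizeof(*fresh) + (old_len + 1) * sizeof(int));
        if (fresh == NULL) return -1;
        if (old_len > 0) memcpy(fresh->entries, old->entries, old_len * sizeof(int));
        fresh->entries[old_len] = value;
        fresh->len = old_len + 1;

        if (atomic_compare_exchange_strong_explicit(&current_table, &old, fresh,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire)) {
            /* The old snapshot is deliberately not freed: a concurrent
             * reader may still be using it. This sketch simply leaks it. */
            return 0;
        }
        free(fresh); /* another writer published first; retry on its table */
    }
}
```

Reclaiming old snapshots is the hard part of any RCU scheme, and this sketch sidesteps it by leaking them. Making the table a ruby object, as described above, presumably lets the garbage collector take care of that step.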
| 98 | + |
| 99 | +## VM Barriers |
| 100 | + |
| 101 | +Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running GC, for instance. |
| 102 | +A VM barrier is designed for this use case. It's not used often as taking the barrier slows ractor performance down a lot, but it's useful to |
| 103 | +know about and is sometimes the only solution. |
| 104 | + |
| 105 | +## Lock Hierarchy |
| 106 | + |