Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
99 changes: 99 additions & 0 deletions doc/concurrency_guide.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,99 @@
# Concurrency Guide

This is a guide to thinking about concurrency in the native cruby source code, whether that's
contributing to Ruby by writing C or Rust. This doesn't touch on native extensions, only the core
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If by Rust you mean YJIT & ZJIT, maybe we can be more specific here? Like `contributing to Ruby interpreter and its JIT compilers (YJIT & ZJIT).

language. It will go over:

* How to use the VM lock, and what you can and can't do when you've acquired this lock.
* What you can and can't do when you've acquired other native locks.
* The difference between the VM lock and the GVL.
* How to write code that is ractor safe.
* What a VM barrier is and when to use it.
* The lock hierarchy of some important locks.
* How ruby interrupt handling works.
* What happens when IO is performed through ruby.
* The timer thread and what it's responsible for.


## The VM Lock

There's only one VM lock and its for critical sections that can only be entered by one ractor at a time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There's only one VM lock and its for critical sections that can only be entered by one ractor at a time.
There's only one VM lock and it's for critical sections that can only be entered by one ractor at a time.

Without ractors, the VM lock is useless. It does not stop all ractors from running, as ractors can run
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noob Q: what's the relationship/difference between the VM Lock mentioned here and GVL?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually explained in a different section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx I'm seeing it now. I think it's worth moving it forward as that's likely known by most Ruby devs looking at this doc, and it'd be great to avoid confusion between them as early as possible.

without trying to acquire this lock. If you're updating global (shared) data between ractors and aren't using
atomics, you need to use this lock. When you take the VM lock, there are things you can and can't do during
your critical section:

You can (as long as no other locks are also held before the VM lock):

* Create ruby objects, call `ruby_xmalloc`, etc.

You can't:

* Context switch to another ruby thread or ractor. This is important, as many things can cause ruby-level context switches including:

* Calling any ruby method through, for example, `rb_funcall`. If you execute ruby code, a context switch could happen.
This also applies to ruby methods defined in C, as they can be redefined in Ruby. Things that call ruby methods such as
`rb_obj_respond_to` are also disallowed.

* Calling `rb_raise`. This will call `initialize` on the new exception object. With the VM lock
held, nothing you call should be able to raise an exception. `NoMemoryError` is allowed, however.

* Calling `rb_nogvl` or a ruby-level mechanism that can context switch like `rb_mutex_lock`.

* Enter any blocking operation managed by ruby. This will context switch to another ruby thread using `rb_nogvl` or
something equivalent.

Internally, the VM lock is the `vm->ractor.sync.lock`.

## Other Locks

All native locks that aren't the VM lock share a more strict set of rules for what's allowed during the critical section. By native locks, we mean
anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (global), the thread
scheduling lock (local to each ractor) and the ractor lock (local to each ractor).

You can:

* Allocate memory though non-ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use
ruby allocation through the use of macros!

* Use `ccan` lists, as they don't allocate.

* Do the usual things like set variables or struct fields, manipulate linked lists, etc.

You can't:

* Allocate ruby-managed memory. This includes creating ruby objects or using `ruby_xmalloc` or `st_insert`. The reason this
is disallowed is if that allocation causes a GC, then all other ruby threads must join a VM barrier as soon as possible
(when they next check interrupts or acquire the VM lock). This is so that no other ractors are running during GC. If a ruby thread
is waiting (blocked) on this same native lock, it can't join the barrier and a deadlock occurs because the barrier will never finish.

* Raise exceptions or use `EC_JUMP_TAG` if it jumps out of the critical section.

* Context switch. See the `VM Lock` section for more info.

## Difference Between VM Lock and GVL

The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks.
It is "acquired" when a ruby thread is about to run or is running. Since many ruby threads can run at the same time if they're in different ractors,
there are many GVLs (1 per `SNT` + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock".

## How To Write Ractor-Safe Code

Before ractors, only one ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. Context switches happen
often and need to be taken into account when writing code. Also, threads without the GVL run too, like the timer thread. Sometimes these threads need
to coordinate with ruby threads, and this coordination often needs locks or atomics.

When you add ractors to the mix, it gets more complicated. Take the `fstring` table, for example. It's a global set of strings that each ractor can update
concurrently, and it's used heavily. A lockless solution is preferred to using the VM lock in this case, as taking the VM Lock would cause too many OS context
switches. A lockless solution is also preferable for dealing with call cache tables on classes. These are also updated often and can run from multiple ractors
concurrently. Here, an RCU (Read-Copy-Update) solution is used. What was previously an `st_table` is now a ruby object, and the old and new tables are switched
atomically.

## VM Barriers

Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running GC, for instance.
A VM barrier is designed for this use case. It's not used often as taking the barrier slows ractor performance down considerably, but it's useful to
know about and is sometimes the only solution.

## Lock Hierarchy

4 changes: 3 additions & 1 deletion ractor.c
Original file line number Diff line number Diff line change
Expand Up @@ -2220,6 +2220,8 @@ ractor_local_value_set(rb_execution_context_t *ec, VALUE self, VALUE sym, VALUE
tbl = cr->idkey_local_storage = rb_id_table_create(2);
}
rb_id_table_insert(tbl, id, val);
RB_OBJ_WRITTEN(self, Qundef, val);

return val;
}

Expand Down Expand Up @@ -2267,7 +2269,7 @@ ractor_local_value_store_if_absent(rb_execution_context_t *ec, VALUE self, VALUE
}

if (!cr->local_storage_store_lock) {
cr->local_storage_store_lock = rb_mutex_new();
RB_OBJ_WRITE(self, &cr->local_storage_store_lock, rb_mutex_new());
}

return rb_mutex_synchronize(cr->local_storage_store_lock, ractor_local_value_store_i, (VALUE)&data);
Expand Down
Loading