summaryrefslogtreecommitdiff
path: root/doc/contributing/concurrency_guide.md
diff options
context:
space:
mode:
Diffstat (limited to 'doc/contributing/concurrency_guide.md')
-rw-r--r--doc/contributing/concurrency_guide.md154
1 files changed, 154 insertions, 0 deletions
diff --git a/doc/contributing/concurrency_guide.md b/doc/contributing/concurrency_guide.md
new file mode 100644
index 0000000000..1fb58f7203
--- /dev/null
+++ b/doc/contributing/concurrency_guide.md
@@ -0,0 +1,154 @@
+# Concurrency Guide
+
+This is a guide to thinking about concurrency in the cruby source code, whether that's contributing to Ruby
+by writing C or by contributing to one of the JITs. This does not touch on native extensions, only the core
+language. It will go over:
+
+* What needs synchronizing?
+* How to use the VM lock, and what you can and can't do when you've acquired this lock.
+* What you can and can't do when you've acquired other native locks.
+* The difference between the VM lock and the GVL.
+* What a VM barrier is and when to use it.
+* The lock ordering of some important locks.
+* How ruby interrupt handling works.
+* The timer thread and what it's responsible for.
+
+## What needs synchronizing?
+
+Before ractors, only one ruby thread could run at once. That didn't mean you could forget about concurrency issues, though. The timer thread
+is a native thread that interacts with other ruby threads and changes some VM internals, so if these changes can be done in parallel by both the timer
+thread and a ruby thread, they need to be synchronized.
+
+When you add ractors to the mix, it gets more complicated. However, ractors allow you to forget about synchronization for non-shareable objects because
+they aren't used across ractors. Only one ruby thread can touch the object at once. For shareable objects, they are deeply frozen so there isn't any
+mutation on the objects themselves. However, something like reading/writing constants across ractors does need to be synchronized. In this case, ruby threads need to
+see a consistent view of the VM. If publishing the update takes 2 steps or even two separate instructions, like in this case, synchronization is required.
+
+Most synchronization is to protect VM internals. These internals include structures for the thread scheduler on each ractor, the global ractor scheduler, the
+coordination between ruby threads and ractors, global tables (for `fstrings`, encodings, symbols and global vars), etc. Anything that can be mutated by a ractor
+that can also be read or mutated by another ractor at the same time requires proper synchronization.
+
+## The VM Lock
+
+There's only one VM lock and it is for critical sections that can only be entered by one ractor at a time.
+Without ractors, the VM lock is useless. It does not stop all ractors from running, as ractors can run
+without trying to acquire this lock. If you're updating global (shared) data between ractors and aren't using
+atomics, you need to use a lock and this is a convenient one to use. Unlike other locks, you can allocate ruby-managed
+memory with it held. When you take the VM lock, there are things you can and can't do during your critical section:
+
+You can (as long as no other locks are also held before the VM lock):
+
+* Create ruby objects, call `ruby_xmalloc`, etc.
+
+You can't:
+
+* Context switch to another ruby thread or ractor. This is important, as many things can cause ruby-level context switches including:
+
+ * Calling any ruby method through, for example, `rb_funcall`. If you execute ruby code, a context switch could happen.
+ This also applies to ruby methods defined in C, as they can be redefined in Ruby. Things that call ruby methods such as
+ `rb_obj_respond_to` are also disallowed.
+
+ * Calling `rb_raise`. This will call `initialize` on the new exception object. With the VM lock
+ held, nothing you call should be able to raise an exception. `NoMemoryError` is allowed, however.
+
+ * Calling `rb_nogvl` or a ruby-level mechanism that can context switch like `rb_mutex_lock`.
+
+ * Enter any blocking operation managed by ruby. This will context switch to another ruby thread using `rb_nogvl` or
+ something equivalent. A blocking operation is one that blocks the thread's progress, such as `sleep` or `IO#read`.
+
+Internally, the VM lock is the `vm->ractor.sync.lock`.
+
+You need to be on a ruby thread to take the VM lock. You also can't take it inside any functions that could be called during sweeping, as MMTK sweeps
+on another thread and you need a valid `ec` to grab the lock. For this same reason (among others), you can't take it from the timer thread either.
+
+## Other Locks
+
+All native locks that aren't the VM lock share a more strict set of rules for what's allowed during the critical section. By native locks, we mean
+anything that uses `rb_native_mutex_lock`. Some important locks include the `interrupt_lock`, the ractor scheduling lock (protects global scheduling data structures),
+the thread scheduling lock (local to each ractor, protects per-ractor scheduling data structures) and the ractor lock (local to each ractor, protects ractor data structures).
+
+When you acquire one of these locks,
+
+You can:
+
+* Allocate memory though non-ruby allocation such as raw `malloc` or the standard library. But be careful, some functions like `strdup` use
+ruby allocation through the use of macros!
+
+* Use `ccan` lists, as they don't allocate.
+
+* Do the usual things like set variables or struct fields, manipulate linked lists, signal condition variables etc.
+
+You can't:
+
+* Allocate ruby-managed memory. This includes creating ruby objects or using `ruby_xmalloc` or `st_insert`. The reason this
+is disallowed is if that allocation causes a GC, then all other ruby threads must join a VM barrier as soon as possible
+(when they next check interrupts or acquire the VM lock). This is so that no other ractors are running during GC. If a ruby thread
+is waiting (blocked) on this same native lock, it can't join the barrier and a deadlock occurs because the barrier will never finish.
+
+* Raise exceptions. You also can't use `EC_JUMP_TAG` if it jumps out of the critical section.
+
+* Context switch. See the `VM Lock` section for more info.
+
+## Difference Between VM Lock and GVL
+
+The VM Lock is a particular lock in the source code. There is only one VM Lock. The GVL, on the other hand, is more of a combination of locks.
+It is "acquired" when a ruby thread is about to run or is running. Since many ruby threads can run at the same time if they're in different ractors,
+there are many GVLs (1 per `SNT` + 1 for the main ractor). It can no longer be thought of as a "Global VM Lock" like it once was before ractors.
+
+## VM Barriers
+
+Sometimes, taking the VM Lock isn't enough and you need a guarantee that all ractors have stopped. This happens when running `GC`, for instance.
+To get a barrier, you take the VM Lock and call `rb_vm_barrier()`. For the duration that the VM lock is held, no other ractors will be running. It's not used
+often as taking a barrier slows ractor performance down considerably, but it's useful to know about and is sometimes the only solution.
+
+## Lock Orderings
+
+It's a good idea to not hold more than 2 locks at once on the same thread. Locking multiple locks can introduce deadlocks, so do it with care. When locking
+multiple locks at once, follow an ordering that is consistent across the program, otherwise you can introduce deadlocks. Here are the orderings of some important locks:
+
+* VM lock before ractor_sched_lock
+* thread_sched_lock before ractor_sched_lock
+* interrupt_lock before timer_th.waiting_lock
+* timer_th.waiting_lock before ractor_sched_lock
+
+These orderings are subject to change, so check the source if you're not sure. On top of this:
+
+* During each `ubf` (unblock) function, the VM lock can be taken around it in some circumstances. This happens during VM shutdown, for example.
+See the "Interrupt Handling" section for more details.
+
+## Ruby Interrupt Handling
+
+When the VM runs ruby code, ruby's threads intermittently check ruby-level interrupts. These software interrupts
+are for various things in ruby and they can be set by other ruby threads or the timer thread.
+
+* Ruby threads check when they should give up their timeslice. The native thread switches to another ruby thread when their time is up.
+* The timer thread sends a "trap" interrupt to the main thread if any ruby-level signal handlers are pending.
+* Ruby threads can have other ruby threads run tasks for them by sending them an interrupt. For instance, ractors send
+the main thread an interrupt when they need to `require` a file so that it's done on the main thread. They wait for the
+main thread's result.
+* During VM shutdown, a "terminate" interrupt is sent to all ractor main threads top stop them asap.
+* When calling `Thread#raise`, the caller sends an interrupt to that thread telling it which exception to raise.
+* Unlocking a mutex sends the next waiter (if any) an interrupt telling it to grab the lock.
+* Signalling or broadcasting on a condition variable tells the waiter(s) to wake up.
+
+This isn't a complete list.
+
+When sending an interrupt to a ruby thread, the ruby thread can be blocked. For example, it could be in the middle of a `TCPSocket#read` call. If so,
+the receiving thread's `ubf` (unblock function) gets called from the thread (ruby thread or timer thread) that sent the interrupt.
+Each ruby thread has a `ubf` that is set when it enters a blocking operation and is unset after returning from it. By default, this `ubf` function sends a
+`SIGVTALRM` to the receiving thread to try to unblock it from the kernel so it can check its interrupts. There are other `ubfs` that
+aren't associated with a syscall, such as when calling `Ractor#join` or `sleep`. All `ubfs` are called with the `interrupt_lock` held,
+so take that into account when using locks inside `ubfs`.
+
+Remember, `ubfs` can be called from the timer thread so you cannot assume an `ec` inside them. The `ec` (execution context) is only set on ruby threads.
+
+## The Timer Thread
+
+The timer thread has a few functions. They are:
+
+* Send interrupts to ruby threads that have run for their whole timeslice.
+* Wake up M:N ruby threads (threads in non-main ractors) blocked on IO or after a specified timeout. This
+uses `kqueue` or `epoll`, depending on the OS, to receive IO events on behalf of the threads.
+* Continue calling the `SIGVTARLM` signal if a thread is still blocked on a syscall after the first `ubf` call.
+* Signal native threads (`SNT`) waiting on a ractor if there are ractors waiting in the global run queue.
+* Create more `SNT`s if some are blocked, like on IO or on `Ractor#join`.