path: root/thread_pthread.c
Age  Commit message  Author
3 days  merge revision(s) 7e81bf5c0c8f43602e6d901f4253dca2f3d71745: [Backport #21812]  Takashi Kokubun
[PATCH] Fix sleep spurious wakeup from sigchld (#15802)

When sleeping with `sleep`, the main thread can currently be woken up by a sigchld from any thread (a subprocess exited). The timer thread wakes up the main thread when this happens, as it checks for signals. The main thread then executes the ruby sigchld handler if one is registered and is supposed to go back to sleep immediately. This is not ideal, but it's the way it has worked for a while.

In commit 8d8159e7d8 I added writes to `th->status` before and after `wait_running_turn` in `thread_sched_to_waiting_until_wakeup`, which is called from `sleep`. This is usually the right way to set the thread's status, but `sleep` is an exception because the writes to `th->status` are done in `sleep_forever`. There's a loop that checks `th->status` in `sleep_forever`. When the main thread got woken up from sigchld, it saw the changed `th->status` and kept running instead of going back to sleep.

The following script shows the error. It was returning instead of sleeping forever.

```ruby
t = Thread.new do
  sleep 0.3
  `echo hello` # Spawns subprocess
  puts "Subprocess exited"
end

puts "Main thread sleeping..."
result = sleep # Should block forever
puts "sleep returned: #{result.inspect}"
```

Fixes [Bug #21812]
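For illustration (a minimal sketch, not Ruby's actual implementation), this is the predicate-in-a-loop pattern that `sleep_forever` relies on: the waiter owns the status flag, so a spurious wakeup such as the sigchld handling described above simply loops back to sleep.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool sleeping = false;

/* Waiter: re-check the flag after every wakeup, so a spurious or
 * signal-related wakeup just goes back to sleep. */
static void
sleep_until_woken(void)
{
    pthread_mutex_lock(&lock);
    sleeping = true;
    while (sleeping) {
        pthread_cond_wait(&cond, &lock);
    }
    pthread_mutex_unlock(&lock);
}

/* Waker: only a deliberate wakeup clears the flag. */
static void
wake_sleeper(void)
{
    pthread_mutex_lock(&lock);
    sleeping = false;
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);
}
```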
2025-12-12  move th->event_serial to rb_thread_sched_item (#15500)  Luke Gruber
2025-12-12  Simplify the code  Koichi Sasada
`thread_sched_to_waiting_common0` is no longer needed.
2025-12-10  Fix typo in thread_pthread.c [ci skip] (#15465)  Yuji Teshima
Fix typo in thread_pthread.c: threre -> there
2025-12-04  Fix thread scheduler issue with thread_sched_wait_events (#15392)  Luke Gruber
Fix race between the timer thread dequeuing a waiting thread and the thread skipping sleeping because it was dequeued. We now use `th->event_serial`, which is protected by `thread_sched_lock`. When a thread is put on the timer thread's waiting list, the event serial is saved on the item. The timer thread checks that the saved serial is the same as the thread's current serial before calling `thread_sched_to_ready`.

The following script (taken from a test in `test_thread.rb`) used to crash on scheduler debug assertions. It would likely crash in non-debug mode as well.

```ruby
def assert_nil(val)
  if val != nil
    raise "Expected #{val} to be nil"
  end
end

def assert_equal(expected, actual)
  if expected != actual
    raise "Expected #{expected} to be #{actual}"
  end
end

def test_join2
  ok = false
  t1 = Thread.new { ok = true; sleep }
  Thread.pass until ok
  Thread.pass until t1.stop?
  t2 = Thread.new do
    Thread.pass while ok
    t1.join(0.01)
  end
  t3 = Thread.new do
    ok = false
    t1.join
  end
  assert_nil(t2.value)
  t1.wakeup
  assert_equal(t1, t3.value)
ensure
  t1&.kill&.join
  t2&.kill&.join
  t3&.kill&.join
end

rs = 30.times.map do
  Ractor.new do
    test_join2
  end
end
rs.each(&:join)
```
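As a hedged illustration of the serial check (toy types and names, not the tree's actual structures), the timer thread re-validates the serial it captured at enqueue time, under the scheduler lock, before readying the thread:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_thread {
    uint64_t event_serial;      /* bumped each time the thread re-arms a wait */
};

struct toy_waiting_item {
    struct toy_thread *th;
    uint64_t serial_at_enqueue; /* snapshot taken when put on the waiting list */
};

static pthread_mutex_t toy_sched_lock = PTHREAD_MUTEX_INITIALIZER;

/* Timer-thread side: only ready the thread if its serial is unchanged,
 * i.e. it has not already skipped the sleep and moved on. */
static bool
toy_try_ready_waiter(struct toy_waiting_item *item)
{
    bool readied = false;
    pthread_mutex_lock(&toy_sched_lock);
    if (item->th->event_serial == item->serial_at_enqueue) {
        /* the equivalent of thread_sched_to_ready() would happen here */
        readied = true;
    }
    pthread_mutex_unlock(&toy_sched_lock);
    return readied;
}
```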
2025-11-10  Fix `thread_sched_wait_events` race (#15067)  Luke Gruber
This race condition was found when calling `Thread#join` with a timeout inside a ractor. The race is between the polling thread waking up the thread and the `ubf` getting called (`ubf_event_waiting`). The error was that the ubf or the polling thread would set the thread as ready, and then the other would do the same.

Fixes [Bug #21614]
2025-10-30  mn timer thread: force wakeups for timeouts  Andre Muta
2025-10-07  Add debug #define to call sched_yield before each pthread_mutex_lock  Luke Gruber
This is useful for debugging mutex issues as it increases contention for locks. It is off by default.
2025-09-30  Set context_stack on main thread  Peter Zhu
We allocate the stack of the main thread using malloc, but we never set malloc_stack to true or set context_stack. If we fork, the main thread may no longer be the original main thread, so RUBY_FREE_AT_EXIT reports its memory as leaked. This commit allows the main thread to free its own VM stack at shutdown.
2025-08-08  Fix lock ordering issue for rb_ractor_sched_wait() and rb_ractor_sched_wakeup()  Luke Gruber
In rb_ractor_sched_wait() (ex: Ractor.receive), we acquire RACTOR_LOCK(cr) and then thread_sched_lock(cur_th). However, on wakeup, if we're a dnt, in thread_sched_wait_running_turn() we acquire thread_sched_lock(cur_th) after condvar wakeup and then RACTOR_LOCK(cr). This lock inversion can cause a deadlock with rb_ractor_wakeup_all() (ex: port.send(obj)), where we acquire RACTOR_LOCK(other_r) and then thread_sched_lock(other_th).

So, the error happens:

    nt 1: Ractor.receive
      rb_ractor_sched_wait()
        after condvar wakeup in thread_sched_wait_running_turn():
        - thread_sched_lock(cur_th) (condvar)  # acquires lock
        - rb_ractor_lock_self(cr)              # deadlock here: tries to acquire, HANGS

    nt 2: port.send
      ractor_wakeup_all()
        - RACTOR_LOCK(port_r)  # acquires lock
        - thread_sched_lock    # tries to acquire, HANGS

To fix it, we now unlock the thread_sched_lock before acquiring the ractor_lock in rb_ractor_sched_wait().

Script that reproduces the issue:

```ruby
require "async"

class RactorWrapper
  def initialize
    @ractor = Ractor.new do
      Ractor.recv # Ractor doesn't start until explicitly told to
      # Do some calculations
      fib = ->(x) { x < 2 ? 1 : fib.call(x - 1) + fib.call(x - 2) }
      fib.call(20)
    end
  end

  def take_async
    @ractor.send(nil)
    Thread.new { @ractor.value }.value
  end
end

Async do |task|
  10_000.times do |i|
    task.async do
      RactorWrapper.new.take_async
      puts i
    end
  end
end
exit 0
```

Fixes [Bug #21398]

Co-authored-by: John Hawthorn <john.hawthorn@shopify.com>
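A minimal sketch of the ordering rule this fix restores (toy locks, not the actual scheduler code): the global order is "ractor lock, then sched lock", so a wakeup path that holds only the sched lock drops it before taking the ractor lock instead of nesting them in the reverse order.

```c
#include <pthread.h>

static pthread_mutex_t toy_ractor_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t toy_sched_lock  = PTHREAD_MUTEX_INITIALIZER;

/* Wakeup path, fixed: finish the work that needs the sched lock, release it,
 * and only then take the ractor lock, so the "ractor then sched" order used
 * elsewhere is never inverted. */
static void
toy_wakeup_path(void)
{
    pthread_mutex_lock(&toy_sched_lock);
    /* ... scheduler bookkeeping ... */
    pthread_mutex_unlock(&toy_sched_lock);

    pthread_mutex_lock(&toy_ractor_lock);
    /* ... ractor bookkeeping ... */
    pthread_mutex_unlock(&toy_ractor_lock);
}
```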
2025-06-12  Free rb_native_thread memory at fork  Peter Zhu
We never freed any resources of rb_native_thread at fork because doing so would hang: it called rb_native_cond_destroy on the condition variables. We can't call rb_native_cond_destroy here because, according to the specs of pthread_cond_destroy:

    Attempting to destroy a condition variable upon which other threads are
    currently blocked results in undefined behavior.

Specifically, glibc's pthread_cond_destroy waits on all the other listeners. Since after forking all the other threads are dead, the condition variable's listeners will never wake up, so it will hang forever.

This commit changes it to only free the memory and none of the condition variables.

Notes: Merged: https://github.com/ruby/ruby/pull/13591
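A hedged sketch of that trade-off (hypothetical struct and function names): in the forked child, any parent thread that was blocked on the condition variable no longer exists, so pthread_cond_destroy may wait forever; the cleanup frees the memory and deliberately skips the destroy.

```c
#include <pthread.h>
#include <stdlib.h>

struct toy_native_thread {
    pthread_cond_t cond;
    /* ... other per-thread resources ... */
};

/* Called in the child after fork(): free the memory, but do NOT call
 * pthread_cond_destroy(); waiters copied from the parent can never be
 * woken in the child, and glibc's destroy would block on them. */
static void
toy_free_native_thread_at_fork(struct toy_native_thread *nt)
{
    free(nt);
}
```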
2025-05-31  `Ractor::Port`  Koichi Sasada
* Added `Ractor::Port`
  * `Ractor::Port#receive` (supports multiple threads)
  * `Ractor::Port#close`
  * `Ractor::Port#closed?`
* Added some methods
  * `Ractor#join`
  * `Ractor#value`
  * `Ractor#monitor`
  * `Ractor#unmonitor`
* Removed some methods
  * `Ractor#take`
  * `Ractor.yield`
* Changed the spec
  * `Ractor.select`

You can wait for multiple sequences of messages with `Ractor::Port`.

```ruby
ports = 3.times.map{ Ractor::Port.new }
ports.map.with_index do |port, ri|
  Ractor.new port, ri do |port, ri|
    3.times{|i| port << "r#{ri}-#{i}"}
  end
end

p ports.each{|port| pp 3.times.map{port.receive}}
```

In this example, we use 3 ports, and 3 Ractors send messages to them respectively. We can receive a series of messages from each port.

You can use `Ractor#value` to get the last value of a Ractor's block:

```ruby
result = Ractor.new do
  heavy_task()
end.value
```

You can wait for the termination of a Ractor with `Ractor#join` like this:

```ruby
Ractor.new do
  some_task()
end.join
```

`#value` and `#join` are similar to `Thread#value` and `Thread#join`.

To implement `#join`, `Ractor#monitor` (and `Ractor#unmonitor`) is introduced.

This commit changes the `Ractor.select()` method. It now only accepts ports or Ractors, and returns when a port receives a message or a Ractor terminates.

We remove `Ractor.yield` and `Ractor#take` because:

* `Ractor::Port` supports most of the same use cases in a simpler manner.
* Removing them significantly simplifies the code.

We also change the internal thread scheduler code (thread_pthread.c):

* During barrier synchronization, we keep the `ractor_sched` lock to avoid deadlocks. This lock is released by `rb_ractor_sched_barrier_end()`, which is called at the end of operations that require the barrier.
* Fix potential deadlock issues by checking interrupts just before setting UBF.

https://bugs.ruby-lang.org/issues/21262

Notes: Merged: https://github.com/ruby/ruby/pull/13445
2025-05-25  Use RB_VM_LOCKING  Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/13439
2025-05-15  Use atomics for system_working global  John Hawthorn
Although it almost certainly works in this case, volatile is best not used for multi-threaded code. Using atomics instead avoids warnings from TSan. This also simplifies some logic, as system_working was previously only ever assigned to 1, so --system_working <= 0 should always return true (unless it underflowed). Notes: Merged: https://github.com/ruby/ruby/pull/13333
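A hedged sketch of the volatile-to-atomic switch (illustrative names, not the exact patch): a C11 atomic flag gives defined cross-thread semantics and keeps TSan quiet, and the decrement-and-compare is replaced by a plain store and load.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool toy_system_working = true;   /* was: a volatile int set to 1 */

static void
toy_stop_system(void)
{
    atomic_store(&toy_system_working, false);
}

static bool
toy_system_is_working(void)
{
    /* was effectively: --system_working <= 0, which is always true */
    return atomic_load(&toy_system_working);
}
```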
2025-05-15  Force reset running time in timer interrupt  John Hawthorn
Co-authored-by: Ivo Anjo <ivo.anjo@datadoghq.com> Co-authored-by: Luke Gruber <luke.gru@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/12094
2025-05-13  Get ractor message passing working with > 1 thread sending/receiving values in same ractor  Luke Gruber
Rework ractors so that any ractor action (Ractor.receive, Ractor#send, Ractor.yield, Ractor#take, Ractor.select) will operate on the thread that called the action. It will put that thread to sleep if it's a blocking function and it needs to put it to sleep, and the awakening action (Ractor.yield, Ractor#send) will wake up the blocked thread.

Before this change, every blocking ractor action was associated with the ractor struct and its fields. If a ractor called Ractor.receive, its wait status was wait_receiving, and when another ractor called r.send on it, it would look for that status in the ractor struct fields and wake it up. The problem was: what if 2 threads call blocking ractor actions in the same ractor? Imagine 1 thread has called Ractor.receive and another r.take. Then, when a different ractor calls r.send on it, it doesn't know which ruby thread is associated with which ractor action, so which ruby thread should it schedule?

This change moves some fields onto the ruby thread itself so that ruby threads are the ones that have ractor blocking statuses, and threads are then specifically scheduled when unblocked.

Fixes [#17624]
Fixes [#21037]

Notes: Merged: https://github.com/ruby/ruby/pull/12633
2025-04-19  Fix style [ci skip]  Nobuyoshi Nakada
2025-04-17  Prefer `th->ec` for stack base/size. (#13101)  Samuel Williams
Notes: Merged-By: ioquatix <samuel@codeotaku.com>
2025-03-25  Clear VM_CHECK lock info on fork  John Hawthorn
We are resetting the actual lock so we should reset this information at the same time. Previously this caused an assertion to fail in debug mode. Notes: Merged: https://github.com/ruby/ruby/pull/12981
2025-02-13  [Feature #21116] Extract RJIT as a third-party gem  Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/12740
2024-12-24  Fix [Bug #20779] Dedicated native thread creation failed bug  Luke Gruber
When creation of a dedicated native thread fails (the OS can't create more threads), don't add the ruby-level thread to the thread scheduler. If it is added, then the next time the thread is scheduled to run it will sleep forever, waiting for a native thread that doesn't exist to pick it up.

Notes: Merged: https://github.com/ruby/ruby/pull/12441
2024-12-12  Add an environment variable for controlling the default Thread quantum  Aaron Patterson
This commit adds an environment variable `RUBY_THREAD_TIMESLICE` for specifying the default thread quantum in milliseconds. You can adjust this variable to tune throughput, which is especially useful on multithreaded systems that are mixing CPU bound work and IO bound work. The default quantum remains 100ms. [Feature #20861] Co-Authored-By: John Hawthorn <john@hawthorn.email> Notes: Merged: https://github.com/ruby/ruby/pull/11981
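As a hedged illustration of how such a knob can be consumed (this is not Ruby's actual parsing code), a runtime might read the millisecond value from the environment and fall back to the 100ms default; at the shell level the tuning looks like `RUBY_THREAD_TIMESLICE=10 ruby app.rb`.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_THREAD_QUANTUM_MS 100

/* Hypothetical reader: returns the quantum in milliseconds, falling back to
 * the default when the variable is unset or not a positive number. */
static unsigned int
toy_thread_quantum_ms(void)
{
    const char *s = getenv("RUBY_THREAD_TIMESLICE");
    if (s != NULL) {
        long v = strtol(s, NULL, 10);
        if (v > 0) return (unsigned int)v;
    }
    return DEFAULT_THREAD_QUANTUM_MS;
}

int
main(void)
{
    printf("thread quantum: %u ms\n", toy_thread_quantum_ms());
    return 0;
}
```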
2024-11-08  support `require` in non-main Ractors  Koichi Sasada
Many libraries should be loaded on the main ractor because they set constants to unshareable objects and so on. This patch allows calling `require` on non-main Ractors by asking the main ractor to call `require` on their behalf. The calling ractor waits for the result of `require` from the main ractor. If the `require` call fails for some reason, an exception object will be delivered from the main ractor to the calling ractor if it is copyable.

The same applies to `require_relative` and `require` by `autoload`.

Now `Ractor.new{pp obj}` works well (the first call of `pp` requires the `pp` library implicitly).

[Feature #20627]

Notes: Merged: https://github.com/ruby/ruby/pull/11142
2024-10-22  Delete reserve_stack code  KJ Tsanaktsidis
This code was working around a bug in the Linux kernel. It was previously possible for the kernel to place heap pages in a region where the stack was allowed to grow into, and therefore run out of usable stack memory before RLIMIT_STACK was reached. This bug was fixed in Linux commit https://github.com/torvalds/linux/commit/c204d21f2232d875e36b8774c36ffd027dc1d606 for kernel 4.13 in 2017. Therefore, in 2024, we should be safe to delete this workaround.

[Bug #20804]

Notes: Merged: https://github.com/ruby/ruby/pull/11927
2024-10-09  Fix spelling  John Bampton
Notes: Merged: https://github.com/ruby/ruby/pull/11835
2024-09-29  Assertions should not have side effects  Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/11720
2024-09-19  Adjust indent [ci skip]  Nobuyoshi Nakada
2024-09-05  Proof of Concept: Allow to prevent fork from happening in known fork unsafe API  Jean Boussier
[Feature #20590]

For better or for worse, fork(2) remains the primary provider of parallelism in Ruby programs. Even though it's frowned upon in many circles, and a lot of literature will simply state that only async-signal-safe APIs are safe to use after `fork()`, in practice most APIs work well as long as you are careful about not forking while another thread is holding a pthread mutex.

One of the APIs that is known to cause fork-safety issues is `getaddrinfo`. If you fork while another thread is inside `getaddrinfo`, a mutex may be left locked in the child, with no way to unlock it.

I think we could reduce the impact of these problems for the most notorious and common cases by locking around `fork(2)` and known unsafe APIs with a read-write lock.

Notes: Merged: https://github.com/ruby/ruby/pull/10864
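A hedged sketch of that read-write-lock idea (illustrative names, not the merged implementation): known fork-unsafe calls such as getaddrinfo take the lock for reading, and fork takes it for writing, so fork cannot proceed while any thread is inside an unsafe region.

```c
#include <pthread.h>
#include <unistd.h>

static pthread_rwlock_t toy_fork_safety_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Wrap known fork-unsafe regions (e.g. around getaddrinfo) with a read lock. */
static void toy_enter_fork_unsafe(void) { pthread_rwlock_rdlock(&toy_fork_safety_lock); }
static void toy_leave_fork_unsafe(void) { pthread_rwlock_unlock(&toy_fork_safety_lock); }

/* fork() takes the write lock, so it waits until no thread is inside an
 * unsafe region, and new unsafe regions wait for the fork to finish. */
static pid_t
toy_guarded_fork(void)
{
    pthread_rwlock_wrlock(&toy_fork_safety_lock);
    pid_t pid = fork();
    pthread_rwlock_unlock(&toy_fork_safety_lock);
    return pid;
}
```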
2024-08-13  Re-initialize vm->ractor.sched.lock after fork  John Hawthorn
Previously under certain conditions it was possible to encounter a deadlock in the forked child process if ractor.sched.lock was held. Co-authored-by: Nathan Froyd <froydnj@gmail.com> Notes: Merged: https://github.com/ruby/ruby/pull/11356
2024-07-09  fix last commit  Koichi Sasada
`th` is gone.
2024-07-09  `struct rb_thread_sched_waiting`  Koichi Sasada
Introduce `struct rb_thread_sched_waiting` so that `timer_th.waiting` can contain entries other than `rb_thread_t`.
2024-07-05  VM barrier needs to store GC root  Koichi Sasada
While waiting on the VM barrier, the machine context needs to be stored as a GC root. The waiter also needs to wait for barrier synchronization correctly in a `while` loop (to handle continuous barrier syncs from another ractor). This is why GC marking misses occurred on ARM machines, like: https://rubyci.s3.amazonaws.com/fedora40-arm/ruby-master/log/20240702T063002Z.fail.html.gz
2024-05-07  Ignore the result of pthread_kill in ubf_wakeup_thread  Jeremy Evans
After an upgrade to Ruby 3.3.0, I experienced reproducible production crashes of the form:

    [BUG] pthread_kill: No such process (ESRCH)

This is the only pthread_kill call in Ruby. The result of pthread_kill was previously ignored in Ruby 3.2 and below. Checking the result was added in be1bbd5b7d40ad863ab35097765d3754726bbd54 (MaNy). I have not yet been able to create a minimal self-contained example, but it should be safe to remove the checks.
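A hedged sketch of the resulting call site (the signal number is an assumption, not necessarily the one Ruby uses): the return value of pthread_kill is ignored again, since ESRCH just means the target thread has already exited.

```c
#include <pthread.h>
#include <signal.h>

/* Wake a possibly-sleeping thread; if it already exited, pthread_kill
 * returns ESRCH, which is harmless here, so the result is ignored. */
static void
toy_ubf_wakeup_thread(pthread_t tid)
{
    (void)pthread_kill(tid, SIGVTALRM);
}
```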
2024-03-29  Use ubf list on cygwin  Daisuke Fujimura (fd0)
2024-03-25  Move asan_fake_stack_handle to EC, not thread  KJ Tsanaktsidis
It's really a property of the EC; each fiber (which has its own EC) also has its own asan_fake_stack_handle. [Bug #20310]
2024-02-21  `rb_thread_lock_native_thread()`  Koichi Sasada
Introduce `rb_thread_lock_native_thread()` to allocate a dedicated native thread to the current Ruby thread for M:N threads. This C API is similar to Go's `runtime.LockOSThread()`.

Accepted at https://github.com/ruby/dev-meeting-log/blob/master/2023/DevMeeting-2023-08-24.md (but was missed in the Ruby 3.3.0 implementation).
2024-02-06  notify ASAN about M:N threading stack switches  KJ Tsanaktsidis
In a similar way to how we do it with fibers in cont.c, we need to call __sanitizer_start_switch_fiber and __sanitizer_finish_switch_fiber around the call to coroutine_transfer to let ASAN save & restore the fake stack pointer.

When an M:N thread is exiting, we pass `to_dead` to the new coroutine_transfer0 function, so that we can pass NULL for saving the stack pointer. This signals to ASAN that the fake stack can be freed (otherwise it would be leaked).

[Bug #20220]
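A hedged sketch of those annotations (the helpers come from <sanitizer/common_interface_defs.h>; the surrounding names are illustrative): passing NULL as the save slot on the final switch of a dying coroutine tells ASAN the fake stack can be released rather than leaked.

```c
#include <sanitizer/common_interface_defs.h>
#include <stddef.h>

/* Before transferring away: save this coroutine's fake stack, unless it is
 * dying, in which case pass NULL so ASAN frees the fake stack. */
static void
toy_start_switch(void **fake_stack_save, const void *target_stack_bottom,
                 size_t target_stack_size, int to_dead)
{
    __sanitizer_start_switch_fiber(to_dead ? NULL : fake_stack_save,
                                   target_stack_bottom, target_stack_size);
    /* the coroutine_transfer(...) call would happen here */
}

/* After being switched back into: restore the saved fake stack pointer. */
static void
toy_finish_switch(void *fake_stack_save)
{
    __sanitizer_finish_switch_fiber(fake_stack_save, NULL, NULL);
}
```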
2024-01-23  Fix up [Bug #20001]  Nobuyoshi Nakada
2024-01-19  Remove null checks for xfree  Peter Zhu
xfree can handle null values, so we don't need to check it.
2024-01-19  Make stack bounds detection work with ASAN  KJ Tsanaktsidis
Where a local variable is used as part of the stack bounds detection, it has to actually be on the stack. ASAN can put local variables on "fake stacks", however, with addresses in different memory mappings. This completely destroys the stack bounds calculation, and can lead to e.g. things not getting GC marked on the machine stack or stack overflow checks that always fail. The __asan_addr_is_in_fake_stack helper can be used to get the _real_ stack address of such variables, and thus perform the stack size calculation properly.

[Bug #20001]
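A hedged sketch of using that helper (the wrapper is illustrative, not the tree's code): if ASAN moved the local onto a fake stack, __asan_addr_is_in_fake_stack reports the corresponding address on the real stack, which is what the bounds calculation should use.

```c
#include <sanitizer/asan_interface.h>
#include <stddef.h>

/* Returns an address usable for stack-bounds math: the real-stack address
 * corresponding to `local_addr` if ASAN put it on a fake stack, otherwise
 * `local_addr` itself. */
static void *
toy_real_stack_addr(void *local_addr)
{
    void *fake_stack = __asan_get_current_fake_stack();
    if (fake_stack != NULL) {
        void *beg, *end;
        void *real = __asan_addr_is_in_fake_stack(fake_stack, local_addr, &beg, &end);
        if (real != NULL) return real;
    }
    return local_addr;
}
```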
2024-01-19  Pass down "stack start" variables from closer to the top of the stack  KJ Tsanaktsidis
This commit changes how stack extents are calculated for both the main thread and other threads. Ruby uses the address of a local variable as part of the calculation for machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because glibc (and maybe other libcs) can store its own data on the stack before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of the memory mapping which contains the variable.

However, the local being used for this is actually too low (too close to the leaf function call) in both the main thread case and the new thread case.

In the main thread case, we have the `INIT_STACK` macro, which is used for pthreads to set the `native_main_thread->stack_start` value. This value is correctly captured at the very top level of the program (in main.c). However, this is _not_ what's used to set the execution context machine stack (`th->ec->machine_stack.stack_start`); that gets set as part of a call to `ruby_thread_init_stack` in `Init_BareVM`, using the address of a local variable allocated _inside_ `Init_BareVM`. This is too low; we need to use a local allocated closer to the top of the program.

In the new thread case, the local is allocated inside `native_thread_init_stack`, which is, again, too low.

In both cases, this means that we might have VALUEs lying outside the bounds of `th->ec->machine.stack_{start,end}`, which won't be marked correctly by the GC machinery.

To fix this:

* In the main thread case: we already have `INIT_STACK` at the right level, so just pass that local var to `ruby_thread_init_stack`.
* In the new thread case: allocate the local one level above the call to `native_thread_init_stack` in `call_thread_start_func2`.

[Bug #20001] fix
2024-01-12  Revert "Pass down "stack start" variables from closer to the top of the stack"  KJ Tsanaktsidis
This reverts commit 4ba8f0dc993953d3ddda6328e3ef17a2fc2cbde5.
2024-01-12  Revert "Make stack bounds detection work with ASAN"  KJ Tsanaktsidis
This reverts commit 6185cfdf38e26026c6d38220eeca48689e54cdcf.
2024-01-12  Make stack bounds detection work with ASAN  KJ Tsanaktsidis
Where a local variable is used as part of the stack bounds detection, it has to actually be on the stack. ASAN can put local variables on "fake stacks", however, with addresses in different memory mappings. This completely destroys the stack bounds calculation, and can lead to e.g. things not getting GC marked on the machine stack or stack overflow checks that always fail. The __asan_addr_is_in_fake_stack helper can be used to get the _real_ stack address of such variables, and thus perform the stack size calculation properly.

[Bug #20001]
2024-01-12  Pass down "stack start" variables from closer to the top of the stack  KJ Tsanaktsidis
The implementation of `native_thread_init_stack` for the various threading models can use the address of a local variable as part of the calculation of the machine stack extents:

* pthreads uses it as a lower-bound on the start of the stack, because glibc (and maybe other libcs) can store its own data on the stack before calling into user code on thread creation.
* win32 uses it as an argument to VirtualQuery, which gets the extent of the memory mapping which contains the variable.

However, the local being used for this is actually allocated _inside_ the `native_thread_init_stack` frame; that means the caller might allocate a VALUE on the stack that actually lies outside the bounds stored in machine.stack_{start,end}.

A local variable from one level above the topmost frame that stores VALUEs on the stack must be passed down into the call to `native_thread_init_stack` to be used in the calculation.

This probably doesn't _really_ matter for the win32 case (they'll be in the same memory mapping, so VirtualQuery should return the same thing), but definitely could matter for the pthreads case.

[Bug #20001]
2024-01-02  Use max_cpu when RUBY_MAX_CPU given  Shia
2023-12-24  Use native_thread_init_stack on cygwin  Daisuke Fujimura (fd0)
2023-12-23  MN: ceil timeout milliseconds  Koichi Sasada
`hrrel / RB_HRTIME_PER_MSEC` floors the timeout value, which can make `Mutex#sleep` return the wrong value (an Integer even when it should return nil because it timed out). This patch ceils the value instead, which solves the issue.
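For illustration (the constant reflects hrtime being in nanoseconds; this is a sketch, not the patch itself), the floor-vs-ceil difference is just the rounding of the division:

```c
#include <stdint.h>

#define TOY_HRTIME_PER_MSEC 1000000ULL  /* hrtime values are nanoseconds */

/* Floors: 1.4ms becomes 1ms, so the wait can end too early. */
static uint64_t
toy_hrtime_to_msec_floor(uint64_t hrrel)
{
    return hrrel / TOY_HRTIME_PER_MSEC;
}

/* Ceils: 1.4ms becomes 2ms, so the requested timeout is never undershot. */
static uint64_t
toy_hrtime_to_msec_ceil(uint64_t hrrel)
{
    return (hrrel + TOY_HRTIME_PER_MSEC - 1) / TOY_HRTIME_PER_MSEC;
}
```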
2023-12-20  KQueue support for M:N threads  JP Camara
* Allows macOS users to use M:N threads (and technically FreeBSD, though it has not been verified on FreeBSD)
* Include sys/event.h header check for macros, and include sys/event.h when present
* Rename epoll_fd to the more generic kq_fd (Kernel event Queue) for use by both epoll and kqueue
* MAP_STACK is not available on macOS, so conditionally apply it to mmap flags
* Set fd to close on exec
* Log debug messages specific to kqueue and epoll on creation
* close_invalidate raises an error for the kqueue fd on child process fork. It's unclear right now if that's a bug, or if it's kqueue-specific behavior

Use kq with rb_thread_wait_for_single_fd:

* Only platforms with `USE_POLL` (linux) had changes applied to take advantage of kernel event queues. It needed to be applied to the `select` code path as well so that kqueue could be properly applied
* Clean up kqueue-specific code and make sure only flags that were actually set are removed (or an error is raised)
* Also handle kevent-specific errnos, since most don't apply from epoll to kqueue
* Use the more platform-standard close-on-exec approach of `fcntl` and `FD_CLOEXEC`. The io-event gem uses `ioctl`, but fcntl seems to be the recommended choice. It is also what Go, Bun, and Libuv use
* We're making changes in this file anyway - may as well fix a couple spelling mistakes while here

Make sure FD_CLOEXEC carries over in dup:

* Otherwise the kqueue descriptor should have FD_CLOEXEC, but doesn't, and fails in assert_close_on_exec
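A hedged sketch of the close-on-exec handling named in the list above (illustrative, not the tree's code): kqueue has no CLOEXEC flag at creation time, so the descriptor is marked afterwards with fcntl, the portable approach also used by Go, Bun, and Libuv.

```c
#include <sys/types.h>
#include <sys/event.h>
#include <fcntl.h>
#include <unistd.h>

/* Create the kernel event queue fd and mark it close-on-exec with fcntl. */
static int
toy_create_kq_fd(void)
{
    int fd = kqueue();
    if (fd < 0) return -1;
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```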
2023-12-19  clear `sched->lock_owner` at fork  Koichi Sasada
`sched->lock_owner` can be non-NULL at fork because the timer thread can acquire the lock while forking. The `lock_owner` information is for debugging, so we only need to clear it at fork.

I hope this patch fixes the following assertion failure:

```
thread_pthread.c:354:thread_sched_lock_:sched->lock_owner == NULL
```