diff options
| author | Luke Gruber <luke.gruber@shopify.com> | 2025-06-23 14:33:52 -0400 |
|---|---|---|
| committer | John Hawthorn <john@hawthorn.email> | 2025-08-08 13:37:31 -0700 |
| commit | 07878ebe787843f510be460738ff02dd883bf9ad (patch) | |
| tree | 8eec4e32addd1ceb9ddc83eb3d254f7451a755f7 /include/ruby/internal/encoding/sprintf.h | |
| parent | e639e5fd1af51e2462879d6db862ee5320914ba7 (diff) | |
Fix lock ordering issue for rb_ractor_sched_wait() and rb_ractor_sched_wakeup()
In rb_ractor_sched_wait() (ex: Ractor.receive), we acquire
RACTOR_LOCK(cr) and then thread_sched_lock(cur_th). However, on wakeup
if we're a dnt, in thread_sched_wait_running_turn() we acquire
thread_sched_lock(cur_th) after condvar wakeup and then RACTOR_LOCK(cr).
This lock inversion can cause a deadlock with rb_ractor_wakeup_all()
(ex: port.send(obj)), where we acquire RACTOR_LOCK(other_r) and then
thread_sched_lock(other_th).
So, the error happens:
nt 1: Ractor.receive
rb_ractor_sched_wait() after condvar wakeup in thread_sched_wait_running_turn():
- thread_sched_lock(cur_th) (condvar) # acquires lock
- rb_ractor_lock_self(cr) # deadlock here: tries to acquire, HANGS
nt 2: port.send
ractor_wakeup_all()
- RACTOR_LOCK(port_r) # acquires lock
- thread_sched_lock # tries to acquire, HANGS
To fix it, we now unlock the thread_sched_lock before acquiring the
ractor_lock in rb_ractor_sched_wait().
Script that reproduces issue:
```ruby
require "async"
class RactorWrapper
def initialize
@ractor = Ractor.new do
Ractor.recv # Ractor doesn't start until explicitly told to
# Do some calculations
fib = ->(x) { x < 2 ? 1 : fib.call(x - 1) + fib.call(x - 2) }
fib.call(20)
end
end
def take_async
@ractor.send(nil)
Thread.new { @ractor.value }.value
end
end
Async do |task|
10_000.times do |i|
task.async do
RactorWrapper.new.take_async
puts i
end
end
end
exit 0
```
Fixes [Bug #21398]
Co-authored-by: John Hawthorn <john.hawthorn@shopify.com>
Diffstat (limited to 'include/ruby/internal/encoding/sprintf.h')
0 files changed, 0 insertions, 0 deletions
