Age | Commit message (Collapse) | Author |
|
This caused Bus error on 32 bit Solaris
Notes:
Merged: https://github.com/ruby/ruby/pull/5049
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4771
|
|
In an effort to simplify the logic YJIT generates for accessing instance
variable, YJIT ensures that a given name-to-index mapping exists at
compile time. In the case that the mapping doesn't exist, it was created
by using rb_ivar_set() with Qundef on the sample object we see at
compile time. This hack isn't fine if the sample object happens to be
frozen, in which case YJIT would raise a FrozenError unexpectedly.
To deal with this, make a new function that only reserves the mapping
but doesn't touch the object. This is rb_obj_ensure_iv_index_mapping().
This new function superceeds the functionality of rb_iv_index_tbl_lookup()
so it was removed.
Reported by and includes a test case from John Hawthorn <john@hawthorn.email>
Fixes: GH-282
|
|
When YJIT make calls to routines without reconstructing interpreter
state through jit_prepare_routine_call(), it relies on the routine to
never allocate, raise, and push/pop control frames. Comment about this
on the routines that YJTI calls.
This is probably something we should dynamically verify on debug builds.
It's hard to statically verify this as it requires verifying all
functions in the call tree. Maybe something to look at in the future.
|
|
I did a `git diff --stat` against upstream and looked at all the files
that are outside of YJIT to come up with these minor changes.
|
|
For upstreaming, we want functions we export either prefixed with "rb_"
or made static. Historically we haven't been following this rule, so we
were "leaking" a lot of symbols as `make leak-globals` would tell us.
This change unifies everything YJIT into a single compilation unit,
yjit.o, and makes everything unprefixed static to pass `make leak-globals`.
This manual "unified build" setup is similar to that of vm.o.
Having everything in one compilation unit allows static functions to
be visible across YJIT files and removes the need for declarations in
headers in some cases. Unnecessary declarations were removed.
Other changes of note:
- switched to MJIT_SYMBOL_EXPORT_BEGIN which indicates stuff as being
off limits for native extensions
- the first include of each YJIT file is change to be "internal.h"
- undefined MAP_STACK before explicitly redefining it since it
collide's with a definition in system headers. Consider renaming?
|
|
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
|
|
Before this change, when we encounter a constant cache that is specific
to a lexical scope, we unconditionally exit. This change falls back to
the interpreter's cache in this situation.
This should help constant expressions in `class << self`, which is popular
at Shopify due to the style guide.
This change relies on the cache being warm while compiling to detect the
need for checking the lexical scope for simplicity.
|
|
|
|
|
|
This new function will do the write barrier / resize the object / check
frozen for us
|
|
|
|
|
|
|
|
|
|
* Implement setivar with a plain old function call
* Remove return
|
|
|
|
|
|
|
|
* Implement send with alias method
* Add alias_method tests
|
|
* Implement calls to methods with simple optional params
* Remove unnecessary MJIT_STATIC
See comment for MJIT_STATIC. I added it not knowing whether it's
required because the function next to it has it. Don't use it and wait
for problems to come up instead.
* Better naming, some comments
* Count bailing on kw only iseqs
On railsbench:
```
opt_send_without_block exit reasons:
bmethod 59729 (27.7%)
optimized_method 59137 (27.5%)
iseq_complex_callee 41362 (19.2%)
alias_method 33346 (15.5%)
callsite_not_simple 19170 ( 8.9%)
iseq_only_keywords 1300 ( 0.6%)
kw_splat 1299 ( 0.6%)
cfunc_ruby_array_varg 18 ( 0.0%)
```
|
|
Make sure `opt_getinlinecache` is in a block all on its own, and
invalidate it from the interpreter when `opt_setinlinecache`.
It will recompile with a filled cache the second time around.
This lets YJIT runs well when the IC for constant is cold.
|
|
Lazily compile out a chain of checks for different known classes and
whether `self` embeds its ivars or not.
* Remove trailing whitespaces
* Get proper addresss in Capstone disassembly
* Lowercase address in Capstone disassembly
Capstone uses lowercase for jump targets in generated listings. Let's
match it.
* Use the same successor in getivar guard chains
Cuts down on duplication
* Address reviews
* Fix copypasta error
* Add a comment
|
|
|
|
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4925
Merged-By: nobu <nobu@ruby-lang.org>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4919
|
|
Pass in ec to vm_opt_newarray_{max,min}. Avoids having to
call GET_EC inside the functions, for better performance.
While here, add a test for Array#min/max being redefined to
test_optimization.rb.
Fixes [Bug #18180]
Notes:
Merged: https://github.com/ruby/ruby/pull/4911
Merged-By: jeremyevans <code@jeremyevans.net>
|
|
|
|
Linking printf family functions makes mjit objects to link
unnecessary code.
Notes:
Merged: https://github.com/ruby/ruby/pull/4820
|
|
In vm_call_method_each_type, check for c_call and c_return events before
dispatching to vm_call_ivar and vm_call_attrset. With this approach, the
call cache will still dispatch directly to those functions, so this
change will only decrease performance for the first (uncached) call, and
even then, the performance decrease is very minimal.
This approach requires that we clear the call caches when tracing is
enabled or disabled. The approach currently switches all vm_call_ivar
and vm_call_attrset call caches to vm_call_general any time tracing is
enabled or disabled. So it could theoretically result in a slowdown for
code that constantly enables or disables tracing.
This approach does not handle targeted tracepoints, but from my testing,
c_call and c_return events are not supported for targeted tracepoints,
so that shouldn't matter.
This includes a benchmark showing the performance decrease is minimal
if detectable at all.
Fixes [Bug #16383]
Fixes [Bug #10470]
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes:
Merged: https://github.com/ruby/ruby/pull/4767
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4716
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4695
Merged-By: nobu <nobu@ruby-lang.org>
|
|
Extracted repeated code as update_classvariable_cache. When cvc
table is not set in getclassvariable, an empty table was created
but it has no id and would cause [BUG], so made the code same as
setclassvariable.
|
|
Redo of 34a2acdac788602c14bf05fb616215187badd504 and
931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.
GitHub PR #4340.
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/4544
|
|
This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3.
This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
|
|
Instead of on read. Once it's in the inline cache we never have to make
one again. We want to eventually put the value into the cache, and the
best opportunity to do that is when you write the value.
Notes:
Merged: https://github.com/ruby/ruby/pull/4340
|
|
This change implements a cache for class variables. Previously there was
no cache for cvars. Cvar access is slow due to needing to travel all the
way up th ancestor tree before returning the cvar value. The deeper the
ancestor tree the slower cvar access will be.
The benefits of the cache are more visible with a higher number of
included modules due to the way Ruby looks up class variables. The
benchmark here includes 26 modules and shows with the cache, this branch
is 6.5x faster when accessing class variables.
```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105ca45) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be0093ae) [x86_64-darwin19]
| |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar | 5.681M| 36.980M|
| | -| 6.51x|
```
Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails
application. ActiveRecord::Base.logger has 71 ancestors. The more
ancestors a tree has, the more clear the speed increase. IE if Base had
only one ancestor we'd see no improvement. This benchmark is run on a
vanilla Rails application.
Benchmark code:
```ruby
require "benchmark/ips"
require_relative "config/environment"
Benchmark.ips do |x|
x.report "logger" do
ActiveRecord::Base.logger
end
end
```
Ruby 3.0 master / Rails 6.1:
```
Warming up --------------------------------------
logger 155.251k i/100ms
Calculating -------------------------------------
```
Ruby 3.0 with cvar cache / Rails 6.1:
```
Warming up --------------------------------------
logger 1.546M i/100ms
Calculating -------------------------------------
logger 14.857M (± 4.8%) i/s - 74.198M in 5.006202s
```
Lastly we ran a benchmark to demonstate the difference between master
and our cache when the number of modules increases. This benchmark
measures 1 ancestor, 30 ancestors, and 100 ancestors.
Ruby 3.0 master:
```
Warming up --------------------------------------
1 module 1.231M i/100ms
30 modules 432.020k i/100ms
100 modules 145.399k i/100ms
Calculating -------------------------------------
1 module 12.210M (± 2.1%) i/s - 61.553M in 5.043400s
30 modules 4.354M (± 2.7%) i/s - 22.033M in 5.063839s
100 modules 1.434M (± 2.9%) i/s - 7.270M in 5.072531s
Comparison:
1 module: 12209958.3 i/s
30 modules: 4354217.8 i/s - 2.80x (± 0.00) slower
100 modules: 1434447.3 i/s - 8.51x (± 0.00) slower
```
Ruby 3.0 with cvar cache:
```
Warming up --------------------------------------
1 module 1.641M i/100ms
30 modules 1.655M i/100ms
100 modules 1.620M i/100ms
Calculating -------------------------------------
1 module 16.279M (± 3.8%) i/s - 82.038M in 5.046923s
30 modules 15.891M (± 3.9%) i/s - 79.459M in 5.007958s
100 modules 16.087M (± 3.6%) i/s - 81.005M in 5.041931s
Comparison:
1 module: 16279458.0 i/s
100 modules: 16087484.6 i/s - same-ish: difference falls within error
30 modules: 15891406.2 i/s - same-ish: difference falls within error
```
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged: https://github.com/ruby/ruby/pull/4340
|
|
|
|
This reverts commit a224ce8150f2bc687cf79eb415c931d87a4cd247.
Turns out the checks are needed to handle splatting an array with an
empty ruby2 keywords hash.
Notes:
Merged-By: XrXr
|
|
In Ruby 2.7, empty keyword splats could be added back for backwards
compatibility. However, that stopped in Ruby 3.0.
|
|
These two checks are surrounded by an if that ensures the
call site is not a kw splat call site.
Notes:
Merged: https://github.com/ruby/ruby/pull/4404
Merged-By: XrXr
|
|
A "return" statement in a Proc in a lambda like:
`lambda{ proc{ return }.call }`
should return outer lambda block. However, the inner Proc can become
orphan Proc from the lambda block. This "return" escape outer-scope
like method, but this behavior was decieded as a bug.
[Bug #17105]
This patch raises LocalJumpError by checking the proc is orphan or
not from lambda blocks before escaping by "return".
Most of tests are written by Jeremy Evans
https://github.com/ruby/ruby/pull/4223
Notes:
Merged: https://github.com/ruby/ruby/pull/4347
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/4279
|
|
We just need this function to return whether or not the thing we're
looking for is defined. If it's defined, return something true,
otherwise false.
Notes:
Merged: https://github.com/ruby/ruby/pull/4279
|
|
We already have access to the string from the iseqs, so we can stop
calling this function.
Notes:
Merged: https://github.com/ruby/ruby/pull/4279
|
|
This version of defined? doesn't seem to be possible to emit anymore.
Notes:
Merged: https://github.com/ruby/ruby/pull/4253
|
|
This patch improves the performance of sequential and parallel
execution of rb_equal() (and rb_eql()).
[Bug #17497]
rb_equal_opt (and rb_eql_opt) does not have own cd and it waste
a time to initialize cd. This patch introduces opt_equality_by_mid()
to check equality without cd.
Furthermore, current master uses "static" cd on rb_equal_opt
(and rb_eql_opt) and it hurts CPU caches on multi-thread execution.
Now they are gone so there are no bottleneck on parallel execution.
Notes:
Merged: https://github.com/ruby/ruby/pull/4177
|
|
rb_funcall* (rb_funcall(), rb_funcallv(), ...) functions invokes
Ruby's method with given receiver. Ruby 2.7 introduced inline method
cache with static memory area. However, Ruby 3.0 reimplemented the
method cache data structures and the inline cache was removed.
Without inline cache, rb_funcall* searched methods everytime.
Most of cases per-Class Method Cache (pCMC) will be helped but
pCMC requires VM-wide locking and it hurts performance on
multi-Ractor execution, especially all Ractors calls methods
with rb_funcall*.
This patch introduced Global Call-Cache Cache Table (gccct) for
rb_funcall*. Call-Cache was introduced from Ruby 3.0 to manage
method cache entry atomically and gccct enables method-caching
without VM-wide locking. This table solves the performance issue
on multi-ractor execution.
[Bug #17497]
Ruby-level method invocation does not use gccct because it has
inline-method-cache and the table size is limited. Basically
rb_funcall* is not used frequently, so 1023 entries can be enough.
We will revisit the table size if it is not enough.
Notes:
Merged: https://github.com/ruby/ruby/pull/4129
|