path: root/vm_insnhelper.c
Age | Commit message | Author
2021-11-21 | Fix setting struct member by public_send | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/5152
2021-11-19 | optimize `Struct` getter/setter | Koichi Sasada
Introduce new optimized method type `OPTIMIZED_METHOD_TYPE_STRUCT_AREF/ASET` with index information.
Notes: Merged: https://github.com/ruby/ruby/pull/5131
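For orientation, a minimal Ruby sketch of the calls these two commits concern: plain `Struct` member readers and writers, plus a writer invoked through `public_send`. The `DemoPoint` struct and its values are made up for illustration, and the sketch shows only the observable behavior, not the VM internals.

```ruby
# Hypothetical struct used only for illustration.
DemoPoint = Struct.new(:x, :y)

pt = DemoPoint.new(1, 2)

# Member setter and getter: the kind of call the optimized method types target.
pt.x = 10
puts pt.x + pt.y

# Setting a member through public_send should behave like a direct call.
pt.public_send(:y=, 20)
puts pt.to_a.inspect
```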
2021-11-19 | `rb_method_optimized_t` for further extension | Koichi Sasada
A `rb_method_optimized_t optimized` field is now added to represent the optimized method type.
Notes: Merged: https://github.com/ruby/ruby/pull/5131
2021-11-18 | Optimize dynamic string interpolation for symbol/true/false/nil/0-9 | Jeremy Evans
This provides a significant speedup for symbol, true, false, nil, 0-9, and class/module, and a small speedup in most other cases.

Speedups (using included benchmarks):
:symbol :: 60%
0-9 :: 50%
Class/Module :: 50%
nil/true/false :: 20%
integer :: 10%
[] :: 10%
"" :: 3%

One reason this approach is faster is that it reduces the number of VM instructions for each interpolated value.

Initial idea, approach, and benchmarks from Eric Wong. I applied the same approach against the master branch, updating it to handle the significant internal changes since this was first proposed 4 years ago (such as CALL_INFO/CALL_CACHE -> CALL_DATA). I also expanded it to optimize true/false/nil/0-9/class/module, and added handling of missing methods, refined methods, and RUBY_DEBUG.

This renames the tostring insn to anytostring, and adds an objtostring insn that implements the optimization. This requires making a few functions non-static, and adding some non-static functions.

This disables 4 YJIT tests. Those tests should be reenabled after YJIT optimizes the new objtostring insn.

Implements [Feature #13715]

Co-authored-by: Eric Wong <e@80x24.org>
Co-authored-by: Alan Wu <XrXr@users.noreply.github.com>
Co-authored-by: Yusuke Endoh <mame@ruby-lang.org>
Co-authored-by: Koichi Sasada <ko1@atdot.net>
Notes: Merged: https://github.com/ruby/ruby/pull/5002 Merged-By: jeremyevans <code@jeremyevans.net>
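As a rough illustration of the code this optimization targets, the sketch below interpolates one value of each kind named in the commit message. `INTERPOLATION_SAMPLES` is a made-up name, and the snippet checks only that interpolation agrees with an explicit `to_s`; it does not measure speed or inspect VM instructions.

```ruby
# One sample per kind listed in the commit message (hypothetical constant name).
INTERPOLATION_SAMPLES = [:sym, true, false, nil, 7, String, 12345, [], ""].freeze

INTERPOLATION_SAMPLES.each do |value|
  interpolated = "#{value}"
  # Interpolation is expected to produce the same text as calling to_s.
  puts "#{value.inspect} => #{interpolated.inspect} (same as to_s: #{interpolated == value.to_s})"
end
```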
2021-11-18 | Refactor setclassvariable (#5143) | Eileen M. Uchitelle
We only need the cref when we have a cache miss so don't look it up until we need it. This likely speeds up class variable writes in the interpreter but also simplifies the jit code.

Before
```
Warming up --------------------------------------
        write a cvar   192.280k i/100ms
Calculating -------------------------------------
        write a cvar      1.915M (± 3.5%) i/s -      9.614M in   5.026694s
```

After
```
Warming up --------------------------------------
        write a cvar   216.308k i/100ms
Calculating -------------------------------------
        write a cvar      2.140M (± 3.1%) i/s -     10.815M in   5.058079s
```

Followup to ruby/ruby#5137
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
2021-11-18 | Refactor getclassvariable (#5137) | Eileen M. Uchitelle
* Refactor getclassvariable

  We only need the cref when we have a cache miss so don't look it up until we need it. This speeds up class variable reads in the interpreter but also simplifies the jit code.

  Benchmarks for master vs this branch (without yjit):

  Before:
  ```
  Warming up --------------------------------------
           read a cvar     1.276M i/100ms
  Calculating -------------------------------------
           read a cvar     12.596M (± 1.7%) i/s -     63.781M in   5.064902s
  ```

  After:
  ```
  Warming up --------------------------------------
           read a cvar     1.336M i/100ms
  Calculating -------------------------------------
           read a cvar     13.114M (± 3.6%) i/s -     65.488M in   5.000584s
  ```

  Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
* Clean up function signatures / remove dead code

  rb_vm_getclassvariable signature has changed and we don't need rb_vm_get_cref.

  Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes: Merged-By: maximecb <maximecb@ruby-lang.org>
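For readers less familiar with the operations being refactored, here is a small sketch of a class variable read and write at the Ruby level. `CvarCounter` is a hypothetical class; the cref lookup and the cache discussed in these commits happen inside the VM and are not visible here.

```ruby
# Hypothetical class used only to exercise class variable reads and writes.
class CvarCounter
  @@count = 0

  def self.bump
    @@count += 1 # one class variable read followed by one write
  end

  def self.count
    @@count # class variable read
  end

  def self.reset
    @@count = 0 # class variable write
  end
end

3.times { CvarCounter.bump }
puts CvarCounter.count
```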
2021-11-17 | no need to check `cme == NULL` | Koichi Sasada
`cc->cme_` is no longer NULL.
Notes: Merged: https://github.com/ruby/ruby/pull/5122
2021-11-17 | `vm_empty_cc_for_super` | Koichi Sasada
As with `vm_empty_cc`, introduce a global variable which has `.call_ = vm_call_super_method`, and use it when `cme == NULL` in `vm_search_super_method`.
Notes: Merged: https://github.com/ruby/ruby/pull/5122
2021-11-17 | a variable is not needed. | Koichi Sasada
Notes: Merged: https://github.com/ruby/ruby/pull/5122
2021-11-17 | `Primitive.mandatory_only?` consider splat args | Jean Boussier
`vm_ci_argc` gives the number of arguments, but `*[1, 2, 3]` counts as only one.
Notes: Merged: https://github.com/ruby/ruby/pull/5124
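The mismatch can be seen from Ruby code: a splat is a single argument expression at the call site but may expand to several values at run time. The sketch below uses a made-up helper, `count_values`, and the public `Time.at` signature; it does not call `vm_ci_argc` or any other internal.

```ruby
# Hypothetical helper that reports how many values actually arrive.
def count_values(*values)
  values.size
end

# One argument expression at the call site, three values at run time.
puts count_values(*[1, 2, 3])

# With a splat, Time.at receives its optional arguments as well,
# so a mandatory-only fast path would not be appropriate for this call.
t = Time.at(*[0, 500, :millisecond])
puts t.usec
```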
2021-11-15 | `Primitive.mandatory_only?` for fast path | Koichi Sasada
Compared with C methods, a built-in method written in Ruby is slower when only mandatory parameters are given, because it needs to check the arguments and fill in default values for optional and keyword parameters (C methods can check the number of parameters with `argc`, so there is no overhead). Passing only mandatory arguments is common (optional arguments are exceptional in many cases), so it is important to provide a fast path for such common cases.

`Primitive.mandatory_only?` is a special builtin function used with an `if` expression like this:

```ruby
def self.at(time, subsec = false, unit = :microsecond, in: nil)
  if Primitive.mandatory_only?
    Primitive.time_s_at1(time)
  else
    Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
  end
end
```

and it makes two ISeqs,

```
# (1)
def self.at(time, subsec = false, unit = :microsecond, in: nil)
  Primitive.time_s_at(time, subsec, unit, Primitive.arg!(:in))
end

# (2)
def self.at(time)
  Primitive.time_s_at1(time)
end
```

and (2) is pointed to by (1).

Note that `Primitive.mandatory_only?` should be used only in the condition of an `if` statement, and the `if` statement should be the whole method body (you cannot put any expression before or after the `if` statement).

A method entry with `mandatory_only?` (`Time.at` in the above case) is marked as `iseq_overload`. When the method is dispatched with only mandatory arguments (`Time.at(0)` for example), another method entry is made with ISeq (2) as the mandatory-only method entry, and it is cached in an inline method cache.

The idea is similar to the one discussed in https://bugs.ruby-lang.org/issues/16254 but it only distinguishes mandatory parameters from anything more, because in many cases only mandatory parameters are given. If we find other cases (optional or keyword parameters are used frequently and it hurts performance), we can extend the feature.
Notes: Merged: https://github.com/ruby/ruby/pull/5112
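The two-ISeq arrangement described above can be approximated in plain Ruby as an analogy. `Primitive` is not callable from user code, so every name below (`TimeLike`, `at_full`, `at_fast`, `zone:`) is hypothetical, and the explicit dispatcher only stands in for the decision the VM makes at the call site.

```ruby
# Plain-Ruby analogy only; all names here are hypothetical.
module TimeLike
  # Full path, like ISeq (1): handles optional and keyword parameters.
  def self.at_full(time, subsec = false, unit = :microsecond, zone: nil)
    [:full, time, subsec, unit, zone]
  end

  # Fast path, like ISeq (2): mandatory parameter only, nothing to default.
  def self.at_fast(time)
    [:fast, time]
  end

  # Stand-in for the dispatch decision made at the call site.
  def self.at(*args, **kwargs)
    if args.size == 1 && kwargs.empty?
      at_fast(*args)
    else
      at_full(*args, **kwargs)
    end
  end
end

p TimeLike.at(0)
p TimeLike.at(0, 500, :millisecond)
p TimeLike.at(0, zone: "UTC")
```

In the real feature there is no run-time `if` on the fast path; per the commit message, the mandatory-only method entry is cached in the inline method cache.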
2021-11-11 | [Bug #18329] Fix crash when calling non-existent super method | Peter Zhu
The cme is NULL when a method does not exist, so check it before accessing the callcache.
Notes: Merged: https://github.com/ruby/ruby/pull/5108
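A minimal sketch of the situation at the Ruby level, assuming the intended behavior is an ordinary `NoMethodError` rather than a crash. `OrphanSuper` and `missing_super_outcome` are made-up names.

```ruby
# Hypothetical class whose method calls super although no ancestor defines it.
class OrphanSuper
  def greet
    super
  end
end

# Returns a symbol describing what happened when the missing super is called.
def missing_super_outcome
  OrphanSuper.new.greet
  :returned
rescue NoMethodError
  :no_method_error
end

puts missing_super_outcome
```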
2021-10-29 | vm_core.h: Avoid unaligned access to ic_serial on 32-bit machine | Yusuke Endoh
This caused a bus error on 32-bit Solaris.
Notes: Merged: https://github.com/ruby/ruby/pull/5049
2021-10-27 | the core problem is the Proc is not shareable | Satoshi Moris Tagomori
Notes: Merged: https://github.com/ruby/ruby/pull/4771
2021-10-20 | Extract yjit_force_iv_index and make it work when object is frozen | Alan Wu
In an effort to simplify the logic YJIT generates for accessing instance variables, YJIT ensures that a given name-to-index mapping exists at compile time. In the case that the mapping doesn't exist, it was created by using rb_ivar_set() with Qundef on the sample object we see at compile time.

This hack breaks down if the sample object happens to be frozen, in which case YJIT would unexpectedly raise a FrozenError.

To deal with this, make a new function that only reserves the mapping but doesn't touch the object. This is rb_obj_ensure_iv_index_mapping(). This new function supersedes the functionality of rb_iv_index_tbl_lookup(), so the latter was removed.

Reported by and includes a test case from John Hawthorn <john@hawthorn.email>

Fixes: GH-282
2021-10-20 | Add comments about special runtime routines YJIT calls | Alan Wu
When YJIT makes calls to routines without reconstructing interpreter state through jit_prepare_routine_call(), it relies on the routine to never allocate, raise, or push/pop control frames. Comment about this on the routines that YJIT calls.

This is probably something we should dynamically verify on debug builds. It's hard to statically verify this as it requires verifying all functions in the call tree. Maybe something to look at in the future.
2021-10-20 | Cleanup diff against upstream. Add comments | Alan Wu
I did a `git diff --stat` against upstream and looked at all the files that are outside of YJIT to come up with these minor changes.
2021-10-20 | Put YJIT into a single compilation unit | Alan Wu
For upstreaming, we want functions we export either prefixed with "rb_" or made static. Historically we haven't been following this rule, so we were "leaking" a lot of symbols as `make leak-globals` would tell us.

This change unifies everything YJIT into a single compilation unit, yjit.o, and makes everything unprefixed static to pass `make leak-globals`. This manual "unified build" setup is similar to that of vm.o.

Having everything in one compilation unit allows static functions to be visible across YJIT files and removes the need for declarations in headers in some cases. Unnecessary declarations were removed.

Other changes of note:
- switched to MJIT_SYMBOL_EXPORT_BEGIN, which marks stuff as being off limits for native extensions
- the first include of each YJIT file is changed to be "internal.h"
- undefined MAP_STACK before explicitly redefining it since it collides with a definition in system headers. Consider renaming?
2021-10-20 | Implement getclassvariable in yjit | eileencodes
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
2021-10-20 | Add a slowpath for opt_getinlinecache | Alan Wu
Before this change, when we encounter a constant cache that is specific to a lexical scope, we unconditionally exit. This change falls back to the interpreter's cache in this situation. This should help constant expressions in `class << self`, which is popular at Shopify due to the style guide. This change relies on the cache being warm while compiling to detect the need for checking the lexical scope for simplicity.
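The pattern mentioned above looks roughly like the sketch below: a constant referenced from inside `class << self`, where resolution depends on the lexical scope. `ShippingQuote` and `RATE` are made-up names; the constant cache itself is internal to the VM and YJIT and is not visible from this code.

```ruby
# Hypothetical class illustrating a constant reference inside `class << self`.
class ShippingQuote
  RATE = 5

  class << self
    # RATE is found through the lexical scope surrounding `class << self`.
    def cost(weight)
      weight * RATE
    end
  end
end

puts ShippingQuote.cost(3)
```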
2021-10-20 | Remove vm_opt_aset | John Hawthorn
2021-10-20 | Add comments for new function | Aaron Patterson
2021-10-20 | Refactor attrset to use a function | Aaron Patterson
This new function will do the write barrier / resize the object / check frozen for us
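Of the three duties listed for the new function, only the frozen check is directly observable from Ruby; the write barrier and object resizing are internal. A small sketch of that observable part, using made-up names (`FrozenAttrDemo`, `frozen_write_outcome`):

```ruby
# Hypothetical class with a generated attribute writer.
class FrozenAttrDemo
  attr_accessor :level
end

# Returns a symbol describing what an attribute write to a frozen object does.
def frozen_write_outcome
  obj = FrozenAttrDemo.new
  obj.level = 1 # ordinary attribute write
  obj.freeze
  obj.level = 2 # attribute write to a frozen object
  :written
rescue FrozenError
  :frozen_error
end

puts frozen_write_outcome
```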
2021-10-20 | Remove rb_opt_equality_specialized | John Hawthorn
2021-10-20 | Implement splatarray | Kevin Newton
2021-10-20 | Try to fix MJIT symbol clash with cargo cult | Maxime Chevalier-Boisvert
2021-10-20 | Implement defined bytecode (#39) | Maxime Chevalier-Boisvert
2021-10-20 | Implement setivar with a plain old function call (#34) | Maxime Chevalier-Boisvert
* Implement setivar with a plain old function call
* Remove return
2021-10-20 | Implement opt_aset as interpreter handler call | Maxime Chevalier-Boisvert
2021-10-20 | Implement opt_mod as call to interpreter function (#29) | Maxime Chevalier-Boisvert
2021-10-20 | Implement opt_eq by calling interpreter function (#28) | Maxime Chevalier-Boisvert
2021-10-20 | Implement send with alias method (#23) | Maxime Chevalier-Boisvert
* Implement send with alias method
* Add alias_method tests
2021-10-20 | Implement calls to methods with simple optional params | Alan Wu
* Implement calls to methods with simple optional params
* Remove unnecessary MJIT_STATIC

  See comment for MJIT_STATIC. I added it not knowing whether it's required because the function next to it has it. Don't use it and wait for problems to come up instead.
* Better naming, some comments
* Count bailing on kw only iseqs

  On railsbench:
  ```
  opt_send_without_block exit reasons:
                   bmethod   59729 (27.7%)
          optimized_method   59137 (27.5%)
       iseq_complex_callee   41362 (19.2%)
              alias_method   33346 (15.5%)
       callsite_not_simple   19170 ( 8.9%)
        iseq_only_keywords    1300 ( 0.6%)
                  kw_splat    1299 ( 0.6%)
     cfunc_ruby_array_varg      18 ( 0.0%)
  ```
2021-10-20 | YJIT: Fancier opt_getinlinecache | Alan Wu
Make sure `opt_getinlinecache` is in a block all on its own, and invalidate it from the interpreter on `opt_setinlinecache`. It will recompile with a filled cache the second time around. This lets YJIT run well when the IC for a constant is cold.
2021-10-20 | YJIT: lazy polymorphic getinstancevariable | Alan Wu
Lazily compile out a chain of checks for different known classes and whether `self` embeds its ivars or not.

* Remove trailing whitespaces
* Get proper address in Capstone disassembly
* Lowercase address in Capstone disassembly

  Capstone uses lowercase for jump targets in generated listings. Let's match it.
* Use the same successor in getivar guard chains

  Cuts down on duplication
* Address reviews
* Fix copypasta error
* Add a comment
2021-10-20 | WIP JIT-to-JIT returns | Maxime Chevalier-Boisvert
2021-10-19 | Remove useless casts | Nobuyoshi Nakada
2021-10-19 | Get rid of type-punning cast | Nobuyoshi Nakada
2021-10-03 | Using NIL_P macro instead of `== Qnil` | S.H
Notes: Merged: https://github.com/ruby/ruby/pull/4925 Merged-By: nobu <nobu@ruby-lang.org>
2021-10-01 | Introduce rb_vm_call_with_refinements to DRY up a few calls | Jeremy Evans
Notes: Merged: https://github.com/ruby/ruby/pull/4919
2021-09-30 | Make Array#min/max optimization respect refined methods | Jeremy Evans
Pass in ec to vm_opt_newarray_{max,min}. Avoids having to call GET_EC inside the functions, for better performance.

While here, add a test for Array#min/max being redefined to test_optimization.rb.

Fixes [Bug #18180]
Notes: Merged: https://github.com/ruby/ruby/pull/4911 Merged-By: jeremyevans <code@jeremyevans.net>
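A sketch of the kind of code this fix concerns, assuming the optimized shape is an array literal followed directly by `.max`. `ShoutingMax` and `RefinedMaxDemo` are made-up names. On a Ruby that includes this fix, both calls are expected to go through the refinement; on an older Ruby the literal form may bypass it, so no output is asserted for that call here.

```ruby
# Hypothetical refinement used only for illustration.
module ShoutingMax
  refine Array do
    def max
      :refined_max
    end
  end
end

module RefinedMaxDemo
  using ShoutingMax

  # Array literal followed directly by .max: the shape the VM optimizes.
  def self.literal_max(a, b)
    [a, b].max
  end

  # Same call through a local variable, which takes the ordinary send path.
  def self.variable_max(a, b)
    ary = [a, b]
    ary.max
  end
end

puts RefinedMaxDemo.variable_max(1, 2).inspect
puts RefinedMaxDemo.literal_max(1, 2).inspect
```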
2021-09-19 | Extract hook macro for attributes | Nobuyoshi Nakada
2021-09-11 | Remove printf family from the mjit header | Nobuyoshi Nakada
Linking printf family functions makes mjit objects link unnecessary code.
Notes: Merged: https://github.com/ruby/ruby/pull/4820
2021-08-29 | Support tracing of attr_reader and attr_writer | Jeremy Evans
In vm_call_method_each_type, check for c_call and c_return events before dispatching to vm_call_ivar and vm_call_attrset. With this approach, the call cache will still dispatch directly to those functions, so this change will only decrease performance for the first (uncached) call, and even then, the performance decrease is very minimal.

This approach requires that we clear the call caches when tracing is enabled or disabled. The approach currently switches all vm_call_ivar and vm_call_attrset call caches to vm_call_general any time tracing is enabled or disabled. So it could theoretically result in a slowdown for code that constantly enables or disables tracing.

This approach does not handle targeted tracepoints, but from my testing, c_call and c_return events are not supported for targeted tracepoints, so that shouldn't matter.

This includes a benchmark showing the performance decrease is minimal if detectable at all.

Fixes [Bug #16383]
Fixes [Bug #10470]

Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
Notes: Merged: https://github.com/ruby/ruby/pull/4767
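A hedged sketch of how such tracing might be observed from Ruby with the public `TracePoint` API. `TracedAttr` and `traced_attr_events` are made-up names. On a Ruby that includes this change, the collected list is expected to contain `c_call` and `c_return` entries for `value=` and `value`; on an older Ruby it may be empty, so no output is asserted here.

```ruby
# Hypothetical class with generated attribute methods.
class TracedAttr
  attr_accessor :value
end

# Collects c_call/c_return events raised for TracedAttr's methods.
def traced_attr_events
  obj = TracedAttr.new
  events = []
  trace = TracePoint.new(:c_call, :c_return) do |tp|
    events << [tp.event, tp.method_id] if tp.defined_class == TracedAttr
  end
  trace.enable do
    obj.value = 1 # attr_writer call
    obj.value     # attr_reader call
  end
  events
end

p traced_attr_events
```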
2021-08-11 | Get rid of type-punning pointer casts [Bug #18062] | Nobuyoshi Nakada
Notes: Merged: https://github.com/ruby/ruby/pull/4716
2021-08-02 | Using RBOOL macro | S.H
Notes: Merged: https://github.com/ruby/ruby/pull/4695 Merged-By: nobu <nobu@ruby-lang.org>
2021-06-23 | Refactor class variable cache functions | Nobuyoshi Nakada
Extracted repeated code as update_classvariable_cache. When the cvc table was not set in getclassvariable, an empty table was created, but it had no id and would cause [BUG], so the code now matches setclassvariable.
2021-06-18 | Add a cache for class variables | eileencodes
Redo of 34a2acdac788602c14bf05fb616215187badd504 and 931138b00696419945dc03e10f033b1f53cd50f3 which were reverted.

GitHub PR #4340.

This change implements a cache for class variables. Previously there was no cache for cvars. Cvar access is slow due to needing to travel all the way up the ancestor tree before returning the cvar value. The deeper the ancestor tree, the slower cvar access will be.

The benefits of the cache are more visible with a higher number of included modules due to the way Ruby looks up class variables. The benchmark here includes 26 modules and shows that, with the cache, this branch is 6.5x faster when accessing class variables.

```
compare-ruby: ruby 3.1.0dev (2021-03-15T06:22:34Z master 9e5105c) [x86_64-darwin19]
built-ruby: ruby 3.1.0dev (2021-03-15T12:12:44Z add-cache-for-clas.. c6be009) [x86_64-darwin19]

|         |compare-ruby|built-ruby|
|:--------|-----------:|---------:|
|vm_cvar  |      5.681M|   36.980M|
|         |           -|     6.51x|
```

Benchmark.ips calling `ActiveRecord::Base.logger` from within a Rails application. ActiveRecord::Base.logger has 71 ancestors. The more ancestors a tree has, the clearer the speed increase. I.e., if Base had only one ancestor we'd see no improvement. This benchmark is run on a vanilla Rails application.

Benchmark code:

```ruby
require "benchmark/ips"
require_relative "config/environment"

Benchmark.ips do |x|
  x.report "logger" do
    ActiveRecord::Base.logger
  end
end
```

Ruby 3.0 master / Rails 6.1:

```
Warming up --------------------------------------
              logger   155.251k i/100ms
Calculating -------------------------------------
```

Ruby 3.0 with cvar cache / Rails 6.1:

```
Warming up --------------------------------------
              logger     1.546M i/100ms
Calculating -------------------------------------
              logger     14.857M (± 4.8%) i/s -     74.198M in   5.006202s
```

Lastly, we ran a benchmark to demonstrate the difference between master and our cache when the number of modules increases. This benchmark measures 1 ancestor, 30 ancestors, and 100 ancestors.

Ruby 3.0 master:

```
Warming up --------------------------------------
            1 module     1.231M i/100ms
          30 modules   432.020k i/100ms
         100 modules   145.399k i/100ms
Calculating -------------------------------------
            1 module     12.210M (± 2.1%) i/s -     61.553M in   5.043400s
          30 modules      4.354M (± 2.7%) i/s -     22.033M in   5.063839s
         100 modules      1.434M (± 2.9%) i/s -      7.270M in   5.072531s

Comparison:
            1 module: 12209958.3 i/s
          30 modules:  4354217.8 i/s - 2.80x  (± 0.00) slower
         100 modules:  1434447.3 i/s - 8.51x  (± 0.00) slower
```

Ruby 3.0 with cvar cache:

```
Warming up --------------------------------------
            1 module     1.641M i/100ms
          30 modules     1.655M i/100ms
         100 modules     1.620M i/100ms
Calculating -------------------------------------
            1 module     16.279M (± 3.8%) i/s -     82.038M in   5.046923s
          30 modules     15.891M (± 3.9%) i/s -     79.459M in   5.007958s
         100 modules     16.087M (± 3.6%) i/s -     81.005M in   5.041931s

Comparison:
            1 module: 16279458.0 i/s
         100 modules: 16087484.6 i/s - same-ish: difference falls within error
          30 modules: 15891406.2 i/s - same-ish: difference falls within error
```

Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes: Merged: https://github.com/ruby/ruby/pull/4544
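To make the "number of ancestors" point concrete, here is a small sketch, not the benchmark from the commit, of a class variable that is read through a long ancestor chain. `CvarBase`, `CvarDeep`, and the choice of 30 anonymous modules are assumptions for illustration only.

```ruby
# Hypothetical base class that owns the class variable.
class CvarBase
  @@logger = :stdout_logger
end

# Subclass with many included modules, so the cvar lookup has to pass
# a long list of ancestors before it reaches CvarBase.
class CvarDeep < CvarBase
  30.times { include Module.new }

  def self.logger
    @@logger
  end
end

puts CvarDeep.ancestors.size
puts CvarDeep.logger.inspect
```

Wrapping `CvarDeep.logger` in a `Benchmark.ips` report, as the benchmark code above does for `ActiveRecord::Base.logger`, would be one way to compare interpreters with and without the cache.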
2021-05-11 | Revert "Filling cache values on cvar write" | Aaron Patterson
This reverts commit 08de37f9fa3469365e6b5c964689ae2bae0eb9f3. This reverts commit e8ae922b62adb00a80d3d4c49f7d7b0e6026eaba.
2021-05-11 | Filling cache values on cvar write | eileencodes
Instead of on read. Once it's in the inline cache we never have to make one again. We want to eventually put the value into the cache, and the best opportunity to do that is when you write the value.
Notes: Merged: https://github.com/ruby/ruby/pull/4340