Age | Commit message (Collapse) | Author |
|
This commit adds a debug counter for the case where the inline cache
*missed* but the ivar index table has an entry for that ivar. This is a
case where a polymorphic cache could help
Notes:
Merged: https://github.com/ruby/ruby/pull/3750
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/3740
|
|
|
|
|
|
To fix build failures.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
This shall fix compile errors.
Notes:
Merged: https://github.com/ruby/ruby/pull/3079
|
|
when there's no need to call CALLER_SETUP_ARG and CALLER_REMOVE_EMPTY_KW_SPLAT
(i.e. !rb_splat_or_kwargs_p(ci) && !calling->kw_splat).
Micro benchmark:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark/vm_send_cfunc.yml --repeat-count=4
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
vm_send_cfunc 69.585M 88.724M i/s - 100.000M times in 1.437097s 1.127096s
Comparison:
vm_send_cfunc
after: 88723605.2 i/s
before: 69584737.1 i/s - 1.28x slower
```
Optcarrot:
```
$ benchmark-driver -v --rbenv 'before;after' benchmark.yml --repeat-count=12 --output=all
before: ruby 2.8.0dev (2020-04-13T23:45:05Z master b9d3ceee8f) [x86_64-linux]
after: ruby 2.8.0dev (2020-04-14T00:48:52Z no-splat-fastpath 418d363722) [x86_64-linux]
Calculating -------------------------------------
before after
Optcarrot Lan_Master.nes 50.76119601545175 42.73858236484051 fps
50.76388649761503 51.04211379912850
50.80930672252514 51.39455790755538
50.90236000778749 51.75656936556145
51.01744746340430 51.86875277356489
51.06495279015112 51.88692482485558
51.07785337168974 51.93429603190578
51.20163525187862 51.95768145071314
51.34671771913112 52.45577266040274
51.35918340835583 52.53163888762858
51.46641337418146 52.62172484121034
51.50835463462257 52.85064021113239
```
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Split ruby.h
Notes:
Merged-By: shyouhei <shyouhei@ruby-lang.org>
|
|
JIT support of dd723771c11.
$ benchmark-driver -v --rbenv 'before;before --jit;after --jit' benchmark/mjit_exivar.yml --repeat-count=4
before: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) [x86_64-linux]
before --jit: ruby 2.8.0dev (2020-03-30T12:32:26Z master e5db3da9d3) +JIT [x86_64-linux]
after --jit: ruby 2.8.0dev (2020-03-31T05:57:24Z mjit-exivar 128625baec) +JIT [x86_64-linux]
Calculating -------------------------------------
before before --jit after --jit
mjit_exivar 57.944M 53.579M 54.471M i/s - 200.000M times in 3.451588s 3.732772s 3.671687s
Comparison:
mjit_exivar
before: 57944345.1 i/s
after --jit: 54470876.7 i/s - 1.06x slower
before --jit: 53579483.4 i/s - 1.08x slower
|
|
|
|
changing add_iseq_to_process's debug counter name as well for comparison
|
|
This patch contains several ideas:
(1) Disposable inline method cache (IMC) for race-free inline method cache
* Making call-cache (CC) as a RVALUE (GC target object) and allocate new
CC on cache miss.
* This technique allows race-free access from parallel processing
elements like RCU.
(2) Introduce per-Class method cache (pCMC)
* Instead of fixed-size global method cache (GMC), pCMC allows flexible
cache size.
* Caching CCs reduces CC allocation and allow sharing CC's fast-path
between same call-info (CI) call-sites.
(3) Invalidate an inline method cache by invalidating corresponding method
entries (MEs)
* Instead of using class serials, we set "invalidated" flag for method
entry itself to represent cache invalidation.
* Compare with using class serials, the impact of method modification
(add/overwrite/delete) is small.
* Updating class serials invalidate all method caches of the class and
sub-classes.
* Proposed approach only invalidate the method cache of only one ME.
See [Feature #16614] for more details.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
Now, rb_call_info contains how to call the method with tuple of
(mid, orig_argc, flags, kwarg). Most of cases, kwarg == NULL and
mid+argc+flags only requires 64bits. So this patch packed
rb_call_info to VALUE (1 word) on such cases. If we can not
represent it in VALUE, then use imemo_callinfo which contains
conventional callinfo (rb_callinfo, renamed from rb_call_info).
iseq->body->ci_kw_size is removed because all of callinfo is VALUE
size (packed ci or a pointer to imemo_callinfo).
To access ci information, we need to use these functions:
vm_ci_mid(ci), _flag(ci), _argc(ci), _kwarg(ci).
struct rb_call_info_kw_arg is renamed to rb_callinfo_kwarg.
rb_funcallv_with_cc() and rb_method_basic_definition_p_with_cc()
is temporary removed because cd->ci should be marked.
Notes:
Merged: https://github.com/ruby/ruby/pull/2888
|
|
Include what is necessary.
Notes:
Merged: https://github.com/ruby/ruby/pull/2711
|
|
These functions are enabled only on USE_DEBUG_COUNTER=1.
|
|
|
|
Fixed misspellings reported at [Bug #16437], missed and a new
typo.
|
|
|
|
|
|
Implemented fine-grained inspection of cache misshits. Handy for
counting the reasons why an inline method cache was evicted.
|
|
Introduce a new debug counter `obj_ary_extracapa` which counts
arrays which are `len < capa`.
|
|
for debug counters
```
../include/ruby/intern.h:1175:137: warning: passing argument 3 of 'rb_define_singleton_method0' from incompatible pointer type [-Wincompatible-pointer-types]
#define rb_define_singleton_method(klass, mid, func, arity) rb_define_singleton_method_choose_prototypem3((arity),(func))((klass),(mid),(func),(arity));
^
../vm.c:2958:5: note: in expansion of macro 'rb_define_singleton_method'
rb_define_singleton_method(rb_cRubyVM, "show_debug_counters", rb_debug_counter_show, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~
../include/ruby/intern.h:1139:99: note: expected 'VALUE (*)(VALUE) {aka long unsigned int (*)(long unsigned int)}' but argument is of type 'VALUE (*)(void) {aka long unsigned int (*)(void)}'
__attribute__((__unused__,__weakref__("rb_define_singleton_method"),__nonnull__(2,3)))static void rb_define_singleton_method0 (VALUE,const char*,VALUE(*)(VALUE),int);
```
|
|
I am trying to study debug counters inside a Rails application.
Accessing debug counters by killing the process is hard because child
processes don't get the same TRAP as the parent, and Rails seems to
intercept calls to `exit`. Adding this method lets me print the debug
counters when I want (at the end of requests for example)
|
|
add debug_counters to check the Hash object statistics.
|
|
On ar_table, Do not keep a full-length hash value (FLHV, 8 bytes)
but keep a 1 byte hint from a FLHV (lowest byte of FLHV).
An ar_table only contains at least 8 entries, so hints consumes
8 bytes at most. We can store hints in RHash::ar_hint.
On 32bit CPU, we use 4 entries ar_table.
The advantages:
* We don't need to keep FLHV so ar_table only consumes
16 bytes (VALUEs of key and value) * 8 entries = 128 bytes.
* We don't need to scan ar_table, but only need to check hints
in many cases. Especially we don't need to access ar_table
if there is no match entries (in many cases).
It will increase memory cache locality.
The disadvantages:
* This technique can increase `#eql?` time because hints can
conflicts (in theory, it conflicts once in 256 times).
It can introduce incompatibility if there is a object x where
x.eql? returns true even if hash values are different.
I believe we don't need to care such irregular case.
* We need to re-calculate FLHV if we need to switch from ar_table
to st_table (e.g. exceeds 8 entries).
It also can introduce incompatibility, on mutating key objects.
I believe we don't need to care such irregular case too.
Add new debug counters to measure the performance:
* artable_hint_hit - hint is matched and eql?#=>true
* artable_hint_miss - hint is not matched but eql?#=>false
* artable_hint_notfound - lookup counts
|
|
Change debug_counters for Hash object counts:
* obj_hash_under4 (1-3) -> obj_hash_1_4 (1-4)
* obj_hash_ge4 (4-7) -> obj_hash_5_8 (5-8)
* obj_hash_ge8 (>=8) -> obj_hash_g8 (> 8)
For example on rdoc benchmark:
[RUBY_DEBUG_COUNTER] obj_hash_empty 554,900
[RUBY_DEBUG_COUNTER] obj_hash_under4 572,998
[RUBY_DEBUG_COUNTER] obj_hash_ge4 1,825
[RUBY_DEBUG_COUNTER] obj_hash_ge8 2,344
[RUBY_DEBUG_COUNTER] obj_hash_empty 553,097
[RUBY_DEBUG_COUNTER] obj_hash_1_4 571,880
[RUBY_DEBUG_COUNTER] obj_hash_5_8 982
[RUBY_DEBUG_COUNTER] obj_hash_g8 2,189
|
|
Shared arrays created by Array#dup and so on points
a shared_root object to manage lifetime of Array buffer.
However, sometimes shared_root is called only shared so
it is confusing. So I fixed these wording "shared" to "shared_root".
* RArray::heap::aux::shared -> RArray::heap::aux::shared_root
* ARY_SHARED() -> ARY_SHARED_ROOT()
* ARY_SHARED_NUM() -> ARY_SHARED_ROOT_REFCNT()
Also, add some debug_counters to count shared array objects.
* ary_shared_create: shared ary by Array#dup and so on.
* ary_shared: finished in shard.
* ary_shared_root_occupied: shared_root but has only 1 refcnt.
The number (ary_shared - ary_shared_root_occupied) is meaningful.
|
|
pattern matches are less than or equal to 4 results
Closes: https://github.com/ruby/ruby/pull/2135
|
|
is_pointer_to_heap() is used for conservative marking. To analyze
this function's behavior, introduce some debug_counters.
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67638 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
r67550 introduced the typo
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67553 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67546 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
is defined. It's 0 by default and so it dissappears on actual build.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67544 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67379 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67378 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67377 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67375 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67374 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
Add counters to count ccf (call cache fastpath) usage.
These counters will help which kind of method dispatch
is important to optimize.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@67336 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* internal.h: rename the following names:
* li_table -> ar_table. "li" means linear (from linear search),
but we use the word "array" (from data layout).
* RHASH_ARRAY -> RHASH_AR_TABLE. AR_TABLE is more clear.
* rb_hash_array_* -> rb_hash_ar_table_*.
* RHASH_TABLE_P() -> RHASH_ST_TABLE_P(). more clear.
* RHASH_CLEAR() -> RHASH_ST_CLEAR().
* hash.c: rename "linear_" prefix functions to "ar_" prefix.
* hash.c (linear_init_table): rename to ar_alloc_table.
* debug_counter.h: rename obj_hash_array to obj_hash_ar.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@66390 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* hash.c, internal.h: support theap for small Hash.
Introduce RHASH_ARRAY (li_table) besides st_table and small Hash
(<=8 entries) are managed by an array data structure.
This array data can be managed by theap.
If st_table is needed, then converting array data to st_table data.
For st_table using code, we prepare "stlike" APIs which accepts hash value
and are very similar to st_ APIs.
This work is based on the GSoC achievement
by tacinight <tacingiht@gmail.com> and refined by ko1.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65454 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* struct.c: members memory can use theap.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65452 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* variable.c: now instance variable space has theap supports.
obj_ivar_heap_alloc() tries to acquire memory from theap.
* debug_counter.h: add some counters for theap.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65451 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
(re-commit of r65444)
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* variable.c: now instance variable space has theap supports.
obj_ivar_heap_alloc() tries to acquire memory from theap.
* debug_counter.h: add some counters for theap.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65446 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* transient_heap.c, transient_heap.h: implement TransientHeap (theap).
theap is designed for Ruby's object system. theap is like Eden heap
on generational GC terminology. theap allocation is very fast because
it only needs to bump up pointer and deallocation is also fast because
we don't do anything. However we need to evacuate (Copy GC terminology)
if theap memory is long-lived. Evacuation logic is needed for each type.
See [Bug #14858] for details.
* array.c: Now, theap for T_ARRAY is supported.
ary_heap_alloc() tries to allocate memory area from theap. If this trial
sccesses, this array has theap ptr and RARRAY_TRANSIENT_FLAG is turned on.
We don't need to free theap ptr.
* ruby.h: RARRAY_CONST_PTR() returns malloc'ed memory area. It menas that
if ary is allocated at theap, force evacuation to malloc'ed memory.
It makes programs slow, but very compatible with current code because
theap memory can be evacuated (theap memory will be recycled).
If you want to get transient heap ptr, use RARRAY_CONST_PTR_TRANSIENT()
instead of RARRAY_CONST_PTR(). If you can't understand when evacuation
will occur, use RARRAY_CONST_PTR().
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65444 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|
|
* debug_counter.h: add `gc_major_oldmalloc`.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@65362 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
|