| Age | Commit message (Collapse) | Author |
|
* LDUR
* Fix up immediate masking
* Consume operands directly
* Consistency and cleanup
* More consistency and entrypoints
* Cleaner syntax for masks
* Cleaner shifting for encodings
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* Initial setup for aarch64
* ADDS and SUBS
* ADD and SUB for immediates
* Revert moved code
* Documentation
* Rename Arm64* to A64*
* Comments on shift types
* Share sig_imm_size and unsig_imm_size
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
* Split instructions if necessary
* Add a reusable transform_insns function
* Split out comments labels from transform_insns
* Refactor alloc_regs to use transform_insns
|
|
PR: https://github.com/Shopify/ruby/pull/289
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6278
|
|
* Rename mjit_exec to jit_exec
* Rename mjit_exec_slowpath to mjit_check_iseq
* Remove mjit_exec references from comments
Notes:
Merged-By: k0kubun <takashikkbn@gmail.com>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/6158
|
|
Allow str-concat arg to be any string subtype, not just rb_cString
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
* YJIT: Add known_* helpers for Type
This adds a few helpers to Type which all return Options representing
what is known, from a Ruby perspective, about the type.
This includes:
* known_class_of: If known, the class represented by this type
* known_value_type: If known, the T_ value type
* known_exact_value: If known, the exact VALUE represented by this type
(currently this is only available for true/false/nil)
* known_truthy: If known, whether or not this value evaluates as true
(not false or nil)
The goal of this is to abstract away the specifics of the mappings
between types wherever possible from the codegen. For example previously
by introducing Type::CString as a more specific version of
Type::TString, uses of Type::TString in codegen needed to be updated to
check either case. Now by using known_value_type, at least in theory we
can introduce new types with minimal (if any) codegen changes.
I think rust's Option type allows us to represent this uncertainty
fairly well, and should help avoid mistakes, and the matching using this
turned out pretty cleanly.
* YJIT: Use known_value_type for checktype
* YJIT: Use known_value_type for T_STRING check
* YJIT: Use known_class_of in guard_known_klass
* YJIT: Use known truthyness in jit_rb_obj_not
* YJIT: Rename known_class_of => known_class
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
(#6191)
Teach getblockparamproxy to handle the no-block case without exiting
Co-authored-by: John Hawthorn <john@hawthorn.email>
Co-authored-by: John Hawthorn <john@hawthorn.email>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Write barriers may be required when VM_ENV_FLAG_WB_REQUIRED is set,
however write barriers only affect heap objects being written. If we
know an immediate value is being written we can skip this check.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/5978
|
|
This commit implements Objects on Variable Width Allocation. This allows
Objects with more ivars to be embedded (i.e. contents directly follow the
object header) which improves performance through better cache locality.
Notes:
Merged: https://github.com/ruby/ruby/pull/6117
|
|
In a small script the speed of this feature isn't really noticeable but
on Rails it's very noticeable how slow this can be. This PR aims to
speed up two parts of the functionality.
1) The Rust exit recording code
Instead of adding all samples as we see them to the yjit_raw_samples and
yjit_line_samples, we can increment the counter on the ones we've seen
before. This will be faster on traces where we are hitting the same
stack often. In a crude measurement of booting just the active record
base test (`test/cases/base_test.rb`) we found that this improved the
speed by 1 second.
This also results in a smaller marshal dump file which sped up the test
boot time by 4 seconds with trace exits on.
2) The Ruby parsing code
Previously we were allocating new arrays using `shift` and
`each_with_index`. This change avoids allocating new arrays by using an
index. This change saves us the most amount of time, gaining 11 seconds.
Before this change the test boot time took 62 seconds, after it took 47
seconds. This is still too long but it's a step closer to faster
functionality. Next we're going to tackle allowing you to collect trace
exits for a specific instruction. There is also some potential slowness
in the GC code that I'd like to take a second look at.
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Co-authored-by: Aaron Patterson <tenderlove@ruby-lang.org>
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
encodings don't match, as discussed with byroot
Notes:
Merged: https://github.com/ruby/ruby/pull/6095
|
|
Add a counter for gc object refs in the machine code
This is to gather data for the eventual implementation of
a constant pool.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Refactor gen_opt_mod in YJIT
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
Also slightly broaden the cases where << on two strings will generate
specialised code rather than a plain method call.
Notes:
Merged: https://github.com/ruby/ruby/pull/6022
|
|
|
|
Notes:
Merged: https://github.com/ruby/ruby/pull/5643
|
|
This commit makes YJIT allocate memory for generated code gradually as
needed. Previously, YJIT allocates all the memory it needs on boot in
one go, leading to higher than necessary resident set size (RSS) and
time spent on boot initializing the memory with a large memset().
Users should no longer need to search for a magic number to pass to
`--yjit-exec-mem` since physical memory consumption should now more
accurately reflect the requirement of the workload.
YJIT now reserves a range of addresses on boot. This region start out
with no access permission at all so buggy attempts to jump to the region
crashes like before this change. To get this hardening at finer
granularity than the page size, we fill each page with trapping
instructions when we first allocate physical memory for the page.
Most of the time applications don't need 256 MiB of executable code, so
allocating on-demand ends up doing less total work than before. Case in
point, a simple `ruby --yjit-call-threshold=1 -eitself` takes about
half as long after this change. In terms of memory consumption, here is
a table to give a rough summary of the impact:
| Peak RSS in MiB | -eitself example | railsbench once |
| :-------------: | ---------------: | --------------: |
| before | 265 | 377 |
| after | 11 | 143 |
| no YJIT | 10 | 101 |
A new module is introduced to handle allocation bookkeeping.
`CodePtr` is moved into the module since it has a close relationship
with the new `VirtualMemory` struct. This new interface has a slightly
smaller surface than before in that marking a region as writable is no
longer a public operation.
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|
|
This way YJIT has to match CRuby for each of them.
Remove unused string_p() Rust function
Notes:
Merged-By: maximecb <maximecb@ruby-lang.org>
|