ruby.git/yjit/src/asm/arm64/inst, branch v3_3_11

Typofix under bootstraptest, spec and yjit directories

2023-12-25T04:50:23+00:00

YJIT: implement fast path for integer multiplication in opt_mult (#8204)

2023-08-18T14:05:32+00:00

* YJIT: implement fast path for integer multiplication in opt_mult

* Update yjit/src/codegen.rs

Co-authored-by: Alan Wu 

* Implement mul with overflow checking on arm64

* Fix missing semicolon

* Add arm splitting for lshift, rshift, urshift

---------

Co-authored-by: Alan Wu
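
The overflow check mentioned above can be sketched in plain Rust. This is only an illustration of the usual AArch64 idiom (MUL for the low 64 bits, SMULH for the high 64 bits, then comparing the high half against the sign extension of the low half). The function name `mul_with_overflow` is hypothetical, and the sketch does not claim to match the exact sequence the backend emits.

```rust
/// Hypothetical helper: models a 64-bit signed multiply with overflow detection.
/// Returns the wrapped product and whether the true product overflowed i64.
fn mul_with_overflow(a: i64, b: i64) -> (i64, bool) {
    // Compute the full 128-bit product so both halves are available.
    let wide = (a as i128) * (b as i128);
    // Low 64 bits: what MUL would leave in the destination register.
    let lo = wide as i64;
    // High 64 bits: what SMULH would produce.
    let hi = (wide >> 64) as i64;
    // The product fits in 64 bits only if the high half is the sign
    // extension of the low half (all zeros or all ones).
    (lo, hi != (lo >> 63))
}
```

For example, `mul_with_overflow(3, 4)` yields `(12, false)`, while multiplying `i64::MAX` by 2 reports an overflow.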

Implement MUL instruction for aarch64 (#8193)

2023-08-09T16:21:53+00:00

YJIT: fix 32 and 16 bit register store (#6840)

2022-12-01T15:53:50+00:00

* Fix 32 and 16 bit register store in YJIT

Co-Authored-By: Takashi Kokubun 

* Remove an unnecessary diff

* Reuse an rm_num_bits result

* Use u16::MAX instead

* Update the link

Co-authored-by: Alan Wu 

* Just use sturh for 16 bits

Co-authored-by: Takashi Kokubun 
Co-authored-by: Alan Wu

Implement LDURH on Aarch64

2022-11-15T01:04:50+00:00

When RUBY_DEBUG is enabled, shape IDs are 16 bits wide. I would like to do
16-bit comparisons, so I sometimes need to load halfwords. This commit
adds LDURH so that I can load halfwords.

  https://developer.arm.com/documentation/ddi0596/2021-12/Base-Instructions/LDURH--Load-Register-Halfword--unscaled--?lang=en

I verified the bytes using clang:

```
$ cat asmthing.s
.global _start
.align 2

_start:
  ldurh w10, [x1]
  ldurh w10, [x1, #123]
$ as asmthing.s -o asmthing.o && objdump --disassemble asmthing.o

asmthing.o:	file format mach-o arm64

Disassembly of section __TEXT,__text:

0000000000000000 :
       0: 2a 00 40 78  	ldurh	w10, [x1]
       4: 2a b0 47 78  	ldurh	w10, [x1, #123]
```
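
The bytes in the disassembly above can be reproduced with a small encoder. This is a minimal sketch based on the LDURH layout in the linked Arm documentation (fixed bits 0x78400000, a signed 9-bit immediate at bit 12, Rn at bit 5, Rt at bit 0). The function name `encode_ldurh` is hypothetical and is not the assembler's actual API.

```rust
/// Hypothetical helper: encode `ldurh w<rt>, [x<rn>, #imm9]`.
fn encode_ldurh(rt: u8, rn: u8, imm9: i16) -> u32 {
    assert!(rt < 32 && rn < 32, "register numbers are 5 bits");
    assert!((-256..=255).contains(&imm9), "offset must fit in a signed 9 bits");
    // Keep only the low 9 bits of the two's complement offset.
    let imm9 = (imm9 as u32) & 0x1ff;
    0x7840_0000 | (imm9 << 12) | ((rn as u32) << 5) | (rt as u32)
}
```

Calling `encode_ldurh(10, 1, 123).to_le_bytes()` gives the same four bytes that objdump prints for `ldurh w10, [x1, #123]`.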

YJIT: fix ARM64 bitmask encoding for 32 bit registers (#6503)

2022-10-06T22:41:38+00:00

For logical instructions such as AND, there is a constraint that the N
part of the bitmask immediate must be 0. We weren't respecting this
condition previously and were silently emitting undefined instructions.

Check for this condition in the assembler and tweak the backend to
correctly detect whether a number could be encoded as an immediate in a
32 bit logical instruction. Due to the nature of the immediate encoding,
the same numeric value encodes differently depending on the size of
the register the instruction works on.

We currently don't have cases where we use 32 bit immediates but we ran
into this encoding issue during development.
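
To illustrate why the same value encodes differently per register size, here is a rough sketch of a bitmask-immediate check. It relies on the architectural rule that a logical immediate is a repeated element of 2, 4, 8, 16, 32 or 64 bits holding a rotated run of ones, and that N is 1 only for a 64-bit element. The function names are hypothetical and this is not the assembler's actual code.

```rust
/// Hypothetical helper: true if `elem` (within `size` bits) is a rotated,
/// contiguous run of ones that is neither all zeros nor all ones.
fn is_rotated_run_of_ones(elem: u64, size: u32) -> bool {
    let mask = if size == 64 { u64::MAX } else { (1u64 << size) - 1 };
    if elem == 0 || elem == mask {
        return false;
    }
    // Rotate right by one within `size` bits; a circular run of ones
    // has exactly two positions where neighbouring bits differ.
    let rotated = ((elem >> 1) | (elem << (size - 1))) & mask;
    (elem ^ rotated).count_ones() == 2
}

/// Hypothetical helper: returns the smallest element size that encodes
/// `value` for a register of `reg_bits` (32 or 64), or None if it cannot
/// be encoded. An element size of 64 corresponds to N = 1.
fn logical_imm_element_size(value: u64, reg_bits: u32) -> Option<u32> {
    assert!(reg_bits == 32 || reg_bits == 64);
    if reg_bits == 32 && value > u32::MAX as u64 {
        return None;
    }
    let mut size = 2;
    while size <= reg_bits {
        let mask = if size == 64 { u64::MAX } else { (1u64 << size) - 1 };
        let elem = value & mask;
        // Rebuild the value by repeating the element across the register.
        let mut repeated = 0u64;
        let mut shift = 0;
        while shift < reg_bits {
            repeated |= elem << shift;
            shift += size;
        }
        if repeated == value && is_rotated_run_of_ones(elem, size) {
            return Some(size);
        }
        size *= 2;
    }
    None
}
```

Under this sketch, 0xff needs a 64-bit element (N = 1) in a 64-bit instruction but a 32-bit element (N = 0) in a 32-bit one, and 0xffffffff is encodable only in the 64-bit form.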

A bunch of clippy auto fixes for yjit (#6476)

2022-09-30T15:14:55+00:00

Change IncrCounter lowering on AArch64 (#6455)

2022-09-27T20:58:01+00:00

* Change IncrCounter lowering on AArch64

Previously we were using LDADDAL, which is not available on
Graviton 1 chips. Instead, we're going to use an exclusive
load/store group through the LDAXR/STLXR instructions.

* Update yjit/src/backend/arm64/mod.rs

Co-authored-by: Maxime Chevalier-Boisvert
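
The exclusive load/store approach has the shape of a retry loop: load the value, add to it, attempt the store, and start over if another writer got in between. As a portable analogy only (not the code the backend emits), the same shape can be written with Rust atomics. The function name `incr_counter` is hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: atomically add `amount` to `counter` using a
/// load / modify / conditional-store retry loop. Returns the previous value.
fn incr_counter(counter: &AtomicU64, amount: u64) -> u64 {
    // Initial load, analogous to the exclusive load.
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        // Conditional store, analogous to the exclusive store: it only
        // succeeds if nobody changed the value since we read it.
        match counter.compare_exchange_weak(
            current,
            current.wrapping_add(amount),
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(previous) => return previous,
            // The store failed; retry with the freshly observed value.
            Err(actual) => current = actual,
        }
    }
}
```

Unlike a single LDADDAL, this form needs no atomic read-modify-write instruction, at the cost of a possible retry under contention.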

YJIT: Add Opnd#with_num_bits to use only 8 bits (#6359)

2022-09-14T14:27:52+00:00

* YJIT: Add Opnd#sub_opnd to use only 8 bits

* Add with_num_bits and let arm64_split use it

* Add another assertion to with_num_bits

* Use only with_num_bits

Better offsets (#6315)

2022-09-09T15:37:41+00:00

* Introduce InstructionOffset for AArch64

There are a lot of instructions on AArch64 where we take an offset
from PC in terms of the number of instructions. This is for loading
a value relative to the PC or for jumping.

We were usually accepting an A64Opnd or an i32. That gets confusing
and inconsistent, though, because sometimes you divide by 4 to get
the number of instructions and sometimes you multiply by 4 to get
the number of bytes.

This commit adds a struct that wraps an i32 in order to keep all of
that logic in one place. It makes it much easier to read and reason
about how these offsets are getting used.

* Use b instruction when the offset fits on AArch64
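
A minimal sketch of the wrapper idea described above follows. The struct name comes from the commit message, but the method names (`from_insns`, `from_bytes`, `insns`, `bytes`) and the `encode_b` helper are assumptions made for illustration. The unconditional `b` encoding (0x14000000 plus a signed 26-bit instruction offset) follows the Arm instruction set documentation.

```rust
/// Offset from PC, stored as a number of 4-byte instructions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InstructionOffset(i32);

impl InstructionOffset {
    /// Build from a count of instructions (assumed constructor name).
    fn from_insns(insns: i32) -> Self {
        Self(insns)
    }

    /// Build from a byte offset, which must be instruction-aligned.
    fn from_bytes(bytes: i32) -> Self {
        assert_eq!(bytes % 4, 0, "byte offset must be a multiple of 4");
        Self(bytes / 4)
    }

    fn insns(self) -> i32 {
        self.0
    }

    fn bytes(self) -> i32 {
        self.0 * 4
    }
}

/// Hypothetical helper: encode `b <offset>` if the offset fits in the
/// signed 26-bit immediate, otherwise return None so the caller can
/// fall back to a longer sequence.
fn encode_b(offset: InstructionOffset) -> Option<u32> {
    let insns = offset.insns();
    if !(-(1 << 25)..(1 << 25)).contains(&insns) {
        return None;
    }
    Some(0x1400_0000 | ((insns as u32) & 0x03ff_ffff))
}
```

With the unit conversion kept inside the type, callers never divide or multiply by 4 themselves: `encode_b(InstructionOffset::from_bytes(8))` and `encode_b(InstructionOffset::from_insns(2))` produce the same instruction.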