autoconf/libtool generating code to test features missed `void` for
C calls prototypes w/o arguments.
Note that specific changes related to libtool have to be upstreamed.
Co-authored-by: Peter Kokot <petk@php.net>
close GH-13732
Apple Silicon has stricter rules about rwx mmap regions. They need to be created
using the MAP_JIT flag. However, the MAP_JIT seems to be incompatible with
MAP_SHARED. ZTS requires MAP_SHARED so that some threads may execute code from a
page while another writes/appends to it. We did not find another solution, other
than completely disabling JIT for Apple Silicon + ZTS.
See discussion in https://github.com/php/php-src/pull/13351.
Co-authored-by: Peter Kokot <peterkokot@gmail.com>
Fixes GH-13400
Closes GH-13396
Compile the file separately and only include a header. There doesn't
seem to be a good reason to directly include the C file here, and
this ensures that there are no symbol clashes (see GH-7197).
SUMMARY
We implemented a prototype of PHP JIT/arm64. Briefly speaking,
1. build system
Changes to the build system are made so that PHP JIT can be successfully
built and run on ARM-based machine.
Major change lies in file zend_jit_arm64.dasc, where the handler for
each opcode is generated into machine code. Note that this file is just
copied from zend_jit_x86.dasc and the *unimplemented* parts are
substitued with 'brk' instruction for future work.
2. registers
AArch64 registers are defined in file zend_jit_arm64.h. From our
perspectives, the register usage is quite different from the x86
implementation due to the different ABI, number of registers and
addressing modes.
We had many confusions on this part, and will discuss it in details in
the final section.
3. opcodes
Several opcodes are partially supported, including INIT_FCALL, DO_UCALL,
DO_ICALL, RETURN, ADD, PRE_INC, JMP, QM_ASSIGN, etc. Hence, simple use
scenarios such as user function call, loops, addition with integer and
floating point numbers can be supported.
18 micro test cases are added under 'ext/opcache/tests/jit/arm64/'. Note
that majority of these test cases are design for functional JIT, and
cases 'hot_func_*.phpt' and 'loop_002.phpt' can trigger tracing JIT.
4. test
Our local test environment is an ARM-based server with Ubuntu 20.04 and
GCC-10. Note that both HYBRID and CALL VM modes are supported. We
suggest running the JIT test cases using the following command. Out of
all 130 test cases, 66 cases can be passed currently.
```
$ make test TESTS='-d opcache.jit=1203 ext/opcache/tests/jit/'
```
DETAILS
1. I-cache flush
Instruction cache must be flushed for the JIT-ed code on AArch64. See
macro JIT_CACHE_FLUSH in file 'zend_jit_internal.h'.
2. Disassembler
Add initialization and jump target parse operations for AArch64 backed.
See the updates in file 'zend_jit_disasm.c'.
3. redzone
Enable redzone for AArch64. See the update in zend_vm_opcodes.h.
Redzone is designated to prevent 'vm_stack_data' from being optimized
out by compilers. It's worth noting that this 16-byte redzone might be
reused as temporary use(treated as extra stack space) for HYBRID mode.
4. stack space reservation
The definitions of HYBRID_SPAD, SPAD and NR_SPAD are a bit tricky for
x86/64.
In AArch64, HYBRID_SPAD and SPAD are both defined as 16. These 16 bytes
are pre-allocated for tempoerary usage along the exuection of JIT-ed
code. Take line 4185 in file zend_jit_arm64.dasc as an example. NR_SPAD
is defined as 48, out of which 32 bytes to save FP/IP/LR registers.
Note that we choose to always reserve HYBRID_SPAD bytes in HYBRID mode,
no matter whether redzone is used or not, for the sake of safety.
5. stack alignment
In AArch64 the stack pointer should be 16-byte aligned. Since shadow
stack is used for JIT, it's easy to guarantee the stack alignment, via
simply moving SP with an offset like 16 or a multiple of 16. That's why
NR_SPAD is defined as 48 and we use 32 of them to save FP/IP/LR
registers which only occupies 24 bytes.
6. global registers
x27 and x28 are reserved as global registers. See the updates in file
zend_jit_vm_helpers.c
7. function prologue for CALL mode
Two callee-saved registers x27 and x28 should saved in function
zend_jit_prologue() in file zend_jit_arm64.dasc. Besides the LR, i.e.
x30, should also be saved since runtime C helper functions(such as
zend_jit_find_func_helper) might be invoked along the execution of
JIT-ed code.
8. regset
Minor changes are done to regset operations particularly for AArch64.
See the updates in file zend_jit_internal.h.
REGISTER USAGE
In this section, we will first talk about our understanding on register
usage and then demonstrate our design.
1. Register usage for HYBRID/CALL modes
Registers are used similarly between HYBRID mode and CALL mode.
One difference is how FP and IP are saved. In HYBRID mode, they are
assigned to global registers, while in CALL mode they are saved/restored
on the VM stack explicitly in prologue/epilogue.
The other difference is that LR register should also be saved/restored
in CALL mode since JIT-ed code are invoked as normal functions.
2. Register usage for functional/tracing JIT
The way registers are used differs a lot between functional JIT and
tracing JIT.
For functional JIT, runtime C code (e.g. helper functions) would be
invoked along the execution of JIT-ed code. As the operands for *most*
opcodes are accessed via the stack slot, i.e. FP + offset. Hence there
is no need to save/restore local(caller-saved) registers before/after
invoking runtime C code.
Exception lies in Phi node and registers might be allocated for these
nodes. Currently I don't fully understand the reason, why registers are
allocated for Phi functions, because I suppose for different versions of
SSA variables at the Phi function, their postions on the stack slot
should be identical(in other words, access via the stack slot is enough
and there is no need to allocate registers).
For tracing JIT, runtime information are recorded for traces(before the
JIT compilation), and the data types and control flows are concrete as
well. Hence it's would be faster to conduct operations and computations
via registers rather than stack slots(as functional JIT does) for these
collected hot paths. Besides, runtime C code can be invoked for tracing
JIT, however this only happends for deoptimization and all registers are
saved to stack in advance.
3. Candidates for register allocator
1) opcode candidates
Function zend_jit_opline_supports_reg() determines the candidate opcodes
which can use CPU registers.
2) register candidates
Registers in set "ZEND_REGSET_FP + ZEND_REGSET_GP - ZEND_REGSET_FIXED -
ZEND_REGSET_PRESERVED" are available for register allocator.
Note that registers from ZEND_REGSET_FIXED are reserved for special
purpose, such as the stack pointer, and they are excluded from register
allocation process.
Note that registers from ZEND_REGSET_PRESERVED are callee-saved based on
the ABI and it's safe to not use them either.
4. Temporary registers
Temporary registers are needed by some opcodes to save intermediate
computation results.
1) Functions zend_jit_get_def_scratch_regset() and
zend_jit_get_scratch_regset() return which registers might be clobbered
by some opcodes. Hence register allocator would spill these scratch
registers if necessary when encountering these opcodes.
2) Macro ZEND_REGSET_LOW_PRIORITY denotes a set of registers which would
be allocated with low priority, and these registers can be used as
temporary usage to avoid conflicts to its best.
5. Compared to the x86 implementation, in JIT/arm64
1) Called-saved FP registers are included into ZEND_REGSET_PRESERVED for
AArch64.
2) We follow the logic of function zend_jit_opline_supports_reg().
3) We reserve 4 GPRs and 2 FPRs out from register allocator and use them
as temporary registers in particular. Note that these 6 registers are
included in set ZEND_REGSET_FIXED.
Since they are reserved, may-clobbered registers can be removed for most
opcodes except for function calls. Besides, low-priority registers are
defined as empty since all candidate registers are of the same priority.
See the updates in function zend_jit_get_scratch_regset() and macro
ZEND_REGSET_LOW_PRIORITY.
6. Why we reserve registers for temporary usage?
1) Addressing mode in AArch64 needs more temporary registers.
The addressing mode is different from x86 and tempory registers might be
*always* needed for most opcodes. For instance, an immediate must be
first moved into one register before storing into memory in AArch64,
whereas in x86 this immediate can be stored directly.
2) There are more registers in AArch64.
Compared to the solution in JIT/x86(that is, temporary registers are
reserved on demand, i.e. different registers for different opcodes under
different conditions), our solution seems a coarse-granularity and
brute-force solution, and the execution performance might be downgraded
to some extent since the number of candidate registers used for
allocation becomes less.
We suppose the performance loss might be acceptable since there are more
registers in AArch64.
3) Based on my understanding, scratch registers defined in x86 are
excluded from candidates for register allocator with *low possibility*,
and it can still allocate these registers. Special handling should be
conducted, such as checking 'reg != ZREG_R0'.
Hence, as we see it, it's simpler to reserve some temporary registers
exclusively. See the updates in function zend_jit_math_long_long() for
instance. TMP1 can be used directly without checking.
Co-Developed-by: Nick Gasson <Nick.Gasson@arm.com>
It seems that f3efb9e3fb introduced a "typo" which may result
in the following confusing message:
checking for mmap() using MAP_ANON shared memory support... no=yes
Let's fix this.
Signed-off-by: Michael Heimpold <mhei@heimpold.de>
Closes GH-6758.
This only moves the files, adjusts the build system, exports APIs
and does minor fixups to make sure the code builds.
This does not yet try to make the optimizer usable independently
of opcache.
Closes GH-6642.
Make sure the $PHP_THREAD_SAFETY variable is always available
when configuring extensions. It was previously available for
phpized extensions, but for in-tree builds it was being set
too late.
Then, use $PHP_THREAD_SAFETY instead of $enable_zts to check for
ZTS in bundled extensions, which makes sure these checks also
work for phpize builds.
Use $PHP_THREAD_SAFETY instead of $enable_zts to check for ZTS.
This variable is also available for phpize builds, while enable_zts
is only present for in-tree builds.
This can be seen when the `./configure` step fails to detect `HAVE_SHM_*`,
e.g. due to missing a necessary dependency to compile the test scripts.
(Run `./configure`, run `yum install libtool-ltdl-devel` for missing dependencies,
then run `make`, and php can end up built with 0 shared memory opcache caches)
Give a clearer error message than `unknown`
Searching for `opcache "Fatal Error Unable to allocate shared memory segment of"
"unknown: No such file or directory"` reveals issues such as
https://github.com/termux/termux-packages/issues/2234
Closes GH-5615
- all rules from pass2 moved to pass1
- all JMP unrelated rules from pass3 moved to pass1
- pass3 keeps only JMP optimization rules
- pass2.c is removed
- pass1_5.c remaned to pass1.c ("_5" was related to PHP 5)
Removed unused checks:
- mbsinit check removed, HAVE_MBSINIT removed (not used in php-src)
- mempcpy check removed, HAVE_MEMPCPY removed (not used in php-src anymore since
560ed89bfb which uses PHP's own implementation)
- strpncpy check removed, added via a8c9e893b6 and
not used.
- setpgid check removed since HAVE_SETPGID is not used
Moved to a central configure.ac:
- fpclass
- mbrlen moved to configure.ac (since the HAVE_MBRLEN is used accross the php-src)
- sigprocmask
- getcwd
- getwd
- glob
- strfmon
- nice
Duplicated checks removed:
- gethostname
- getlogin
- getpwuid_r
- socketpair
- mprotect check simplified
Some headers were checked multiple times in the main configure.ac file
and in the bundled extensions or SAPIs themselves. Also many of these
checks are then used accross other extensions or SAPIs so a central
configure.ac makes most sense for these checks.
Normalization include:
- Use dnl for everything that can be ommitted when configure is built in
favor of the shell comment character # which is visible in the output.
- Line length normalized to 80 columns
- Dots for most of the one line sentences
- Macro definitions include similar pattern header comments now
In f904830012, support for GNU Hurd was added to the opcache and
the configure check to ensure the opcache knows the flock struct
layout prior to building was changed check for two cases: BSD layout
and Linux layout. All the existing hard-coded cases in
ZendAccelerator.h follow these two cases, except for 64-bit AIX.
This means that even though building on 64-bit AIX would work,
the configure script refuses to continue.
Add a new configure check for the 64-bit AIX case and a new
compiler definition HAVE_FLOCK_AIX64. Now that all the cases are
covered, simplify the ifdef logic around these three HAVE_FLOCK_*
macros:
- The macOS and the various BSD flavors fall under HAVE_FLOCK_BSD
- Linux, HP-UX, GNU Hurd, 32-bit AIX, and SVR4 environments
fall under HAVE_FLOCK_LINUX
- 64-bit AIX falls under HAVE_FLOCK_AIX64
The only difference between the existing HAVE_FLOCK_LINUX and
the hard-coded Linux/HP-UX/Hurd case is that the latter
initialized the 5th member to 0, but since the C standard already
says that un-initialized members will be initialized to 0,
it's effectively the same.
Autoconf 2.50 released in 2001 made several macros obsolete including
the AC_TRY_RUN, AC_TRY_COMPILE and AC_TRY_LINK:
http://git.savannah.gnu.org/cgit/autoconf.git/tree/ChangeLog.2
These macros should be replaced with the current AC_FOO_IFELSE instead:
- AC_TRY_RUN with AC_RUN_IFELSE and AC_LANG_SOURCE
- AC_TRY_LINK with AC_LINK_IFELSE and AC_LANG_PROGRAM
- AC_TRY_COMPILE with AC_COMPILE_IFELSE and AC_LANG_PROGRAM
PHP 5.4 to 7.1 require Autoconf 2.59+ version, PHP 7.2 and above require
2.64+ version, and the PHP 7.2 phpize script requires 2.59+ version which
are all greater than above mentioned 2.50 version therefore systems
should be well supported by now.
This patch was created with the help of autoupdate script:
autoupdate <file>
Reference docs:
- https://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/Obsolete-Macros.html
- https://www.gnu.org/software/autoconf/manual/autoconf-2.59/autoconf.pdf
Some editors utilizing .editorconfig automatically trim whitespaces. For
convenience this patch removes whitespaces in certain build files:
- ext/*/config*.m4
- configure.ac
- acinclude.m4