This changes the signature of opcode handlers in the CALL VM so that the opline
is passed directly via arguments. This reduces the number of memory operations
on EX(opline), and makes the CALL VM considerably faster.
Additionally, this unifies the CALL and HYBRID VMs a bit, as EX(opline) is now
handled in the same way in both VMs.
This is a part of GH-17849.
Currently we have two VMs:
* HYBRID: Used when compiling with GCC. execute_data and opline are global
register variables
* CALL: Used when compiling with something else. execute_data is passed as
opcode handler arg, but opline is passed via execute_data->opline
(EX(opline)).
The CALL VM looks like this:
    while (1) {
        ret = execute_data->opline->handler(execute_data);
        if (UNEXPECTED(ret != 0)) {
            if (ret > 0) { // returned by ZEND_VM_ENTER() / ZEND_VM_LEAVE()
                execute_data = EG(current_execute_data);
            } else { // returned by ZEND_VM_RETURN()
                return;
            }
        }
    }

    // example op handler
    int ZEND_INIT_FCALL_SPEC_CONST_HANDLER(zend_execute_data *execute_data) {
        // load opline
        const zend_op *opline = execute_data->opline;

        // instruction execution

        // dispatch
        // ZEND_VM_NEXT_OPCODE():
        execute_data->opline++;
        return 0; // ZEND_VM_CONTINUE()
    }
Opcode handlers return a positive value to signal that the loop must load a
new execute_data from EG(current_execute_data), typically when entering
or leaving a function.
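To make the cost of this convention concrete, here is a minimal, self-contained sketch of the same loop shape in plain C. The oldvm_* types and handlers are hypothetical toy stand-ins, not the real zend_op / zend_execute_data. The only point it illustrates is that every handler has to load the opline from the frame and store it back before returning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types, standing in for zend_op / zend_execute_data. */
typedef struct oldvm_frame oldvm_frame;
typedef int (*oldvm_handler)(oldvm_frame *frame);

typedef struct oldvm_op {
    oldvm_handler handler;
    long operand;
} oldvm_op;

struct oldvm_frame {
    const oldvm_op *opline; /* plays the role of EX(opline) */
    long acc;
};

/* Adds the operand to the accumulator, then advances the stored opline. */
static int oldvm_add(oldvm_frame *frame) {
    const oldvm_op *opline = frame->opline; /* memory load */
    frame->acc += opline->operand;
    frame->opline = opline + 1;             /* memory store */
    return 0;                               /* continue */
}

/* Negative return value: leave the interpreter loop. */
static int oldvm_return(oldvm_frame *frame) {
    (void) frame;
    return -1;
}

static long oldvm_execute(oldvm_frame *frame) {
    assert(frame != NULL && frame->opline != NULL);
    while (1) {
        int ret = frame->opline->handler(frame);
        if (ret < 0) {
            return frame->acc;
        }
        /* ret > 0 would mean "reload the frame"; this toy never does that. */
    }
}

/* Runs a three-instruction toy program: 2 + 40, then return. */
long oldvm_demo(void) {
    static const oldvm_op program[] = {
        {oldvm_add, 2},
        {oldvm_add, 40},
        {oldvm_return, 0},
    };
    oldvm_frame frame = {program, 0};
    return oldvm_execute(&frame);
}
```

Calling oldvm_demo() runs the toy program through the loop and returns the accumulator.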
Here I make the following changes:
* Pass opline as opcode handler argument
* Return next opline from opcode handlers
* ZEND_VM_ENTER / ZEND_VM_LEAVE return opline|(1<<0) to signal that
execute_data must be reloaded from EG(current_execute_data)
This gives us:
    while (1) {
        opline = opline->handler(execute_data, opline);
        if (UNEXPECTED((uintptr_t) opline & ZEND_VM_ENTER_BIT)) {
            opline = (const zend_op *) ((uintptr_t) opline & ~ZEND_VM_ENTER_BIT);
            if (opline != 0) { // ZEND_VM_ENTER() / ZEND_VM_LEAVE()
                execute_data = EG(current_execute_data);
            } else { // ZEND_VM_RETURN()
                return;
            }
        }
    }

    // example op handler
    const zend_op * ZEND_INIT_FCALL_SPEC_CONST_HANDLER(zend_execute_data *execute_data, const zend_op *opline) {
        // opline already loaded

        // instruction execution

        // dispatch
        // ZEND_VM_NEXT_OPCODE():
        return ++opline;
    }
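The low-bit tagging can be tried out in isolation. The following is a minimal sketch under stated assumptions, not the engine code: the newvm_* types, the fixed frame array, and the call/leave handlers are hypothetical. It relies on op structs being at least 2-byte aligned, so bit 0 of a valid op pointer is always free to carry the "reload the frame" signal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEWVM_ENTER_BIT ((uintptr_t) 1) /* toy analogue of ZEND_VM_ENTER_BIT */

typedef struct newvm_frame newvm_frame;
typedef struct newvm_op newvm_op;
typedef const newvm_op *(*newvm_handler)(newvm_frame *frame, const newvm_op *opline);

struct newvm_op {
    newvm_handler handler;
    long operand;
    const newvm_op *target; /* callee code, used by newvm_call only */
};

struct newvm_frame {
    newvm_frame *prev;
    const newvm_op *resume; /* where the caller continues after a leave */
    long acc;
};

static newvm_frame newvm_frames[4];  /* hypothetical fixed frame stack */
static newvm_frame *newvm_current;   /* plays the role of EG(current_execute_data) */

/* Sets bit 0 to tell the loop that the frame must be reloaded. */
static const newvm_op *newvm_tag(const newvm_op *opline) {
    assert(((uintptr_t) opline & NEWVM_ENTER_BIT) == 0); /* alignment assumption */
    return (const newvm_op *) ((uintptr_t) opline | NEWVM_ENTER_BIT);
}

static const newvm_op *newvm_add(newvm_frame *frame, const newvm_op *opline) {
    frame->acc += opline->operand;
    return opline + 1; /* next opline, no memory store needed */
}

static const newvm_op *newvm_call(newvm_frame *frame, const newvm_op *opline) {
    newvm_frame *callee = frame + 1; /* next slot in newvm_frames */
    callee->prev = frame;
    callee->resume = opline + 1;
    callee->acc = 0;
    newvm_current = callee;
    return newvm_tag(opline->target); /* enter */
}

static const newvm_op *newvm_leave(newvm_frame *frame, const newvm_op *opline) {
    (void) opline;
    frame->prev->acc += frame->acc;
    newvm_current = frame->prev;
    return newvm_tag(frame->resume); /* leave */
}

static const newvm_op *newvm_return(newvm_frame *frame, const newvm_op *opline) {
    (void) frame;
    (void) opline;
    return newvm_tag(NULL); /* tagged NULL: stop the loop */
}

static long newvm_execute(newvm_frame *frame, const newvm_op *opline) {
    newvm_frame *root = frame;
    newvm_current = frame;
    while (1) {
        opline = opline->handler(frame, opline);
        if ((uintptr_t) opline & NEWVM_ENTER_BIT) {
            opline = (const newvm_op *) ((uintptr_t) opline & ~NEWVM_ENTER_BIT);
            if (opline != NULL) {
                frame = newvm_current; /* enter or leave */
            } else {
                return root->acc;      /* return */
            }
        }
    }
}

/* Caller adds 2, calls a callee that adds 40, then returns. */
long newvm_demo(void) {
    static const newvm_op callee[] = {
        {newvm_add, 40, NULL},
        {newvm_leave, 0, NULL},
    };
    static const newvm_op caller[] = {
        {newvm_add, 2, NULL},
        {newvm_call, 0, callee},
        {newvm_return, 0, NULL},
    };
    newvm_frames[0] = (newvm_frame){NULL, NULL, 0};
    return newvm_execute(&newvm_frames[0], caller);
}
```

In this sketch the common path (newvm_add) never touches the frame's opline at all. Only the call, leave and return handlers pay for the tag check and the frame reload.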
bench.php is 23% faster on Linux / x86_64, 18% faster on macOS / M1.
Symfony Demo is 2.8% faster.
When using the HYBRID VM, JIT'ed code stores execute_data/opline in two fixed
callee-saved registers and rarely touches EX(opline), just like the VM.
Since the registers are callee-saved, the JIT'ed code doesn't have to
save them before calling other functions, and can assume they always
contain execute_data/opline. The code also avoids saving/restoring them in
prologue/epilogue, as execute_ex takes care of that (JIT'ed code is called
exclusively from there).
The CALL VM can now use a fixed register for execute_data/opline as well, but
we can't rely on execute_ex to save the registers for us as it may use these
registers itself. So we have to save/restore the two registers in JIT'ed code
prologue/epilogue.
Closes GH-17952
Opcache JIT
This is the implementation of Opcache's JIT (Just-In-Time compiler). It converts the PHP Virtual Machine's opcodes into Intermediate Representation and uses the IR - Lightweight JIT Compilation Framework to produce optimized native code. The necessary part of the IR Framework is embedded into php-src.
Running tests of the JIT
To test the JIT, e.g. with opcache.jit=tracing, use a command like the following, which is based on what is used in CI:
make test TESTS="-d opcache.jit_buffer_size=16M -d opcache.enable=1 -d opcache.enable_cli=1 -d opcache.protect_memory=1 -d opcache.jit=tracing --repeat 2 --show-diff -j$(nproc) ext/opcache Zend"
- opcache.jit_buffer_size=16M enables the JIT in tests by providing 16 megabytes of memory for the JIT to use.
- opcache.protect_memory=1 will detect writing to memory that is meant to be read-only, which is sometimes the cause of opcache bugs.
- --repeat 2 is optional, but used in CI since some JIT bugs only show up after processing a request multiple times (the first request compiles the trace and the second executes it).
- -j$(nproc) runs as many test workers as there are CPUs.
- ext/opcache/ and Zend are the folders with the tests to run, in this case opcache and the Zend engine itself. If no folders are provided, all tests are run.
When investigating test failures such as segmentation faults,
configuring the build of php with --enable-address-sanitizer to enable
AddressSanitizer is often useful.
Sometimes, adding -m --show-mem to the TESTS configuration is also useful to test with valgrind to detect out-of-bounds memory accesses.
Using valgrind is slower at detecting invalid memory reads/writes than AddressSanitizer when running large numbers of tests, but does not require rebuilding php.
Note that the JIT supports 3 different architectures: x86_64, i386, and arm64.
Miscellaneous
How to build 32-bit builds on x86_64 environments
Refer to ../../../.github/workflows/push.yml for examples of dependencies to install.
If you are running this natively (outside of Docker or a VM):
- Consider running in docker/a VM instead if you are unfamiliar with this.
- Avoid purging packages.
- Avoid -y: if the package manager warns you that the dependencies conflict, then don't try to force install them.
Prerequisites for 32-bit builds
This assumes you are using a Debian-based Linux distribution and have already set up prerequisites for regular development.
sudo dpkg --add-architecture i386
sudo apt-get update -y
# As well as anything else from .github/actions/apt-x32/action.yml that you're testing locally
sudo apt-get install \
gcc-multilib g++-multilib \
libxml2-dev:i386 \
libc6:i386
Compiling 32-bit builds
This assumes you are using a Debian-based Linux distribution and have already set up prerequisites for 32-bit development.
export LDFLAGS=-L/usr/lib/i386-linux-gnu
export CFLAGS='-m32'
export CXXFLAGS='-m32'
export PKG_CONFIG=/usr/bin/i686-linux-gnu-pkg-config
./configure --disable-all --enable-opcache --build=i686-pc-linux-gnu
make -j$(nproc)
Running tests of the JIT on 32-bit builds
See the section "Running tests of the JIT".
Testing the JIT with arm64 on x86 computers
https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/ may be useful for local development.
Note that this is slower than compiling and testing natively.
# After following steps in https://www.docker.com/blog/faster-multi-platform-builds-dockerfile-cross-compilation-guide/
cp .gitignore .dockerignore
echo .git >> .dockerignore
docker build --network=host -t php-src-arm64-example -f ext/opcache/jit/Dockerfile.arm64.example .
docker run -it --rm php-src-arm64-example
Then, the docker image can be used to run tests with make test.
For example, to test ext/opcache in parallel with the tracing JIT enabled:
docker run -it php-src-arm64-example make test TESTS="-d opcache.jit_buffer_size=16M -d opcache.enable=1 -d opcache.enable_cli=1 -d opcache.protect_memory=1 -d opcache.jit=tracing --repeat 2 --show-diff -j$(nproc) ext/opcache"