This is done by adding a new zend_atomic_bool type. The type
definition is only available for compiler alignment and size info; it
should be treated as opaque and only the zend_atomic_bool_* family of
functions should be used.
Note that directly using atomic_bool is complicated. All C++ compiler
stdlibs that I checked typedef atomic_bool to std::atomic<bool>, which
can't be used in an extern "C" section, and there's at least one such
usage in core, and probably more outside of it.
So, instead use platform-specific functions, preferring compiler
intrinsics.
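
A minimal sketch of what such a wrapper could look like, assuming GCC
or Clang and their __atomic builtins. The sketch_* names are
hypothetical and are not the actual Zend definitions; other platforms
would need their own branch (for example the Interlocked functions on
MSVC).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type: the member is visible only so the compiler knows
 * the size and alignment; callers should go through the functions. */
typedef struct sketch_atomic_bool {
    unsigned char value;
} sketch_atomic_bool;

/* Assumption: GCC/Clang __atomic builtins are available. */
static inline void sketch_atomic_bool_init(sketch_atomic_bool *obj, bool desired)
{
    __atomic_store_n(&obj->value, desired ? 1 : 0, __ATOMIC_RELAXED);
}

static inline bool sketch_atomic_bool_load(sketch_atomic_bool *obj)
{
    return __atomic_load_n(&obj->value, __ATOMIC_SEQ_CST) != 0;
}

static inline void sketch_atomic_bool_store(sketch_atomic_bool *obj, bool desired)
{
    __atomic_store_n(&obj->value, desired ? 1 : 0, __ATOMIC_SEQ_CST);
}

/* Stores the new value and returns the previous one in a single step. */
static inline bool sketch_atomic_bool_exchange(sketch_atomic_bool *obj, bool desired)
{
    return __atomic_exchange_n(&obj->value, desired ? 1 : 0, __ATOMIC_SEQ_CST) != 0;
}
```

Because the wrapper uses a plain struct and builtins, the header stays
valid inside an extern "C" section when included from C++.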
Indirect Branch Tracking (IBT) is part of Intel's Control-flow
Enforcement Technology (CET). IBT is a hardware-based, forward-edge
control-flow integrity mechanism: any indirect CALL/JMP must target
an ENDBR instruction, otherwise a #CP exception is raised.
This commit adds IBT support for JIT:
1. Add the endbr32/64 instructions to DynASM.
2. Insert endbr32/64 at indirect branch targets in jitted code.
gcc has supported CET since v8.1 and enables it by default since
gcc 11. With this commit, endbr is inserted in jitted code if PHP is
compiled with "gcc -fcf-protection=full/branch".
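
For illustration only, a sketch of how a code emitter might place the
landing pad. It does not use DynASM and is not the code from this
commit; the sketch_* names are hypothetical. It assumes the __CET__
macro that gcc and clang define under -fcf-protection, where bit 0
indicates IBT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the compiler defines __CET__ under -fcf-protection and
 * sets bit 0 when branch (IBT) protection is requested. */
#if defined(__CET__) && (__CET__ & 1)
# define SKETCH_IBT_ENABLED 1
#else
# define SKETCH_IBT_ENABLED 0
#endif

/* Instruction encodings: endbr64 = F3 0F 1E FA, endbr32 = F3 0F 1E FB. */
static const uint8_t sketch_endbr64[4] = {0xF3, 0x0F, 0x1E, 0xFA};
static const uint8_t sketch_endbr32[4] = {0xF3, 0x0F, 0x1E, 0xFB};

/* Writes a landing pad at buf when ibt is nonzero and there is room.
 * Returns the number of bytes written: 4, or 0 if nothing was emitted. */
static size_t sketch_emit_endbr(uint8_t *buf, size_t cap, int ibt, int is_64bit)
{
    if (!ibt || cap < sizeof(sketch_endbr64)) {
        return 0;
    }
    memcpy(buf, is_64bit ? sketch_endbr64 : sketch_endbr32, 4);
    return 4;
}
```

A caller would pass SKETCH_IBT_ENABLED as the ibt argument at every
address that can be reached through an indirect CALL/JMP, so builds
without -fcf-protection emit nothing extra.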
Signed-off-by: Chen, Hu <hu1.chen@intel.com>
After Nikita Popov found a buffer overrun bug in one of my pull
requests, I was prompted to add more assertions in a38c7e5703 to help
me catch such bugs myself more easily in testing.
Wouldn't you just know it... as soon as I added those assertions, the
mbstring test suite caught another buffer overrun bug in my UTF-7
conversion code, which I wrote the better part of a year ago.
Then, when I started fuzzing the code with libFuzzer, I found
and fixed another buffer overflow:
If we enter the main loop, which normally outputs 3 decoded Base64
characters, where the first half of a surrogate pair had appeared at
the end of the previous run, but the second half does not appear
on this run, we need to output one error marker.
Then, at the end of the main loop, if the Base64 input ends at an
unexpected position AND the last character was not a legal
Base64-encoded character, we need to output two error markers
for that. The three error markers plus two valid, decoded bytes
can push us over the available space in our wchar buffer.
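
A minimal sketch of the kind of headroom check and assertion this
implies. It uses a toy converter, not the actual mbstring UTF-7 code:
the sketch_* names, the worst case of 5 output units per iteration
(three error markers plus two decoded units, as counted above) and the
toy emission rule are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ERROR_MARKER 0xFFFFFFFFu
#define SKETCH_WORST_CASE   5 /* assumed maximum units emitted per iteration */

/* Toy stand-in for a decoder: input byte b emits (b % 5) + 1 units.
 * The loop only runs while the buffer can absorb the worst case.
 * Returns the number of input bytes consumed; *out_len gets the
 * number of units written. */
static size_t sketch_convert(const unsigned char *in, size_t in_len,
                             uint32_t *out, size_t out_cap, size_t *out_len)
{
    size_t consumed = 0, produced = 0;

    while (consumed < in_len && out_cap - produced >= SKETCH_WORST_CASE) {
        size_t n = (size_t)(in[consumed] % SKETCH_WORST_CASE) + 1;
        for (size_t i = 0; i < n; i++) {
            out[produced++] = SKETCH_ERROR_MARKER;
        }
        /* Catches an overrun in testing if the worst case was miscounted. */
        assert(produced <= out_cap);
        consumed++;
    }
    *out_len = produced;
    return consumed;
}
```

If the worst case is undercounted, for example by forgetting the extra
error markers, the guard lets one iteration too many through and the
assertion is what exposes it.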
When testing the preceding commits, I used a script to generate a large
number of random strings and try to find strings which would yield
different outputs from the new and old encoding conversion code.
Some were found. In most cases, analysis revealed that the new code
was correct and the old code was not.
In all cases where the new code was incorrect, regression tests were
added. However, there may be some value in adding regression tests
for cases where the old code was incorrect as well. That is done here.
This does not cover every case where the new and old code yielded
different results. Some of them were very obscure, and it is proving
difficult even to reproduce them (since I did not keep a record of
all the input strings which triggered the differing output).
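
The random comparison described above could be sketched roughly as
follows. This is not the script that was actually used: the sketch_*
names are hypothetical, the two toy functions merely stand in for the
old and new conversion code, and each converter is assumed to write
at most as many bytes as it reads. Differing inputs are printed so
that they can be kept for later regression tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef size_t (*sketch_conv_fn)(const unsigned char *in, size_t len,
                                 unsigned char *out);

/* Toy converters standing in for the old and new implementations. */
static size_t sketch_copy(const unsigned char *in, size_t len, unsigned char *out)
{
    memcpy(out, in, len);
    return len;
}

static size_t sketch_invert(const unsigned char *in, size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        out[i] = (unsigned char)~in[i];
    }
    return len;
}

/* Deterministic xorshift32 generator, so failures are reproducible. */
static uint32_t sketch_next(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Feeds random inputs of 0..16 bytes to both converters and returns
 * the number of inputs on which their outputs differ. */
static size_t sketch_diff_test(sketch_conv_fn old_fn, sketch_conv_fn new_fn,
                               uint32_t seed, size_t rounds)
{
    unsigned char in[16], a[16], b[16];
    size_t mismatches = 0;
    uint32_t state = seed ? seed : 1; /* xorshift state must be nonzero */

    for (size_t r = 0; r < rounds; r++) {
        size_t len = sketch_next(&state) % (sizeof(in) + 1);
        for (size_t i = 0; i < len; i++) {
            in[i] = (unsigned char)(sketch_next(&state) & 0xFF);
        }
        size_t la = old_fn(in, len, a);
        size_t lb = new_fn(in, len, b);
        if (la != lb || memcmp(a, b, la) != 0) {
            mismatches++;
            /* Record the input in hex so the case is not lost. */
            fprintf(stderr, "mismatch on input:");
            for (size_t i = 0; i < len; i++) {
                fprintf(stderr, " %02x", in[i]);
            }
            fprintf(stderr, "\n");
        }
    }
    return mismatches;
}
```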