This is done by adding a new zend_atomic_bool type. The type
definition is exposed only so the compiler knows its size and
alignment; it should be treated as opaque, and only the
zend_atomic_bool_* family of functions should be used to access it.
Note that using atomic_bool directly is complicated. The standard
libraries of all the C++ compilers I checked typedef atomic_bool to
std::atomic<bool>, which can't be used in an extern "C" section, and
there is at least one such usage in core, and probably more outside
of it.
So instead, use platform-specific functions, preferring compiler
intrinsics.
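As an illustration of the approach, here is a minimal sketch of an opaque boolean wrapper built on the GCC/Clang `__atomic_*` builtins. The `sketch_atomic_bool` type and its functions are hypothetical names chosen for this example, not PHP's actual zend_atomic_bool definitions, and only the GCC/Clang intrinsic path is shown.

```c
#include <stdbool.h>

/* Hypothetical opaque wrapper (not PHP's real definition). The struct is
 * visible only so the compiler knows its size and alignment; callers must
 * go through the functions below. Unlike std::atomic<bool>, a plain struct
 * like this can be declared inside an extern "C" block. */
typedef struct {
    bool value;
} sketch_atomic_bool;

#if defined(__GNUC__) || defined(__clang__)
/* Compiler intrinsics: sequentially consistent load/store/exchange. */
static inline void sketch_atomic_bool_init(sketch_atomic_bool *obj, bool desired)
{
    __atomic_store_n(&obj->value, desired, __ATOMIC_SEQ_CST);
}

static inline bool sketch_atomic_bool_load(sketch_atomic_bool *obj)
{
    return __atomic_load_n(&obj->value, __ATOMIC_SEQ_CST);
}

static inline void sketch_atomic_bool_store(sketch_atomic_bool *obj, bool desired)
{
    __atomic_store_n(&obj->value, desired, __ATOMIC_SEQ_CST);
}

/* Stores desired and returns the previous value atomically. */
static inline bool sketch_atomic_bool_exchange(sketch_atomic_bool *obj, bool desired)
{
    return __atomic_exchange_n(&obj->value, desired, __ATOMIC_SEQ_CST);
}
#else
#error "this sketch only covers compilers with GCC-style __atomic builtins"
#endif
```

Other platforms (for example MSVC) would need their own branch using that compiler's interlocked intrinsics; that part is left out here.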
Indirect Branch Tracking (IBT) is part of Intel's Control-Flow
Enforcement Technology (CET). IBT is a hardware-based, forward-edge
Control-Flow Integrity mechanism in which every indirect CALL/JMP must
target an ENDBR instruction or raise a #CP (control protection)
exception.
This commit adds IBT support for JIT:
1. Add the endbr32/64 instructions to DynASM.
2. Insert endbr32/64 at indirect branch targets in jitted code.
gcc has supported CET since v8.1, and it has been enabled by default
since gcc 11. With this commit, endbr is inserted in jitted code if PHP
is compiled with "gcc -fcf-protection=full" or
"gcc -fcf-protection=branch".
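The idea behind step 2 can be sketched without DynASM: whenever the JIT is about to emit code that may be reached by an indirect CALL/JMP, it first writes an ENDBR instruction, but only when IBT is enabled. The helper `sketch_emit_branch_target` below is hypothetical. It relies on two documented facts: ENDBR64 and ENDBR32 encode as `F3 0F 1E FA` and `F3 0F 1E FB`, and GCC defines `__CET__` with bit 0 set under `-fcf-protection=branch` or `=full`.

```c
#include <stddef.h>
#include <string.h>

/* ENDBR64 / ENDBR32 machine code (Intel SDM). */
static const unsigned char SKETCH_ENDBR64[4] = {0xF3, 0x0F, 0x1E, 0xFA};
static const unsigned char SKETCH_ENDBR32[4] = {0xF3, 0x0F, 0x1E, 0xFB};

/* GCC sets bit 0 of __CET__ for -fcf-protection=branch and =full. */
#if defined(__CET__) && (__CET__ & 1)
# define SKETCH_IBT_ENABLED 1
#else
# define SKETCH_IBT_ENABLED 0
#endif

/* Hypothetical emitter hook, called at every address that may be the
 * target of an indirect CALL/JMP. Returns the number of bytes written,
 * 0 if IBT is off, or -1 if the buffer is too small. */
static int sketch_emit_branch_target(unsigned char *buf, size_t cap,
                                     int is_64bit, int ibt_enabled)
{
    if (!ibt_enabled)
        return 0;              /* no IBT: nothing to insert */
    if (cap < 4)
        return -1;             /* not enough room for ENDBR */
    memcpy(buf, is_64bit ? SKETCH_ENDBR64 : SKETCH_ENDBR32, 4);
    return 4;
}
```

A real JIT would call this as `sketch_emit_branch_target(p, remaining, 1, SKETCH_IBT_ENABLED)`, so that builds without CET pay no size cost.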
Signed-off-by: Chen, Hu <hu1.chen@intel.com>
After Nikita Popov found a buffer overrun bug in one of my pull
requests, I was prompted to add more assertions in a38c7e5703 to help
me catch such bugs myself more easily in testing.
Wouldn't you just know it... as soon as I added those assertions, the
mbstring test suite caught another buffer overrun bug in my UTF-7
conversion code, which I wrote the better part of a year ago.
Then, when I started fuzzing the code with libFuzzer, I found
and fixed another buffer overflow:
Suppose we enter the main loop, which normally outputs 3 decoded Base64
characters, when the first half of a surrogate pair appeared at the
end of the previous run but the second half does not appear on this
run. In that case, we need to output one error marker.
Then, at the end of the main loop, if the Base64 input ends at an
unexpected position AND the last character was not a legal
Base64-encoded character, we need to output two error markers
for that. The three error markers plus two valid, decoded bytes
can push us over the available space in our wchar buffer.
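One general way to rule out this class of overrun is to compute the worst-case output of a single trip through the loop and check for that much room before starting it. The sketch below assumes, following the description above, a worst case of three error markers plus two decoded values; the constant and helper name are hypothetical, and this is not necessarily how the actual fix was written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worst case for one iteration, per the analysis above:
 * 1 error marker for a dangling surrogate half,
 * 2 error markers for a bad Base64 tail,
 * 2 valid decoded values. */
#define SKETCH_MAX_OUT_PER_ITER 5

/* Returns nonzero if the wchar buffer can absorb one more worst-case
 * iteration; otherwise the caller should stop and let the buffer drain. */
static int sketch_has_room(const uint32_t *out, const uint32_t *limit)
{
    assert(out <= limit);  /* catches an overrun that already happened */
    return (size_t)(limit - out) >= SKETCH_MAX_OUT_PER_ITER;
}
```

Checking against the worst case up front means no individual write inside the loop needs its own bounds check.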
When testing the preceding commits, I used a script to generate a large
number of random strings and try to find strings which would yield
different outputs from the new and old encoding conversion code.
Some were found. In most cases, analysis revealed that the new code
was correct and the old code was not.
In all cases where the new code was incorrect, regression tests were
added. However, there may be some value in adding regression tests
for cases where the old code was incorrect as well. That is done here.
This does not cover every case where the new and old code yielded
different results. Some of them were very obscure, and it is proving
difficult even to reproduce them (since I did not keep a record of
all the input strings which triggered the differing output).
One bug in the previous implementation: when it saw a sequence of
codepoints which looked like they might need to be emitted as a special
KDDI emoji, it would totally forget whether it was in ASCII mode,
JISX 0208 mode, or something else. So it could not reliably emit the
correct escape sequence to switch to the right mode.
Further, if the input ends with a codepoint which looks like it could
be part of a special KDDI emoji, then the legacy code did not emit
an escape sequence to switch back to ASCII mode at the end of the
string. This means that the emitted ISO-2022-JP-KDDI strings could not
always be safely concatenated.
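The two requirements described above (remember the current mode so escapes are emitted only on a real switch, and return to ASCII at the end) can be illustrated with a simplified ISO-2022-JP encoder. This is a hypothetical sketch: its input is pre-classified into ASCII bytes and JIS X 0208 byte pairs rather than raw codepoints, and it has no emoji handling. The escape sequences `ESC ( B` (ASCII) and `ESC $ B` (JIS X 0208) are the standard ISO-2022-JP ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model: input is already classified. */
typedef enum { SKETCH_ASCII, SKETCH_JISX0208 } sketch_mode;

typedef struct {
    sketch_mode mode;
    unsigned char bytes[2];  /* ASCII uses bytes[0] only */
} sketch_item;

static const char *const sketch_escape[] = {
    [SKETCH_ASCII]    = "\x1b(B",
    [SKETCH_JISX0208] = "\x1b$B",
};

/* Encodes n items into out (caller guarantees enough room) and returns
 * the number of bytes written. */
static size_t sketch_encode(const sketch_item *in, size_t n, unsigned char *out)
{
    sketch_mode current = SKETCH_ASCII;  /* strings start in ASCII mode */
    size_t len = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].mode != current) {
            /* Mode is tracked across items, so the escape is emitted
             * exactly when the mode really changes. */
            const char *esc = sketch_escape[in[i].mode];
            size_t k = strlen(esc);
            memcpy(out + len, esc, k);
            len += k;
            current = in[i].mode;
        }
        out[len++] = in[i].bytes[0];
        if (current == SKETCH_JISX0208)
            out[len++] = in[i].bytes[1];
    }

    /* Switch back to ASCII so the output can be safely concatenated. */
    if (current != SKETCH_ASCII) {
        memcpy(out + len, "\x1b(B", 3);
        len += 3;
    }
    return len;
}
```

Keeping `current` outside any lookahead logic is the point: code that buffers a possible emoji sequence can still consult it when deciding which escape to emit.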
There were bugs in the legacy implementation. Lots of them.
It did not properly track whether it had switched to JISX 0213 plane 1
or plane 2. If it processed a character in plane 1 and then immediately
one in plane 2, it failed to emit the escape code to switch to plane 2.
Further, when converting codepoints from 0x80-0xFF to ISO-2022-JP-2004,
the legacy implementation would totally disregard which mode it was
operating in. Such codepoints would pass through directly to the output
without any escape sequences being emitted.
If that was not enough, all the legacy implementations of JISX 0213:2004
encodings had another common bug: their 'flush function' did not call
the next flush function in the chain of conversion filters. So if any
of these encodings were converted to an encoding where the flush
function was needed to finish the output string, then the output
would be truncated.
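Why a missing downstream call truncates output can be shown with a toy filter chain. Everything here (`sketch_filter`, the one-value "hold" filter, the collecting sink) is a hypothetical model, not mbstring's actual filter structs. The hold filter keeps back its last value, much as a stateful encoder might while it waits to see what follows. Its flush must push that value downstream and then call the next filter's flush, which is the step the legacy JISX 0213:2004 flush functions skipped.

```c
#include <stddef.h>

typedef struct sketch_filter sketch_filter;
struct sketch_filter {
    void (*output)(sketch_filter *self, int c);
    void (*flush)(sketch_filter *self);
    sketch_filter *next;  /* NULL for the last filter in the chain */
    int pending;          /* held-back value, or -1 if none */
};

/* Terminal filter: collects output into a small buffer. */
static char sink_buf[64];
static size_t sink_len;
static int sink_flushed;

static void sink_output(sketch_filter *self, int c)
{
    (void)self;
    if (sink_len < sizeof sink_buf - 1)
        sink_buf[sink_len++] = (char)c;
}

static void sink_flush(sketch_filter *self)
{
    (void)self;
    sink_buf[sink_len] = '\0';  /* finishing step only the sink can do */
    sink_flushed = 1;
}

/* Stateful filter that always holds back the most recent value. */
static void hold_output(sketch_filter *self, int c)
{
    if (self->pending >= 0)
        self->next->output(self->next, self->pending);
    self->pending = c;
}

static void hold_flush(sketch_filter *self)
{
    if (self->pending >= 0) {
        self->next->output(self->next, self->pending);
        self->pending = -1;
    }
    /* Propagate the flush down the chain; omitting this leaves the
     * downstream filter unfinished and the output truncated. */
    if (self->next && self->next->flush)
        self->next->flush(self->next);
}
```

With the chain `hold -> sink`, writing `'a'` and `'b'` leaves only `"a"` in the sink until `hold_flush` runs; after it, the sink holds the terminated string `"ab"`.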