The existing implementation of mb_strcut extracts part of a
multi-byte encoded string by pulling out raw bytes and then running
them through a conversion filter to ensure that the output is valid
in the requested encoding.
If the conversion filter emits error markers when doing the final
'flush' operation which ends the conversion of the extracted bytes,
these error markers may (in some cases) be included in the output.
The conversion operation does not respect the value of
mb_substitute_character; rather, it always uses '?' as an error marker.
So this issue manifests itself as unwanted '?' characters being
inserted into the output.
This issue has existed for a long time, but became noticeable in PHP
8.1 because for at least some of the supported text encodings, mbstring
is now more strict about emitting error markers when strings end in an
illegal state.
The simplest fix is to suppress error markers during the final flush
operation.
While working on a fix for this problem, another problem with mb_strcut
was discovered; since it decides when to stop consuming bytes from
the input by looking at the byte length of its OUTPUT, anything which
causes extra bytes to be emitted to the output may cause mb_strcut to
not consume all the bytes in the requested range.
The one case where we DO emit extra output bytes is for encodings
which have a selectable mode, like ISO-2022-JP; if a string in such
an encoding ends in a mode which is not the default, we emit an ending
escape sequence which changes back to the default mode. This is done
so that concatenating strings in such encodings is safe.
However, as mentioned, this can cause the output of mb_strcut to be
shorter than it logically should be. This bug has existed for a long
time, and fixing it now will be a BC break, so we may not fix it right
away.
Therefore, tests for THIS fix which don't pass because of that OTHER
bug have been split out into a separate test file (gh9535b.phpt), and
that file has been marked XFAIL.
Currently, the role attribute is mainly used to differentiate OO and procedural "aliases" when they are documented on the same page (e.g. https://www.php.net/manual/en/datetime.diff.php). However, these function-method counterparts are not always aliases in fact according to the stubs (#9491). That's why sometimes the usage of the role attribute is ambiguous, thus syncing the manual with the stubs results in false positive diffs.
This change fixes the problem by a very obtrusive way: by changing the value of all role="oop" attributes to the actual class name, like role="DateTime", and by getting rid of all role="procedural" attributes as they became unnecessary. This way, class synopsis pages can clearly reference methods, and skip functions. Additionally, gen_stub.php will be able to generate the correct methodsynopsis role even though a single page describes multiple methods, like `DateTime/DateTimeImmutable/DateTimeInterface::diff()`.
This occurs when the handle is different from the current handle (e.g. copy of the .so file), hence the existing test did not catch that particular case.
Directly referring to a constant of an undefined throws an exception;
there is not much point in `constant()` raising a fatal error in this
case.
Closes GH-9907.
For early observing, there already exists a op_array_ctor hook on zend_extension.
However the goal of the declared_function observer is noting the time when a fully defined function starts existing in the function_tables.
This also prevents the observer being called in case there were compilation errors.
Ultimately, this now gives a consistent behaviour with respect to how it works when opcache is enabled:
- pass_two is done, opcodes and flags are all finalized.
- similarly class_linked notifications also only happen once the class is actually finalized.
- any extension wanting to delay the observer call may add the ZEND_COMPILE_IGNORE_OBSERVER compiler_option, then call it itself.