I profiled the home page of a simple Laravel CRUD application under
Callgrind and found that the line:
    char resolved_path[MAXPATHLEN] = {0};
accounted for about 0.95% of the total instruction count.
This is because when opcache revalidates timestamps, it has to go
through virtual_file_ex(), which contains that line. On my system the
line zeroes 4096 bytes on every call, which is bad for both the data
cache and the runtime.
I found that this memsetting is unnecessary in most cases, and that
we can fix the one remaining case:
* Lines 1020-1027 don't do anything with resolved_path, so that's okay.
* Lines 1033-1098:
  - The !IS_ABSOLUTE_PATH branch will always result in a memcpy from
    path to resolved_path (+ sometimes an offset) with the total copied
    amount equal to path_length+1, so that includes a NUL byte.
  - The else branch either takes the WIN32 path or the non-WIN32 path.
    ° WIN32: There's a copy from path+2 with length path_length-1.
      Note that we chop off the first 2 bytes, so this also
      includes the NUL byte.
    ° Non-WIN32: Copies path_length+1 bytes, so that includes a NUL byte.
At this point we know that resolved_path ends in a NUL byte. Going
further in the code:
* Lines 1100-1106 don't write to resolved_path, so no NUL byte is removed.
* Lines 1108-1136:
  - The IS_UNC_PATH branch:
    ° Lines 1111-1112 don't overwrite the NUL byte, because we know the
      path length is at least 2 due to the IS_UNC_PATH check.
    ° Both while loops uppercase the path until a slash is found. If a
      NUL byte is found instead, the code jumps to verify. Therefore, no
      NUL byte can be overwritten. Furthermore, lines 1121 and 1129
      cannot overwrite a NUL byte, because the checks at lines 1115 and
      1123 would already have jumped to verify on encountering one.
    Therefore, the IS_UNC_PATH branch cannot overwrite a NUL byte, so
    the NUL byte we already know is there stays in place.
  - The else branch:
    ° We know the path length is at least 2 due to IS_ABSOLUTE_PATH.
      That means the earliest NUL byte can be at index 2, which can be
      overwritten on line 1133. We fix this by adding a one-byte write
      if the length is 2.
All uses of resolved_path in lines 1139-1141 have a NUL byte at the end
now.
Lines 1154-1164 do a bunch of post-processing but line 1164 will make
sure resolved_path still ends in a NUL byte.
I therefore propose to remove the large memset and add a single-byte
write in the one else branch mentioned above.
Looking at Callgrind, the instruction count before this patch for 200
requests is 14,264,569,942; and after the patch it's 14,129,358,195
(averaged over a handful of runs). That is a reduction of roughly 0.95%,
in line with the share attributed to that line above.
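
The reasoning above can be illustrated in isolation. The sketch below is
not the virtual_file_ex() code: force_separator() and BUF_SIZE are
hypothetical names, and the separator write at index 2 is only an
assumed stand-in for the write in the else branch. It shows a stack
buffer that is never zeroed, a copy of path_length+1 bytes that carries
the NUL along, and the single extra byte write needed when the length
is exactly 2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 4096 /* hypothetical stand-in for MAXPATHLEN */

/* Copies path into an uninitialised buffer, forces a separator at index 2,
 * and writes the result to out. Returns the resulting length, or
 * (size_t)-1 if the path does not fit. */
size_t force_separator(const char *path, char *out, size_t out_size)
{
    char resolved_path[BUF_SIZE]; /* deliberately NOT initialised with {0} */
    size_t path_length = strlen(path);

    assert(path_length >= 2); /* assumed precondition, e.g. "C:" or longer */
    if (path_length + 2 > sizeof(resolved_path) || path_length + 2 > out_size) {
        return (size_t)-1;
    }

    /* Copying path_length + 1 bytes includes the terminating NUL. */
    memcpy(resolved_path, path, path_length + 1);

    /* With a length of exactly 2, the NUL sits at index 2 and the write
     * below would destroy it, so terminate one byte further first. */
    if (path_length == 2) {
        resolved_path[3] = '\0';
    }
    resolved_path[2] = '/';

    /* Only bytes up to and including the NUL are ever read. */
    memcpy(out, resolved_path, strlen(resolved_path) + 1);
    return strlen(out);
}
```

With these assumptions, "C:" becomes "C:/" and "C:/x" is left unchanged,
without any byte of the buffer beyond the terminator being read.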
For mb_parse_str, when mbstring.http_input (INI parameter) is a list of
multiple possible text encodings (which is not the case by default),
this new implementation is about 25% faster.
When mbstring.http_input is a single value, nothing changes.
(No automatic encoding detection is done in that case.)
Commit d835de1993 added support for AVX2 in hash table initialization
code. The same kind of code also occurs in HT_HASH_RESET, but that patch
missed it. That is unfortunate, because a loop is exactly where this
SIMD sequence is likely to pay off the most.
Furthermore, the NEON special handling exists in the initialization code
but is also missing from HT_HASH_RESET, so add this as well.
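
As a rough illustration of the kind of code involved, here is a minimal
sketch of resetting an array of 32-bit hash slots to an "invalid" marker
with an AVX2 path, a NEON path, and a scalar fallback. It is not the
macro from the patch: reset_hash_slots() and INVALID_IDX are
hypothetical names, and the all-ones marker value is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
# include <immintrin.h>
#elif defined(__ARM_NEON)
# include <arm_neon.h>
#endif

#define INVALID_IDX ((uint32_t)-1) /* hypothetical "empty slot" marker */

/* Sets every slot to INVALID_IDX, using wide stores where available. */
void reset_hash_slots(uint32_t *slots, size_t count)
{
    size_t i = 0;

    assert(slots != NULL || count == 0);

#if defined(__AVX2__)
    /* 8 slots per 256-bit unaligned store. */
    const __m256i fill256 = _mm256_set1_epi32(-1);
    for (; i + 8 <= count; i += 8) {
        _mm256_storeu_si256((__m256i *)(slots + i), fill256);
    }
#elif defined(__ARM_NEON)
    /* 4 slots per 128-bit store. */
    const uint32x4_t fill128 = vdupq_n_u32(INVALID_IDX);
    for (; i + 4 <= count; i += 4) {
        vst1q_u32(slots + i, fill128);
    }
#endif

    /* Scalar tail, and the whole job when no SIMD path is compiled in. */
    for (; i < count; i++) {
        slots[i] = INVALID_IDX;
    }
}
```

The scalar tail keeps the function correct for any count, so the SIMD
paths are purely an optimisation selected at compile time.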
ElliotNB helped me a lot debugging this by constantly testing the
patches. It is only fair that he is mentioned too, as I couldn't have
solved it without his help.
In NEWS, each 'news item' is suffixed with the name of the developer
who implemented the change. When adding entries to UPGRADING, I used
the same format as NEWS, without thinking about it much. However, it
has come to my attention that the standard format for entries in
UPGRADING does not include the developer's name.
The TSRM keeps a hashtable mapping thread IDs to thread resource pointers.
It's possible that a thread disappears without us knowing, and that another thread
gets spawned some time later with the same ID as the disappeared thread.
Note that since it's a new thread, its TSRM key pointer and cached pointer will be NULL.
The Apache request handler `php_handler()` will try to fetch some fields from the SAPI globals.
It uses a lazy thread resource allocation by calling `ts_resource(0);`.
This allocates a thread resource and sets up the TSRM pointers if they haven't been set up yet.
At least, that's what's supposed to happen. But since we are in a situation where the thread ID
still has the resources of the *old* thread associated with it in the hashtable,
the loop in `ts_resource_ex` will find that thread resource and assume the thread has been set up
already. This is not the case: the thread is actually a new one that merely reuses the ID
of the old one, without any relation whatsoever to the old thread.
Because of this assumption, the TSRM pointers will not be set up, leading to a
NULL pointer dereference when trying to access the SAPI globals.
We can easily detect this scenario: if we're in the fallback path, and the pointer is NULL,
and we're looking for our own thread resource, we know we're actually reusing a thread ID.
In that case, we'll free up the old thread resources gracefully (gracefully because
there might still be open resources, such as database connections, that need to be
shut down cleanly). After freeing the resources, we'll create the new resources for
this thread as if the stale resources never existed in the first place.
From that point forward, it is as if that situation never occurred.
That this situation can occur is not too serious, because the SAPI will eventually
respawn a child process containing threads anyway, so the stale thread resources
won't remain forever.
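
The detection logic described above can be sketched as a single-threaded
simulation. This is not the TSRM code: fetch_resource(), resource_table,
and the generation counter are hypothetical, and the `cached` argument
merely models a thread-local pointer that is NULL for a thread that has
never set up its resources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREAD_ID 8 /* hypothetical, tiny table for illustration */

typedef struct thread_resource {
    unsigned long thread_id;
    int generation; /* lets the demo tell stale and fresh resources apart */
} thread_resource;

/* Models the table mapping thread IDs to thread resources. */
static thread_resource *resource_table[MAX_THREAD_ID];
static int next_generation = 1;

/* Returns the resource for thread_id, creating it if needed. */
thread_resource *fetch_resource(unsigned long thread_id, thread_resource **cached)
{
    thread_resource *entry;

    assert(cached != NULL && thread_id < MAX_THREAD_ID);

    /* Fast path: this thread has already been set up. */
    if (*cached != NULL) {
        return *cached;
    }

    /* Fallback path: our pointer is NULL, yet the table has an entry for
     * our own ID. That entry must belong to a vanished thread whose ID we
     * are reusing, so release it (real code would run destructors here). */
    entry = resource_table[thread_id];
    if (entry != NULL) {
        free(entry);
        resource_table[thread_id] = NULL;
    }

    /* Create fresh resources as if the stale ones had never existed. */
    entry = malloc(sizeof(*entry));
    assert(entry != NULL);
    entry->thread_id = thread_id;
    entry->generation = next_generation++;
    resource_table[thread_id] = entry;
    *cached = entry;
    return entry;
}
```

A second "thread" calling fetch_resource() with the same ID but its own
NULL cached pointer receives a freshly allocated resource rather than
the stale one.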
Note that we can't simply assign our own TSRM pointers to the existing
thread resource for our ID, since it was actually from a different thread
(just with the same ID!). Furthermore, the dynamically loaded extensions
have their own pointer, which is only set when their constructor is
called, so we'd have to call their constructor anyway...
I also tried to call the dtor and then the ctor again for those resources
on the pre-existing thread resource to reuse storage, but that didn't work properly
because other code doesn't expect something like that to happen, which breaks assumptions,
and this in turn caused Valgrind to (rightfully) complain about memory bugs.
Note 2: I also had to fix a bug in the core globals destruction because it
always assumed that the thread destroying them was the owning thread,
which on TSRM shutdown isn't always the case. A similar bug was fixed
recently with the JIT globals.
Closes GH-10863.
Not enough space was reserved for the resulting packed array because of
some confusion between the number of used slots and the number of elements.
Co-authored-by: Ilija Tovilo <ilija.tovilo@me.com>
After a hash filling routine, the number of elements is set to the fill
index. However, if the fill index is larger than the number of elements,
the number of elements is no longer correct. This is observable at
least via count() and var_dump(). E.g. the attached test case would
incorrectly show int(17) instead of int(11).
Solve this by only increasing the number of elements by the actual
number that got added. Instead of adding a variable that increments per
iteration, I wanted to save some cycles in the iteration and simply
compute the number of added elements at the end.
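
The difference between the two approaches can be sketched with a toy
packed array. The struct and both fill functions below are hypothetical
and much simpler than the real macros; the point is only that an array
with holes has a fill index larger than its element count, so assigning
the fill index to the count overstates it.

```c
#include <assert.h>
#include <stddef.h>

#define CAPACITY 32 /* hypothetical fixed capacity */

typedef struct packed_array {
    int    values[CAPACITY];
    size_t num_used;     /* fill index: one past the last used slot */
    size_t num_elements; /* live elements only, holes excluded */
} packed_array;

/* Buggy variant: treats the fill index as the element count. */
void fill_buggy(packed_array *arr, const int *src, size_t n)
{
    size_t idx = arr->num_used;

    assert(idx + n <= CAPACITY);
    for (size_t i = 0; i < n; i++) {
        arr->values[idx++] = src[i];
    }
    arr->num_used = idx;
    arr->num_elements = idx; /* wrong whenever holes exist */
}

/* Fixed variant: adds only the number of elements actually appended,
 * computed once at the end instead of counting per iteration. */
void fill_fixed(packed_array *arr, const int *src, size_t n)
{
    size_t start = arr->num_used;
    size_t idx = start;

    assert(idx + n <= CAPACITY);
    for (size_t i = 0; i < n; i++) {
        arr->values[idx++] = src[i];
    }
    arr->num_elements += idx - start;
    arr->num_used = idx;
}
```

Starting from 5 used slots of which 3 are live and appending 4 values,
the buggy variant reports 9 elements while the fixed one reports 7.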
I discovered this behaviour while fixing GH-11016, where this filling
routine is easily exposed to userland via a specialised VM path [1].
Since this seems to be more of a general problem with the macros, and may
be triggered outside of the VM handlers, I fixed it in the macros
instead of modifying the VM to fix up the number of elements.
[1] https://github.com/php/php-src/blob/b2c5acbb010f4bbc7ea9b53ba9bc81d672dd0f34/Zend/zend_vm_def.h#L6132-L6141