This was a bug in both libxml and PHP.
We follow up with the same change as done in GNOME/libxml@b3871dd138.
Changing away from `xmlOutputBufferCreateFilenameDefault` is not
possible yet because this is a stable branch and would break BC.
Closes GH-17254.
In master I use ZEND_DIAGNOSTIC_IGNORED_START, but that doesn't exist on
8.2 or 8.3 (8.3 has a similar macro though).
So to unbreak CI I just made a variation of this directly in the
php_libxml.h header.
See 683e787860 (commitcomment-134301083)
Remove xmlErrMemory from the export section for Windows, this fixes the
build. Even though the original function was renamed [1] it is hidden,
so removing this should be sufficient and not be a BC break.
[1] 130436917c
Closes GH-14719.
There's two issues here:
- freeing of predefined entity declaration crashes (unique to 8.3 & master)
- using multiple entity references for a single entity declaration crashes
(since forever)
The fix for the last issue is fairly easy to do on 8.3, but may require a
slightly different approach on 8.2. Therefore, for now this is 8.3-only.
Closes GH-13004.
In master I use ZEND_DIAGNOSTIC_IGNORED_START, but that doesn't exist on
8.2 or 8.3 (8.3 has a similar macro though).
So to unbreak CI I just made a variation of this directly in the
php_libxml.h header.
See 683e787860 (commitcomment-134301083)
Closes GH-12887.
The original caching implementation had an oversight in combination with
the new lifetime management in DOM for 8.3.
The modification counter is stored on the document object itself, but as
that can get deallocated when all references disappear, stale cache data
can be used. Normally this isn't a problem, unless getElementsByTagName is
called not on the document but on a child node. Fix it by moving caching
data into the ref object, which will outlive all nodes from a document
even if the document object disappears.
Closes GH-12338.
libxml2 prior to 2.9.8 had a different signature for xmlHashScanner.
This signature changed in e03f0a199a
Use an #if to work around the incompatible signature.
Closes GH-12326.
Fixes GHSA-3qrf-m4j2-pcrr.
To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.
Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.
Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
Change the way lifetime works in ext/libxml and ext/dom
Previously, a node could be freed even when holding a userland reference
to it. This resulted in exceptions when trying to access that node after
it has been implicitly or explicitly removed. After this patch, a node
will only be freed when the last userland reference disappears.
Fixes GH-9628.
Closes GH-11576.
* Implement iteration cache, item cache and length cache for node list iteration
The current implementation follows the spec requirement that the list
must be "live". This means that changes in the document must be
reflected in the existing node lists without requiring the user to
refetch the node list.
The consequence is that getting any item, or the length of the list,
always starts searching from the root element of the node list. This
results in O(n) time to get any item or the length. If there's a for
loop over the node list, this means the iterations will take O(n²) time
in total. This causes real-world performance issues with potential for
downtime (see GH-11308 and its references for details).
We fix this by introducing a caching strategy. We cache the last
iterated object in the iterator, the last requested item in the node
list, and the last length computation. To invalidate the cache, we
simply count the number of modifications made to the containing
document. If the modification number does not match what the number was
during caching, we know the document has been modified and the cache is
invalid. If this ever overflows, we saturate the modification number and
don't do any caching anymore. Note that we don't check for overflow on
64-bit systems because it would take hundreds of years to overflow.
Fixes GH-11308.