It looks like the config.w32 uses CHECK_HEADER_ADD_INCLUDE to add the include
path to libxml into the search path.
That doesn't happen in zend-test.
To add to the Windows trouble, libxml is statically linked in, ext/libxml can
only be built statically but ext/zend-test can be built both statically and
dynamically.
So the regression tests won't work in all possible configurations anyway on Windows.
All of this is no problem on Linux because it just uses dynamic linking
and pkg-config, without any magic.
Signed-off-by: Ben Ramsey <ramsey@php.net>
(cherry picked from commit 62228a2568)
Fixes GHSA-3qrf-m4j2-pcrr.
To parse a document with libxml2, you first need to create a parsing context.
The parsing context contains parsing options (e.g. XML_NOENT to substitute
entities) that the application (in this case PHP) can set.
Unfortunately, libxml2 also supports providing default set options.
For example, if you call xmlSubstituteEntitiesDefault(1) then the XML_NOENT
option will be added to the parsing options every time you create a parsing
context **even if the application never requested XML_NOENT**.
Third party extensions can override these globals, in particular the
substitute entity global. This causes entity substitution to be
unexpectedly active.
Fix it by setting the parsing options to a sane known value.
For API calls that depend on global state we introduce
PHP_LIBXML_SANITIZE_GLOBALS() and PHP_LIBXML_RESTORE_GLOBALS().
For other APIs that work directly with a context we introduce
php_libxml_sanitize_parse_ctxt_options().
(cherry picked from commit c283c3ab0b)
* PHP-8.1:
Fix GH-11630: proc_nice_basic.phpt only works at certain nice levels
Fix GH-11629: bug77020.phpt tries to send mail
Fix GH-11625: DOMElement::replaceWith() doesn't replace node with DOMDocumentFragment but just deletes node or causes wrapping <></> depending on libxml2 version
Depending on the libxml2 version, the behaviour is either to not
render the fragment correctly, or to wrap it inside <></>. Fix it by
unpacking fragments manually. This has the side effect that we need to
move the unlinking check in the replacement function to earlier because
the empty child list is now possible in non-error cases.
Also fixes a mistake in the linked list management.
Closes GH-11627.
* PHP-8.1:
Revert "Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM"
This reverts commit 7eb3e9cd17.
Although the fix follows the spec, it causes issues because a lot of old
code assumes the incorrect behaviour PHP had since a long time.
We cannot do this yet, especially not in a stable release.
We revert this for the time being.
See GH-11428.
The problem is the usage of zval_get_long(). In particular, if the
string is non-numeric the result of zval_get_long() will be 0 without
giving an error or warning. This is misleading for users: users get the
impression that they can use strings to access the map because it
coincidentally works for the first item (which is at index 0). Of
course, this fails with any other index which causes confusion and bugs.
This patch adds proper support for using string offsets while accessing
the map. It does so by detecting if it's a non-numeric string, and then
using the getNamedItem() method instead of item(). I had to split up the
array access implementation code for DOMNodeList and DOMNamedNodeMap
first to be able to do this.
Closes GH-11468.
* PHP-8.1:
Fix GH-11404: DOMDocument::savexml and friends ommit xmlns="" declaration for null namespace, creating incorrect xml representation of the DOM
The NULL namespace is only correct when there is no default namespace
override. When there is, we need to manually set it to the empty string
namespace.
Closes GH-11428.
We'll use the DOM wrapper version of libxml2 instead of the regular one.
It's conforming to the behaviour we expect of DOM.
Most of this patch is tests.
I based and extended the tests on the code attached with the aforementioned
bug reports. Therefore the credits for the tests:
Co-authored-by: hilse at web dot de
Co-authored-by: robin2008 at altruists dot org
Co-authored-by: sgunderson at bigfoot dot com
We'll also change the searching point of the internal reconciliation to
start at the top of the added tree to avoid redundant work now that the
function is changed.
Closes GH-11454.
* PHP-8.1:
Fix GH-11433: Unable to set CURLOPT_ACCEPT_ENCODING to NULL
Fix "invalid state error" with cloned namespace declarations
Fix lifetime issue with getAttributeNodeNS()
* Fix type confusion and parent reference
* Manually manage the lifetime of the parent
* Add regression tests
* Break out to a helper, and apply the use-after-free fix to xpath
Closes GH-11402.
* PHP-8.1:
Fix bug #77686: Removed elements are still returned by getElementById
Fix bug #81642: DOMChildNode::replaceWith() bug when replacing a node with itself
Fix bug #67440: append_node of a DOMDocumentFragment does not reconcile namespaces
From the moment an ID is created, libxml2's behaviour is to cache that element,
even if that element is not yet attached to the document. Similarly, only upon
destruction of the element the ID is actually removed by libxml2.
Since libxml2 has such behaviour deeply ingrained in the library, and uses the
cache for various purposes, it seems like a bad idea and lost cause to fight it.
Instead, we'll simply walk the tree upwards to check if the node is attached to
the document.
Closes GH-11369.
The test was amended from the original issue report. For the test:
Co-authored-by: php@deep-freeze.ca
The problem is that the regular dom_reconcile_ns() only works on a
single node. We actually have to reconciliate the whole tree in case a
fragment was added. This also required to move some code around such
that this special case could be handled separately.
Closes GH-11362.
It's a type confusion bug. `zend_make_callable` may change the function name
of the fci to become an array, causing a crash in debug mode on
`zval_ptr_dtor_str(&fci.function_name);` in `dom_xpath_ext_function_php`.
On a production build it doesn't crash but only causes a leak, because
the array elements are not destroyed, only the array container itself
is. We can use the nogc variant because it cannot contain cycles, the
potential array can only contain 2 strings.
Closes GH-11350.
* PHP-8.1:
Fix DOMElement::append() and DOMElement::prepend() hierarchy checks
Fix spec compliance error for DOMDocument::getElementsByTagNameNS
Fix GH-11336: php still tries to unlock the shared memory ZendSem with opcache.file_cache_only=1 but it was never locked
Fix GH-11338: SplFileInfo empty getBasename with more than one slash
We could end up in an invalid hierarchy, resulting in infinite loops and
eventual crashes if we don't check for the DOM hierarchy validity.
Closes GH-11344.
Spec link: https://dom.spec.whatwg.org/#concept-getelementsbytagnamens
Spec says we should match any namespace when '*' is provided. This was
however not the case: elements that didn't have a namespace were not
returned. This patch fixes the error by modifying the namespace check.
Closes GH-11343.
Not explicitly documenting the possibility of returning DOMElement causes
the Intelephense linter (a popular PHP linter with ~9 million downloads:
https://marketplace.visualstudio.com/items?itemName=bmewburn.vscode-intelephense-client)
to think this code is bad:
$xp->query("whatever")->item(0)->getAttribute("foo");
DOMNode does not have getAttribute (while DOMElement does).
Documenting the DOMElement return type should fix Intelephense's linter.
Closes GH-11342.
We can't directly call xmlNodeSetContent, because it might encode the string
through xmlStringLenGetNodeList for types
XML_DOCUMENT_FRAG_NODE, XML_ELEMENT_NODE, XML_ATTRIBUTE_NODE.
In these cases we need to use a text node to avoid the encoding.
For the other cases, we *can* rely on xmlNodeSetContent because it is either
a no-op, or handles the content without encoding and clears the properties
field if needed.
The test was taken from the issue report, for the test:
Co-authored-by: ThomasWeinert <thomas@weinert.info>
Closes GH-10245.
This replaces the implementation of before and after with one following
the spec very strictly, instead of trying to figure out the state we're
in by looking at the pointers. Also relaxes the condition on text node
copying to prevent working on a stale node pointer.
Closes GH-11299.
It's possible to categorise the failures into 2 categories:
- Changed error message. In this case we either duplicate the test and
modify the error message. Or if the change in error message is
small, we use the EXPECTF matchers to make the test compatible with both
old and new versions of libxml2.
- Missing warnings. This is caused by a change in libxml2 where the
parser started using SAX APIs internally [1]. In this case the
error_type passed to php_libxml_internal_error_handler() changed from
PHP_LIBXML_ERROR to PHP_LIBXML_CTX_WARNING because it internally
started to use the SAX handlers instead of the generic handlers.
However, for the SAX handlers the current input stack is empty, so
nothing is actually printed. I fixed this by falling back to a
regular warning without a filename & line number reference, which
mimicks the old behaviour. Furthermore, this change now also shows
an additional warning in a test which was previously hidden.
[1] 9a82b94a94
Closes GH-11162.
Discovered this pre-existing problem while testing GH-10682.
Note: this problem existed *before* that PR.
* Not all paths throw a hierarchy request error
* xmlFreeNode must be used instead of xmlFree for the fragment to also
free its children.
* Free up nodes that couldn't be added when xmlAddChild fails.
I unified the error handling code that's exactly the same with a goto to
prevent at least some of such problems in the future.
Closes GH-10981.
Fix it by extending the array sizes by one character. As the input is
limited to the maximum path length, there will always be place to append
the slash. As the php_check_specific_open_basedir() simply uses the
strings to compare against each other, no new failures related to too
long paths are introduced.
We'll let the DOM and XML case handle a potentially too long path in the
library code.