mirror of
https://github.com/php/doc-en.git
synced 2026-03-23 23:32:18 +01:00
Consolidates the duplicated note between `DOMDocument::loadHTML` and `DOMDocument::loadHTMLFile`, and adds a paragraph explaining that libxml's HTML parsing behavior isn't stable across versions and that the new `Dom\HTMLDocument` methods should be used for HTML5-conformant parsing. (Those methods aren't documented yet, so they aren't linked, but the links will wire up automatically once they are.)
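For reviewers who want to see what the new note recommends in practice, here is a minimal sketch, not part of the commit itself. The `$html` input is a made-up example, and the `class_exists` guard is only an assumption about how a caller might stay compatible with PHP versions older than 8.4.

```php
<?php
// Hypothetical input, chosen only for illustration.
$html = '<!DOCTYPE html><html><body><p>Hello<div>world</div></body></html>';

// Legacy API: parsed by libxml's HTML parser, so the resulting tree
// can depend on the libxml version in use.
libxml_use_internal_errors(true); // collect markup errors instead of raising E_WARNING
$legacy = new DOMDocument();
$legacy->loadHTML($html);
libxml_clear_errors();
echo $legacy->saveHTML(), "\n";

// PHP 8.4+: parser that follows the HTML5 specification.
if (class_exists('Dom\HTMLDocument')) {
    // LIBXML_NOERROR suppresses warnings about parse errors in the input.
    $modern = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
    echo $modern->saveHtml(), "\n";
}
```

Comparing the two serialized outputs is one way to check whether a given input is affected by the parser differences the warning describes.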
@@ -1572,6 +1572,30 @@ it is inserted with (e.g.) <function xmlns="http://docbook.org/ns/docbook">DOMNo
 <!ENTITY dom.malformederror '<para xmlns="http://docbook.org/ns/docbook">While malformed HTML should load successfully, this function may generate <constant>E_WARNING</constant> errors when it encounters bad markup. <link linkend="function.libxml-use-internal-errors">libxml's error handling functions</link> may be used to handle these errors.</para>'>
 <!ENTITY dom.note.utf8 '<note xmlns="http://docbook.org/ns/docbook"><para>The DOM extension uses UTF-8 encoding. Use <function>mb_convert_encoding</function>, <methodname>UConverter::transcode</methodname>, or <function>iconv</function> to handle other encodings.</para></note>'>
 <!ENTITY dom.note.json '<note xmlns="http://docbook.org/ns/docbook"><para>When using <function>json_encode</function> on a <classname>DOMDocument</classname> object the result will be that of encoding an empty object.</para></note>'>
+<!ENTITY dom.domdocument.html5 '<warning xmlns="http://docbook.org/ns/docbook">
+<para>
+This function parses the input using an HTML 4 parser. The parsing rules
+of HTML 5, which is what modern web browsers use, are different. Depending
+on the input this might result in a different DOM structure. Therefore
+this function cannot be safely used for sanitizing HTML.
+</para>
+<para>
+The behavior when parsing HTML can depend on the version of
+<literal>libxml</literal> that is being used, particularly with regards to
+edge conditions and error handling.
+For parsing that conforms to the HTML5 specification,
+use <methodname>Dom\HTMLDocument::createFromString</methodname> or
+<methodname>Dom\HTMLDocument::createFromFile</methodname>, added in PHP 8.4.
+</para>
+<para>
+As an example, some HTML elements will implicitly close a parent element
+when encountered. The rules for automatically closing parent elements
+differ between HTML 4 and HTML 5 and thus the resulting DOM structure that
+<classname>DOMDocument</classname> sees might be different from the DOM
+structure a web browser sees, possibly allowing an attacker to break the
+resulting HTML.
+</para>
+</warning>'>
@@ -18,22 +18,7 @@
 The function parses the HTML contained in the string <parameter>source</parameter>.
 Unlike loading XML, HTML does not have to be well-formed to load.
 </para>
-<warning>
-<para>
-This function parses the input using an HTML 4 parser. The parsing rules
-of HTML 5, which is what modern web browsers use, are different. Depending
-on the input this might result in a different DOM structure. Therefore
-this function cannot be safely used for sanitizing HTML.
-</para>
-<para>
-As an example, some HTML elements will implicitly close a parent element
-when encountered. The rules for automatically closing parent elements
-differ between HTML 4 and HTML 5 and thus the resulting DOM structure that
-<classname>DOMDocument</classname> sees might be different from the DOM
-structure a web browser sees, possibly allowing an attacker to break the
-resulting HTML.
-</para>
-</warning>
+&dom.domdocument.html5;
 </refsect1>
 <refsect1 role="parameters">
 &reftitle.parameters;
@@ -19,22 +19,7 @@
 <parameter>filename</parameter>. Unlike loading XML, HTML does not have
 to be well-formed to load.
 </para>
-<warning>
-<para>
-This function parses the input using an HTML 4 parser. The parsing rules
-of HTML 5, which is what modern web browsers use, are different. Depending
-on the input this might result in a different DOM structure. Therefore
-this function cannot be safely used for sanitizing HTML.
-</para>
-<para>
-As an example, some HTML elements will implicitly close a parent element
-when encountered. The rules for automatically closing parent elements
-differ between HTML 4 and HTML 5 and thus the resulting DOM structure that
-<classname>DOMDocument</classname> sees might be different from the DOM
-structure a web browser sees, possibly allowing an attacker to break the
-resulting HTML.
-</para>
-</warning>
+&dom.domdocument.html5;
 </refsect1>
 <refsect1 role="parameters">
 &reftitle.parameters;