mirror of
https://github.com/php/doc-zh.git
synced 2026-03-26 08:02:16 +01:00
263 lines
8.2 KiB
XML
263 lines
8.2 KiB
XML
<?xml version="1.0" encoding="utf-8"?>
|
||
<!-- $Revision$ -->
|
||
<!-- EN-Revision: 7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e Maintainer: daijie Status: ready -->
|
||
<!-- CREDITS: mowangjuanzi -->
|
||
<refentry xml:id="function.mb-detect-encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
|
||
<refnamediv>
|
||
<refname>mb_detect_encoding</refname>
|
||
<refpurpose>检测字符的编码</refpurpose>
|
||
</refnamediv>
|
||
|
||
<refsect1 role="description">
|
||
&reftitle.description;
|
||
<methodsynopsis>
|
||
<type class="union"><type>string</type><type>false</type></type><methodname>mb_detect_encoding</methodname>
|
||
<methodparam><type>string</type><parameter>string</parameter></methodparam>
|
||
<methodparam choice="opt"><type class="union"><type>array</type><type>string</type><type>null</type></type><parameter>encodings</parameter><initializer>&null;</initializer></methodparam>
|
||
<methodparam choice="opt"><type>bool</type><parameter>strict</parameter><initializer>&false;</initializer></methodparam>
|
||
</methodsynopsis>
|
||
<para>
|
||
从候选列表中检测 <type>string</type> <parameter>string</parameter> 最可能的字符编码。
|
||
</para>
|
||
<para>
|
||
自 PHP 8.1 起,此函数使用启发式检测指定列表中最有可能正确的有效文本编码,且可能不按照提供的 <parameter>encodings</parameter> 顺序进行检测。
|
||
</para>
|
||
<para>
|
||
对预期(intended)字符编码的自动检测不可能永远完全可靠;没有额外的信息,就类似于在没有密钥的情况下解码已编码的字符串。最好使用与数据一起存储或传输的字符编码表示,例如“Content-Type” HTTP 头。
|
||
</para>
|
||
<para>
|
||
此函数适用于多字节编码,但并非所有字节顺序都构成有效字符串。如果输入字符串包含这样的顺序,则将会拒绝该编码。
|
||
</para>
|
||
|
||
<warning>
|
||
<title>结果不准确</title>
|
||
<para>
|
||
此函数的名称具有误导性,它执行的是“猜测”而不是“检测”。
|
||
</para>
|
||
<para>
|
||
猜测结果远非准确,因此无法使用此函数准确检测正确的字符编码。
|
||
</para>
|
||
</warning>
|
||
</refsect1>
|
||
|
||
<refsect1 role="parameters">
|
||
&reftitle.parameters;
|
||
<para>
|
||
<variablelist>
|
||
<varlistentry>
|
||
<term><parameter>string</parameter></term>
|
||
<listitem>
|
||
<para>
|
||
要检测的 <type>string</type>。
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><parameter>encodings</parameter></term>
|
||
<listitem>
|
||
<para>
|
||
尝试的字符编码列表。该列表可以指定为字符串数组,或以逗号分隔的单个字符串。
|
||
</para>
|
||
<para>
|
||
如果省略 <parameter>encodings</parameter> 被或为 &null;,则将使用当前的 detect_order(使用
|
||
<link linkend="ini.mbstring.detect-order">mbstring.detect_order</link> 配置选项或
|
||
<function>mb_detect_order</function> 函数设置)。
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
<varlistentry>
|
||
<term><parameter>strict</parameter></term>
|
||
<listitem>
|
||
<para>
|
||
控制 <parameter>string</parameter> 在列出的所有 <parameter>encodings</parameter> 中无效时的行为。如果
|
||
<parameter>strict</parameter> 设置为 &false;,将返回最接近的匹配编码;如果 <parameter>strict</parameter>
|
||
设置为 &true;,将返回 &false;。
|
||
</para>
|
||
<para>
|
||
可以使用 <link linkend="ini.mbstring.strict-detection">mbstring.strict_detection</link> 配置选项设置 <parameter>strict</parameter> 的默认值。
|
||
</para>
|
||
</listitem>
|
||
</varlistentry>
|
||
</variablelist>
|
||
</para>
|
||
</refsect1>
|
||
|
||
<refsect1 role="returnvalues">
|
||
&reftitle.returnvalues;
|
||
<para>
|
||
检测到的字符编码,如果字符串在任何列出的编码中均无效,则返回 &false;。
|
||
</para>
|
||
</refsect1>
|
||
|
||
<refsect1 role="changelog">
|
||
&reftitle.changelog;
|
||
<informaltable>
|
||
<tgroup cols="2">
|
||
<thead>
|
||
<row>
|
||
<entry>&Version;</entry>
|
||
<entry>&Description;</entry>
|
||
</row>
|
||
</thead>
|
||
<tbody>
|
||
<row>
|
||
<entry>8.2.0</entry>
|
||
<entry>
|
||
<function>mb_detect_encoding</function>
|
||
将不再返回以下非文本编码:<literal>"Base64"</literal>、<literal>"QPrint"</literal>、<literal>"UUencode"</literal>、<literal>"HTML entities"</literal>、<literal>"7 bit"</literal> 和 <literal>"8 bit"</literal>。
|
||
</entry>
|
||
</row>
|
||
</tbody>
|
||
</tgroup>
|
||
</informaltable>
|
||
</refsect1>
|
||
|
||
<refsect1 role="examples">
|
||
&reftitle.examples;
|
||
<para>
|
||
<example>
|
||
<title><function>mb_detect_encoding</function> 示例</title>
|
||
<programlisting role="php">
|
||
<![CDATA[
|
||
<?php
|
||
|
||
$str = "\x95\xB6\x8E\x9A\x83\x52\x81\x5B\x83\x68";
|
||
|
||
// 使用当前的 detect_order 来检测字符编码
|
||
var_dump(mb_detect_encoding($str));
|
||
|
||
// “auto”将根据 mbstring.language 来扩展
|
||
var_dump(mb_detect_encoding($str, "auto"));
|
||
|
||
// 通过以逗号分隔的列表指定“encodings”参数
|
||
var_dump(mb_detect_encoding($str, "JIS, eucjp-win, sjis-win"));
|
||
|
||
// 使用数组指定“encodings”参数
|
||
$encodings = [
|
||
"ASCII",
|
||
"JIS",
|
||
"EUC-JP"
|
||
];
|
||
var_dump(mb_detect_encoding($str, $encodings));
|
||
?>
|
||
]]>
|
||
</programlisting>
|
||
&example.outputs;
|
||
<screen>
|
||
<![CDATA[
|
||
string(5) "ASCII"
|
||
string(5) "ASCII"
|
||
string(8) "SJIS-win"
|
||
string(5) "ASCII"
|
||
]]>
|
||
</screen>
|
||
</example>
|
||
</para>
|
||
<para>
|
||
<example>
|
||
<title><parameter>strict</parameter> 参数的影响</title>
|
||
<programlisting role="php">
|
||
<![CDATA[
|
||
<?php
|
||
// 'áéóú' 在 ISO-8859-1 中的编码
|
||
$str = "\xE1\xE9\xF3\xFA";
|
||
|
||
// 该字符串不是有效的 ASCII 或 UTF-8,但 UTF-8 被认为是更接近的匹配
|
||
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8'], false));
|
||
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8'], true));
|
||
|
||
// 如果找到有效编码,则严格参数不会更改结果
|
||
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8', 'ISO-8859-1'], false));
|
||
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8', 'ISO-8859-1'], true));
|
||
?>
|
||
]]>
|
||
</programlisting>
|
||
&example.outputs;
|
||
<screen>
|
||
<![CDATA[
|
||
string(5) "UTF-8"
|
||
bool(false)
|
||
string(10) "ISO-8859-1"
|
||
string(10) "ISO-8859-1"
|
||
]]>
|
||
</screen>
|
||
</example>
|
||
</para>
|
||
<para>
|
||
在某些情况下,相同的字节顺序可能会在多种字符编码中形成有效的字符串,并且无法知道其意图是哪种解释。例如,在众多字符编码中,字节序列“\xC4\xA2”可能是:
|
||
</para>
|
||
<para>
|
||
<simplelist>
|
||
<member>
|
||
"Ä¢" (U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS followed by U+00A2 CENT SIGN)
|
||
encoded in any of ISO-8859-1, ISO-8859-15, or Windows-1252
|
||
</member>
|
||
<member>
|
||
"ФЂ" (U+0424 CYRILLIC CAPITAL LETTER EF followed by U+0402 CYRILLIC CAPITAL LETTER
|
||
DJE) encoded in ISO-8859-5
|
||
</member>
|
||
<member>
|
||
"Ģ" (U+0122 LATIN CAPITAL LETTER G WITH CEDILLA) encoded in UTF-8
|
||
</member>
|
||
</simplelist>
|
||
</para>
|
||
<para>
|
||
<example>
|
||
<title>匹配多个编码时顺序的影响</title>
|
||
<programlisting role="php">
|
||
<![CDATA[
|
||
<?php
|
||
$str = "\xC4\xA2";
|
||
|
||
// 该字符串在所有三种编码中均有效,但返回的可能并不总是列表中的第一个编码
|
||
var_dump(mb_detect_encoding($str, ['UTF-8']));
|
||
var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5'])); // 自 php8.1 起返回 ISO-8859-1 而不是 UTF-8
|
||
var_dump(mb_detect_encoding($str, ['ISO-8859-1', 'ISO-8859-5', 'UTF-8']));
|
||
var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
|
||
?>
|
||
]]>
|
||
</programlisting>
|
||
&example.outputs;
|
||
<screen>
|
||
<![CDATA[
|
||
string(5) "UTF-8"
|
||
string(10) "ISO-8859-1"
|
||
string(10) "ISO-8859-1"
|
||
string(10) "ISO-8859-5"
|
||
]]>
|
||
</screen>
|
||
</example>
|
||
</para>
|
||
</refsect1>
|
||
|
||
<refsect1 role="seealso">
|
||
&reftitle.seealso;
|
||
<para>
|
||
<simplelist>
|
||
<member><function>mb_detect_order</function></member>
|
||
</simplelist>
|
||
</para>
|
||
</refsect1>
|
||
|
||
</refentry>
|
||
<!-- Keep this comment at the end of the file
|
||
Local variables:
|
||
mode: sgml
|
||
sgml-omittag:t
|
||
sgml-shorttag:t
|
||
sgml-minimize-attributes:nil
|
||
sgml-always-quote-attributes:t
|
||
sgml-indent-step:1
|
||
sgml-indent-data:t
|
||
indent-tabs-mode:nil
|
||
sgml-parent-document:nil
|
||
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
|
||
sgml-exposed-tags:nil
|
||
sgml-local-catalogs:nil
|
||
sgml-local-ecat-files:nil
|
||
End:
|
||
vim600: syn=xml fen fdm=syntax fdl=2 si
|
||
vim: et tw=78 syn=sgml
|
||
vi: ts=1 sw=1
|
||
-->
|