1
0
mirror of https://github.com/php/doc-zh.git synced 2026-03-26 08:02:16 +01:00
Files
archived-doc-zh/reference/mbstring/functions/mb-detect-encoding.xml

263 lines
8.2 KiB
XML
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<?xml version="1.0" encoding="utf-8"?>
<!-- $Revision$ -->
<!-- EN-Revision: 7c4b5fb40ac3149a5b931f1e31b1050ab5eaab7e Maintainer: daijie Status: ready -->
<!-- CREDITS: mowangjuanzi -->
<refentry xml:id="function.mb-detect-encoding" xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink">
<refnamediv>
<refname>mb_detect_encoding</refname>
<refpurpose>检测字符的编码</refpurpose>
</refnamediv>
<refsect1 role="description">
&reftitle.description;
<methodsynopsis>
<type class="union"><type>string</type><type>false</type></type><methodname>mb_detect_encoding</methodname>
<methodparam><type>string</type><parameter>string</parameter></methodparam>
<methodparam choice="opt"><type class="union"><type>array</type><type>string</type><type>null</type></type><parameter>encodings</parameter><initializer>&null;</initializer></methodparam>
<methodparam choice="opt"><type>bool</type><parameter>strict</parameter><initializer>&false;</initializer></methodparam>
</methodsynopsis>
<para>
从候选列表中检测 <type>string</type> <parameter>string</parameter> 最可能的字符编码。
</para>
<para>
自 PHP 8.1 起,此函数使用启发式检测指定列表中最有可能正确的有效文本编码,且可能不按照提供的 <parameter>encodings</parameter> 顺序进行检测。
</para>
<para>
对预期intended字符编码的自动检测不可能永远完全可靠没有额外的信息就类似于在没有密钥的情况下解码已编码的字符串。最好使用与数据一起存储或传输的字符编码表示例如“Content-Type” HTTP 头。
</para>
<para>
此函数适用于多字节编码,但并非所有字节顺序都构成有效字符串。如果输入字符串包含这样的顺序,则将会拒绝该编码。
</para>
<warning>
<title>结果不准确</title>
<para>
此函数的名称具有误导性,它执行的是“猜测”而不是“检测”。
</para>
<para>
猜测结果远非准确,因此无法使用此函数准确检测正确的字符编码。
</para>
</warning>
</refsect1>
<refsect1 role="parameters">
&reftitle.parameters;
<para>
<variablelist>
<varlistentry>
<term><parameter>string</parameter></term>
<listitem>
<para>
要检测的 <type>string</type>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>encodings</parameter></term>
<listitem>
<para>
尝试的字符编码列表。该列表可以指定为字符串数组,或以逗号分隔的单个字符串。
</para>
<para>
如果省略 <parameter>encodings</parameter> 被或为 &null;,则将使用当前的 detect_order使用
<link linkend="ini.mbstring.detect-order">mbstring.detect_order</link> 配置选项或
<function>mb_detect_order</function> 函数设置)。
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><parameter>strict</parameter></term>
<listitem>
<para>
控制 <parameter>string</parameter> 在列出的所有 <parameter>encodings</parameter> 中无效时的行为。如果
<parameter>strict</parameter> 设置为 &false;,将返回最接近的匹配编码;如果 <parameter>strict</parameter>
设置为 &true;,将返回 &false;
</para>
<para>
可以使用 <link linkend="ini.mbstring.strict-detection">mbstring.strict_detection</link> 配置选项设置 <parameter>strict</parameter> 的默认值。
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</refsect1>
<refsect1 role="returnvalues">
&reftitle.returnvalues;
<para>
检测到的字符编码,如果字符串在任何列出的编码中均无效,则返回 &false;
</para>
</refsect1>
<refsect1 role="changelog">
&reftitle.changelog;
<informaltable>
<tgroup cols="2">
<thead>
<row>
<entry>&Version;</entry>
<entry>&Description;</entry>
</row>
</thead>
<tbody>
<row>
<entry>8.2.0</entry>
<entry>
<function>mb_detect_encoding</function>
将不再返回以下非文本编码:<literal>"Base64"</literal><literal>"QPrint"</literal><literal>"UUencode"</literal><literal>"HTML entities"</literal><literal>"7 bit"</literal><literal>"8 bit"</literal>
</entry>
</row>
</tbody>
</tgroup>
</informaltable>
</refsect1>
<refsect1 role="examples">
&reftitle.examples;
<para>
<example>
<title><function>mb_detect_encoding</function> 示例</title>
<programlisting role="php">
<![CDATA[
<?php
$str = "\x95\xB6\x8E\x9A\x83\x52\x81\x5B\x83\x68";
// 使用当前的 detect_order 来检测字符编码
var_dump(mb_detect_encoding($str));
// “auto”将根据 mbstring.language 来扩展
var_dump(mb_detect_encoding($str, "auto"));
// 通过以逗号分隔的列表指定“encodings”参数
var_dump(mb_detect_encoding($str, "JIS, eucjp-win, sjis-win"));
// 使用数组指定“encodings”参数
$encodings = [
"ASCII",
"JIS",
"EUC-JP"
];
var_dump(mb_detect_encoding($str, $encodings));
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
string(5) "ASCII"
string(5) "ASCII"
string(8) "SJIS-win"
string(5) "ASCII"
]]>
</screen>
</example>
</para>
<para>
<example>
<title><parameter>strict</parameter> 参数的影响</title>
<programlisting role="php">
<![CDATA[
<?php
// 'áéóú' 在 ISO-8859-1 中的编码
$str = "\xE1\xE9\xF3\xFA";
// 该字符串不是有效的 ASCII 或 UTF-8但 UTF-8 被认为是更接近的匹配
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8'], false));
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8'], true));
// 如果找到有效编码,则严格参数不会更改结果
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8', 'ISO-8859-1'], false));
var_dump(mb_detect_encoding($str, ['ASCII', 'UTF-8', 'ISO-8859-1'], true));
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
string(5) "UTF-8"
bool(false)
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
]]>
</screen>
</example>
</para>
<para>
在某些情况下,相同的字节顺序可能会在多种字符编码中形成有效的字符串,并且无法知道其意图是哪种解释。例如,在众多字符编码中,字节序列“\xC4\xA2”可能是
</para>
<para>
<simplelist>
<member>
"Ä¢" (U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS followed by U+00A2 CENT SIGN)
encoded in any of ISO-8859-1, ISO-8859-15, or Windows-1252
</member>
<member>
"ФЂ" (U+0424 CYRILLIC CAPITAL LETTER EF followed by U+0402 CYRILLIC CAPITAL LETTER
DJE) encoded in ISO-8859-5
</member>
<member>
"Ģ" (U+0122 LATIN CAPITAL LETTER G WITH CEDILLA) encoded in UTF-8
</member>
</simplelist>
</para>
<para>
<example>
<title>匹配多个编码时顺序的影响</title>
<programlisting role="php">
<![CDATA[
<?php
$str = "\xC4\xA2";
// 该字符串在所有三种编码中均有效,但返回的可能并不总是列表中的第一个编码
var_dump(mb_detect_encoding($str, ['UTF-8']));
var_dump(mb_detect_encoding($str, ['UTF-8', 'ISO-8859-1', 'ISO-8859-5'])); // 自 php8.1 起返回 ISO-8859-1 而不是 UTF-8
var_dump(mb_detect_encoding($str, ['ISO-8859-1', 'ISO-8859-5', 'UTF-8']));
var_dump(mb_detect_encoding($str, ['ISO-8859-5', 'UTF-8', 'ISO-8859-1']));
?>
]]>
</programlisting>
&example.outputs;
<screen>
<![CDATA[
string(5) "UTF-8"
string(10) "ISO-8859-1"
string(10) "ISO-8859-1"
string(10) "ISO-8859-5"
]]>
</screen>
</example>
</para>
</refsect1>
<refsect1 role="seealso">
&reftitle.seealso;
<para>
<simplelist>
<member><function>mb_detect_order</function></member>
</simplelist>
</para>
</refsect1>
</refentry>
<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
indent-tabs-mode:nil
sgml-parent-document:nil
sgml-default-dtd-file:"~/.phpdoc/manual.ced"
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
vim600: syn=xml fen fdm=syntax fdl=2 si
vim: et tw=78 syn=sgml
vi: ts=1 sw=1
-->