Multi-Byte String Functions
Introduction
Many languages can express all of their characters with a single
byte each. Multi-byte character encodings exist to express the
much larger character sets of other languages. mbstring
was developed to handle Japanese characters. However, many
mbstring functions can also handle
character encodings other than Japanese.
A multi-byte character encoding represents a single character
with consecutive bytes. Some character encodings use shift (escape)
sequences to start and end multi-byte character strings. Therefore,
a multi-byte character string may be corrupted when it is divided
or counted, unless a method that is safe for multi-byte character
encodings is used. This module provides multi-byte safe string
functions and other utility functions such as conversion
functions.
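The difference between byte-oriented and multi-byte safe functions can be sketched as follows. This is a minimal illustration, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; the variable names are made up for the example.

```php
<?php
// Assumption: this file is saved as UTF-8, so each kanji below takes 3 bytes.
$text = "日本語";

// Byte-oriented functions count bytes; mbstring counts characters.
$byteLength = strlen($text);              // 9
$charLength = mb_strlen($text, "UTF-8");  // 3

// Cutting at byte 4 splits the second character in half.
$brokenCut = substr($text, 0, 4);
// Cutting by characters keeps every character whole.
$safeCut = mb_substr($text, 0, 2, "UTF-8"); // "日本"

var_dump(mb_check_encoding($brokenCut, "UTF-8")); // bool(false)
var_dump(mb_check_encoding($safeCut, "UTF-8"));   // bool(true)
```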
Since PHP is basically designed for ISO-8859-1, some multi-byte
character encodings do not work well with PHP. Therefore, it is
important to set mbstring.internal_encoding to
a character encoding that works with PHP.
PHP4 Character Encoding Requirements
Per-byte encoding
Single-byte characters in the range 00h-7fh, which is
compatible with ASCII
Multi-byte characters that do not use bytes in the range 00h-7fh
These are examples of internal character encodings that work with
PHP and that do NOT work with PHP.
A character encoding that does not work with PHP may be converted
with mbstring's HTTP input/output conversion
feature/function.
SJIS should not be used as the internal encoding unless the reader
is familiar with parsers/compilers, character encodings and
character encoding issues.
If you use a database with PHP, it is recommended that you use the
same character encoding for both the database and the internal
encoding, for ease of use and better performance.
If you are using PostgreSQL, it supports a character
encoding that is different from the backend character encoding. See
the PostgreSQL manual for details.
How to Enable mbstring
mbstring is an extended module. You must
enable the module with the configure script. Refer
to the Install section for
details.
The following configure options are related to the
mbstring module.
: Enable
mbstring functions. This option is
required to use mbstring functions.
:
Enable HTTP input character encoding conversion using
mbstring conversion engine. If this
feature is enabled, HTTP input character encoding may be
converted to mbstring.internal_encoding
automatically.
HTTP Input and Output
HTTP input/output character encoding conversion may also convert
binary data. Users are expected to control character
encoding conversion if binary data is used for HTTP
input/output.
If enctype for an HTML form is set to
multipart/form-data,
mbstring does not convert the character encoding
of POST data. In that case, strings need to be
converted to the internal character encoding.
HTTP Input
There is no way to control HTTP input character
conversion from a PHP script. To disable HTTP input character
conversion, it has to be done in &php.ini;.
Disable HTTP input conversion in &php.ini;
When using PHP as an Apache module, it is possible to
override PHP ini settings per Virtual Host in
httpd.conf or per directory with
.htaccess. Refer to the Configuration section and
the Apache Manual for details.
HTTP Output
There are several ways to enable output character encoding
conversion. One is using &php.ini;, another
is using ob_start with
mb_output_handler as
ob_start callback function.
For PHP3-i18n users: mbstring's output
conversion differs from PHP3-i18n. Character encoding is
converted using an output buffer.
&php.ini; setting example
Script example
Supported Character Encoding
The following character encodings are currently supported by the
mbstring module and may be specified for the
encoding parameter of mbstring functions:
UCS-4, UCS-4BE,
UCS-4LE, UCS-2,
UCS-2BE, UCS-2LE,
UTF-32, UTF-32BE,
UTF-32LE,
UTF-16, UTF-16BE,
UTF-16LE, UTF-8,
UTF-7, ASCII,
EUC-JP, SJIS,
eucJP-win, SJIS-win,
ISO-2022-JP, JIS,
ISO-8859-1, ISO-8859-2,
ISO-8859-3, ISO-8859-4,
ISO-8859-5, ISO-8859-6,
ISO-8859-7, ISO-8859-8,
ISO-8859-9, ISO-8859-10,
ISO-8859-13, ISO-8859-14,
ISO-8859-15, byte2be,
byte2le, byte4be,
byte4le, BASE64,
7bit, 8bit and
UTF7-IMAP.
&php.ini; entries that accept an encoding name
also accept "auto" and
"pass".
mbstring functions that accept an encoding
name also accept "auto".
If "pass" is set, no character
encoding conversion is performed.
If "auto" is set, it is expanded to
"ASCII,JIS,UTF-8,EUC-JP,SJIS".
See also mb_detect_order.
"Supported character encoding" does not mean that the encoding
works as an internal character encoding.
&php.ini; settings
mbstring.internal_encoding defines the default
internal character encoding.
mbstring.http_input defines the default HTTP
input character encoding.
mbstring.http_output defines the default HTTP
output character encoding.
mbstring.detect_order defines the default
character encoding detection order. See also
mb_detect_order.
mbstring.substitute_character defines the
character to substitute for invalid character encoding.
Web browsers are supposed to use the same character encoding
when submitting a form. However, browsers may not use the same
character encoding. See mb_http_input to
detect the character encoding used by browsers.
If enctype is set to
multipart/form-data in HTML forms,
mbstring does not convert the character encoding
of POST data. The user must convert it in the script, if
conversion is needed.
Although browsers are smart enough to detect the character encoding
of HTML, it is better to set charset in the HTTP
header. Change default_charset according to the
character encoding.
&php.ini; setting example
&php.ini; setting for EUC-JP users
&php.ini; setting for SJIS users
Overload of PHP string functions by mbstring functions with
multibyte support
Because almost all PHP applications are written for languages that
use single-byte character encodings, multibyte strings, including
Japanese, are difficult to handle. Almost all PHP string
functions, such as substr, do not support
multibyte strings.
The multibyte extension (mbstring) provides multibyte-aware
versions of some PHP string functions (e.g.
mb_substr for substr).
The multibyte extension (mbstring) also supports 'function
overloading' to add multibyte string functionality without
code modification. With function overloading, some PHP string
functions are overloaded by their multibyte counterparts.
For example, mb_substr is called
instead of substr if function overloading
is enabled. Function overloading makes it easy to port an
application that supports only single-byte encodings to multibyte use.
mbstring.func_overload in &php.ini; should be
set to a positive value to use function overloading.
The value specifies the categories of functions to overload:
1 enables overloading of the mail function, 2 of the
string functions and 4 of the regular expression functions. For
example, if it is set to 7, the mail, string and regex functions
are all overloaded. The list of overloaded functions is shown
below.
Functions to be overloaded
value of mbstring.func_overload   original function   overloaded function
1                                 mail                mb_send_mail
2                                 strlen              mb_strlen
2                                 strpos              mb_strpos
2                                 strrpos             mb_strrpos
2                                 substr              mb_substr
4                                 ereg                mb_ereg
4                                 eregi               mb_eregi
4                                 ereg_replace        mb_ereg_replace
4                                 eregi_replace       mb_eregi_replace
4                                 split               mb_split
Basics for Japanese multi-byte character
Most Japanese characters need more than 1 byte per character. In
addition, several character encoding schemes are used in a
Japanese environment: EUC-JP, Shift_JIS (SJIS) and
ISO-2022-JP (JIS). As Unicode becomes popular,
UTF-8 is used as well. To develop Web applications for a Japanese
environment, it is important to use the right character encoding
for the task in hand, whether HTTP input/output, RDBMS or e-mail.
Storage for a character can be up to six
bytes.
A multi-byte character is usually twice the width of a
single-byte character. Wider characters are called
"zen-kaku", meaning full width; narrower characters are
called "han-kaku", meaning half width. "zen-kaku" characters
are usually fixed width.
Some character encodings define shift (escape) sequences for
entering/exiting multi-byte character strings.
ISO-2022-JP must be used for SMTP/NNTP.
"i-mode" web sites are supposed to use SJIS.
References
Multi-byte character encodings and their related issues are very
complex. It is impossible to cover them in sufficient detail
here. Please refer to the following URLs and other resources for
further reading.
Unicode/UTF/UCS/etc
http://www.unicode.org/
Japanese/Korean/Chinese character
information
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
mb_language
Set/Get current language
Description
string mb_language([string language])
mb_language sets the language. If
language is omitted, it returns the current
language as a string.
The language setting is used for encoding
e-mail messages. Valid languages are "Japanese",
"ja", "English", "en" and "uni"
(UTF-8). mb_send_mail uses this setting to
encode e-mail.
The encodings for each language are ISO-2022-JP/Base64 for
Japanese, UTF-8/Base64 for uni and ISO-8859-1/quoted-printable for
English.
Return Value: If language is set and
is valid, it returns
&true;. Otherwise, it returns &false;. When
language is omitted, it returns the language
name as a string. If no language has been set previously, it returns
&false;.
See also mb_send_mail.
mb_parse_str
Parse GET/POST/COOKIE data and set global variable
Description
boolean mb_parse_str(string encoded_string [, array result])
mb_parse_str parses GET/POST/COOKIE data and
sets global variables. Since PHP does not provide raw POST/COOKIE
data, it can only be used for GET data for now. It parses URL
encoded data, detects the encoding, converts it to the internal
encoding and sets the values in the result array or
in global variables.
encoded_string: URL encoded data.
result: Array that contains the decoded and
character encoding converted values.
Return Value: It returns &true; for success or &false; for failure.
See also mb_detect_order,
mb_internal_encoding.
mb_internal_encoding
Set/Get internal character encoding
Description
string mb_internal_encoding([string encoding])
mb_internal_encoding sets the internal character
encoding to encoding. If the parameter is
omitted, it returns the current internal encoding.
encoding is used for HTTP input character
encoding conversion, HTTP output character encoding conversion
and as the default character encoding for string functions defined by
the mbstring module.
encoding: Character encoding name
Return Value: If encoding is
set, mb_internal_encoding returns
&true; for success, otherwise it returns
&false;. If encoding is
omitted, it returns the current character encoding name.
mb_internal_encoding example
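A minimal sketch of setting and reading the internal encoding, assuming the mbstring extension is loaded and the file is saved as UTF-8:

```php
<?php
// Set the internal encoding; the call reports success as a boolean.
$ok = mb_internal_encoding("UTF-8");

// With no argument the function returns the current setting.
$current = mb_internal_encoding();
echo $current, "\n"; // UTF-8

// mbstring functions called without an encoding now default to UTF-8.
$length = mb_strlen("日本語");
echo $length, "\n"; // 3
```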
See also mb_http_input,
mb_http_output,
mb_detect_order.
mb_http_input
Detect HTTP input character encoding
Description
string mb_http_input([string type])
mb_http_input returns the result of HTTP input
character encoding detection.
type: String that specifies the input
type: "G" for GET, "P" for POST,
"C" for COOKIE. If type is omitted, it returns the last
input type processed.
Return Value: Character encoding name.
If mb_http_input does not process the specified
HTTP input, it returns &false;.
See also mb_internal_encoding,
mb_http_output,
mb_detect_order.
mb_http_output
Set/Get HTTP output character encoding
Description
string mb_http_output([string encoding])
If encoding is set,
mb_http_output sets the HTTP output character
encoding to encoding. Output after this
function is converted to encoding.
mb_http_output returns
&true; for success and &false;
for failure.
If encoding is omitted,
mb_http_output returns the current HTTP output
character encoding.
See also mb_internal_encoding,
mb_http_input,
mb_detect_order.
mb_detect_order
Set/Get character encoding detection order
Description
array mb_detect_order([mixed encoding-list])
mb_detect_order sets the automatic character
encoding detection order to encoding-list.
It returns &true; for success,
&false; for failure.
encoding-list is an array or a comma separated
list of character encodings. ("auto" is expanded to
"ASCII, JIS, UTF-8, EUC-JP, SJIS")
If encoding-list is omitted, it returns the
current character encoding detection order as an array.
This setting affects mb_detect_encoding and
mb_send_mail.
mbstring currently implements the following
encoding detection filters. If there is an invalid byte sequence
for one of the following encodings, detection of that encoding will fail.
UTF-8, UTF-7,
ASCII,
EUC-JP, SJIS,
eucJP-win, SJIS-win,
JIS, ISO-2022-JP
For ISO-8859-*, mbstring
always detects as ISO-8859-*.
For UTF-16, UTF-32,
UCS2 and UCS4, encoding
detection will always fail.
Useless detect order example
mb_detect_order examples
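The two accepted forms of encoding-list can be sketched as follows. This is an illustration only, assuming the mbstring extension is loaded. Because ISO-8859-* is always detected, a list that starts with an ISO-8859 encoding would never reach the entries after it, so the lists here put the stricter encodings first.

```php
<?php
// Set the detection order from an array ...
$okFromArray = mb_detect_order(["ASCII", "UTF-8", "EUC-JP"]);

// ... or from a comma separated list.
$okFromList = mb_detect_order("ASCII,UTF-8,EUC-JP,SJIS");

// Without arguments the current order is returned as an array.
$order = mb_detect_order();
print_r($order);
```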
See also mb_internal_encoding,
mb_http_input,
mb_http_output,
mb_send_mail.
mb_substitute_character
Set/Get substitution character
Description
mixed mb_substitute_character([mixed substchar])
mb_substitute_character specifies the
substitution character used when the input character encoding is invalid
or a character code does not exist in the output character
encoding. Invalid characters may be substituted with &null; (no output),
a string or an integer value (Unicode character code value).
This setting affects mb_detect_encoding
and mb_send_mail.
substchar : Specify the Unicode value as an
integer, or specify one of the following strings:
"none" : no output
"long" : Output the character code value (Example:
U+3000, JIS+7E7E)
Return Value: If substchar is set, it
returns &true; for success, otherwise it returns
&false;. If substchar is
not set, it returns the Unicode value or
"none"/"long".
mb_substitute_character example
mb_output_handler
Callback function that converts character encoding in the output buffer
Description
string mb_output_handler(string contents, int status)
mb_output_handler is an
ob_start callback
function. mb_output_handler converts
characters in the output buffer from the internal character encoding to
the HTTP output character encoding.
In 4.1.0 or later, this handler adds a charset HTTP header
when the following conditions are met:
Content-Type has not been set by
header()
The default MIME type begins with
text/
The http_output setting is other than
pass
contents : Output buffer contents
status : Output buffer status
Return Value: Converted string
mb_output_handler example
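A minimal sketch of enabling output conversion for a single script, assuming the mbstring extension is loaded and the file is saved as UTF-8. The choice of SJIS as the output encoding is only for illustration.

```php
<?php
// Strings inside the script are UTF-8.
mb_internal_encoding("UTF-8");

// Everything sent through the buffer is converted to SJIS.
mb_http_output("SJIS");

// Register mb_output_handler as the ob_start callback.
ob_start("mb_output_handler");

// Record the active handlers while the buffer is still open.
$handlers = ob_list_handlers();

echo "日本語\n";

// Flushing the buffer runs the handler and sends the converted bytes.
ob_end_flush();
```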
If you want to output binary data such as an image from a PHP
script, you must set the output encoding to "pass" using
mb_http_output.
See also ob_start.
mb_preferred_mime_name
Get MIME charset string
Description
string mb_preferred_mime_name(string encoding)
mb_preferred_mime_name returns the MIME
charset string for the character encoding
encoding.
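A short sketch of using the returned name in a Content-Type value, assuming the mbstring extension is loaded. The header text is built as a plain string here rather than sent.

```php
<?php
// Look up the MIME charset name for an mbstring encoding name.
$mime = mb_preferred_mime_name("EUC-JP");

// Build (but do not send) a Content-Type header value with it.
$contentType = "text/html; charset=" . $mime;
echo $contentType, "\n"; // text/html; charset=EUC-JP
```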
mb_preferred_mime_name example
mb_strlen
Get string length
Description
int mb_strlen(string str [, string encoding])
mb_strlen returns the number of characters in
string str having character encoding
encoding. A multi-byte character is
counted as 1.
encoding is the character encoding for
str. If encoding is
omitted, the internal character encoding is used.
See also mb_internal_encoding,
strlen.
mb_strpos
Find position of first occurrence of string in a string
Description
int mb_strpos(string haystack, string needle [, int offset [, string encoding]])
mb_strpos returns the numeric position of
the first occurrence of needle in the
haystack string. If
needle is not found, it returns &false;.
mb_strpos performs a multi-byte safe
strpos operation based on the number of
characters. The needle position is counted
from the beginning of the haystack. The first
character's position is 0, the second character's position is 1, and so
on.
offset is the search offset. If it is not
specified, 0 is used.
encoding is the character encoding name. If it
is omitted, the internal character encoding is used.
See also mb_strrpos,
mb_internal_encoding,
strpos.
mb_strrpos
Find position of last occurrence of a string in a string
Description
int mb_strrpos(string haystack, string needle [, string encoding])
mb_strrpos returns the numeric position of
the last occurrence of needle in the
haystack string. If
needle is not found, it returns &false;.
mb_strrpos performs a multi-byte safe
strrpos operation based on the
number of characters. The needle position is
counted from the beginning of
haystack. The first character's position is
0, the second character's position is 1, and so on.
mb_strrpos accepts a
string for needle, whereas
strrpos accepts only a single character.
encoding is the character encoding. If it is
not specified, the internal character encoding is used.
See also mb_strpos,
mb_internal_encoding,
strrpos.
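The character-based positions returned by mb_strpos and mb_strrpos can be compared with the byte-based ones as sketched below. This assumes a UTF-8 source file and a current PHP release, where mb_strrpos takes an offset argument before encoding (unlike the three-argument form documented here).

```php
<?php
// Assumption: UTF-8 source file, 3 bytes per character in this string.
$haystack = "日本語の本";

// Character positions, counted from 0.
$first = mb_strpos($haystack, "本", 0, "UTF-8");   // 1
$last  = mb_strrpos($haystack, "本", 0, "UTF-8");  // 4

// Byte positions from the single-byte functions, for comparison.
$firstByte = strpos($haystack, "本");   // 3
$lastByte  = strrpos($haystack, "本");  // 12

// A needle that does not occur yields false.
$missing = mb_strpos($haystack, "猫", 0, "UTF-8");

var_dump($first, $last, $firstByte, $lastByte, $missing);
```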
mb_substr
Get part of string
Description
string mb_substr(string str, int start [, int length [, string encoding]])
mb_substr returns the portion of
str specified by the
start and
length parameters.
mb_substr performs a multi-byte safe
substr operation based on the
number of characters. Position is
counted from the beginning of
str. The first character's position is
0, the second character's position is 1, and so on.
encoding is the character encoding. If it is
omitted, the internal character encoding is used.
See also mb_strcut,
mb_internal_encoding.
mb_strcut
Get part of string
Description
string mb_strcut(string str, int start [, int length [, string encoding]])
mb_strcut returns the portion of
str specified by the
start and
length parameters.
mb_strcut performs an operation equivalent to
mb_substr with a different method. If the
start position falls on the second or a later byte of a
multi-byte character, it starts from the first byte of that
character.
It extracts from str a string that is
no longer than length and that does not end
in the middle of a multi-byte character or of a shift
sequence.
encoding is the character encoding. If it is
not set, the internal character encoding is used.
See also mb_substr,
mb_internal_encoding.
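The contrast between mb_substr and mb_strcut can be sketched as follows, assuming a UTF-8 source file. The sketch treats the start and length arguments of mb_strcut as byte counts, which is how current PHP releases behave; only properties that follow from the description above are checked, not an exact result.

```php
<?php
// Assumption: UTF-8 source file, 3 bytes per character in this string.
$text = "日本語テキスト";

// mb_substr counts characters: skip 1, take 2.
$byChars = mb_substr($text, 1, 2, "UTF-8"); // "本語"

// mb_strcut is given a position inside the first character and a small
// length; it moves back to the character boundary and never returns a
// partial character.
$byBytes = mb_strcut($text, 1, 4, "UTF-8");

var_dump($byChars);
var_dump(strlen($byBytes) <= 4);                 // bool(true)
var_dump(mb_check_encoding($byBytes, "UTF-8"));  // bool(true)
```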
mb_strwidth
Return width of string
Description
int mb_strwidth(string str [, string encoding])
mb_strwidth returns the width of string
str.
A multi-byte character is usually twice the width of a
single-byte character.
encoding is the character encoding. If it is
omitted, the internal encoding is used.
See also: mb_strimwidth,
mb_internal_encoding.
mb_strimwidth
Get truncated string with specified width
Description
string mb_strimwidth(string str, int start, int width [, string trimmarker [, string encoding]])
mb_strimwidth truncates string
str to the specified
width. It returns the truncated string.
If trimmarker is set,
trimmarker is appended to the return value.
start is the start position offset, given as the number of
characters from the beginning of the string. (The first character is 0)
trimmarker is a string that is added to the
end of the string when the string is truncated.
encoding is the character encoding. If it is
omitted, the internal encoding is used.
mb_strimwidth example
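A minimal sketch of truncating to a display width, assuming a UTF-8 source file. The strings and the "..." marker are arbitrary choices for illustration.

```php
<?php
// Truncate to a total width of 10, including the trim marker.
$short = mb_strimwidth("Hello World", 0, 10, "...", "UTF-8");
echo $short, "\n"; // Hello W...

// Full-width characters count as width 2, so fewer of them fit.
$shortJa = mb_strimwidth("日本語のテキスト", 0, 10, "...", "UTF-8");
echo $shortJa, "\n";

// The result, marker included, never exceeds the requested width.
var_dump(mb_strwidth($shortJa, "UTF-8") <= 10); // bool(true)
```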
See also: mb_strwidth,
mb_internal_encoding.
mb_convert_encoding
Convert character encoding
Description
string mb_convert_encoding(string str, string to-encoding [, mixed from-encoding])
mb_convert_encoding converts the
character encoding of string str from
from-encoding to
to-encoding.
str : String to be converted.
from-encoding specifies the character
encoding before conversion. It can be an array or a comma
separated list string. If it is not specified, the internal
encoding will be used.
mb_convert_encoding example
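A round-trip conversion can be sketched as follows, assuming a UTF-8 source file and the mbstring extension:

```php
<?php
// Assumption: UTF-8 source file.
$utf8 = "日本語";

// Convert from UTF-8 to EUC-JP; each of these characters takes 2 bytes there.
$eucjp = mb_convert_encoding($utf8, "EUC-JP", "UTF-8");
echo strlen($utf8), " -> ", strlen($eucjp), "\n"; // 9 -> 6

// Converting back restores the original string.
$roundTrip = mb_convert_encoding($eucjp, "UTF-8", "EUC-JP");
var_dump($roundTrip === $utf8); // bool(true)
```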
See also: mb_detect_order.
mb_detect_encoding
Detect character encoding
Description
string mb_detect_encoding(string str [, mixed encoding-list])
mb_detect_encoding detects the character
encoding of string str. It returns the
detected character encoding.
encoding-list is a list of character
encodings. The encoding order may be specified by an array or a comma
separated list string.
If encoding-list is omitted,
detect_order is used.
mb_detect_encoding example
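A short detection sketch, assuming a UTF-8 source file. The third argument (strict mode) is not described in this text; it exists in current PHP releases and is used here so that invalid byte sequences rule an encoding out.

```php
<?php
// Candidate encodings, in order of preference.
$order = ["ASCII", "UTF-8"];

// Pure ASCII text matches the first candidate.
$plain = mb_detect_encoding("plain text", $order, true);

// Multi-byte text is not valid ASCII, so the next candidate is chosen.
$japanese = mb_detect_encoding("日本語", $order, true);

echo $plain, "\n";    // ASCII
echo $japanese, "\n"; // UTF-8
```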
See also: mb_detect_order.
mb_convert_kana
Convert "kana" one from another ("zen-kaku" ,"han-kaku" and more)
Description
string mb_convert_kana(string str [, string option [, mixed encoding]])
mb_convert_kana performs "han-kaku" -
"zen-kaku" conversion for string str. It
returns the converted string. This function is only useful for
Japanese.
option is the conversion option. The default value
is "KV".
encoding is the character encoding. If it is
omitted, the internal character encoding is used.
"s" : Convert "zen-kaku" space to "han-kaku" (U+3000 -> U+0020)
"S" : Convert "han-kaku" space to "zen-kaku" (U+0020 -> U+3000)
"k" : Convert "zen-kaku kata-kana" to "han-kaku kata-kana"
"K" : Convert "han-kaku kata-kana" to "zen-kaku kata-kana"
"h" : Convert "zen-kaku hira-gana" to "han-kaku kata-kana"
"H" : Convert "han-kaku kata-kana" to "zen-kaku hira-gana"
"c" : Convert "zen-kaku kata-kana" to "zen-kaku hira-gana"
"C" : Convert "zen-kaku hira-gana" to "zen-kaku kata-kana"
"V" : Collapse voiced sound notation and convert them into a character. Use with "K", "H"
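A few of the conversion options can be sketched as follows. The inputs are written as Unicode escapes (PHP 7 or later) so that the half-width and full-width forms are unambiguous; the expected results are shown in the comments.

```php
<?php
// Half-width katakana A, I, U (U+FF71..U+FF73).
$hankaku = "\u{FF71}\u{FF72}\u{FF73}";

// "K": half-width katakana to full-width katakana ("アイウ").
$zenkaku = mb_convert_kana($hankaku, "KV", "UTF-8");

// "V" with "K": half-width KA plus voiced mark collapses into one "ガ".
$voiced = mb_convert_kana("\u{FF76}\u{FF9E}", "KV", "UTF-8");

// "s": full-width space (U+3000) to half-width space.
$spaced = mb_convert_kana("a\u{3000}b", "s", "UTF-8");

// "c": full-width katakana to full-width hiragana ("かたかな").
$hiragana = mb_convert_kana("カタカナ", "c", "UTF-8");

echo $zenkaku, " ", $voiced, " ", $spaced, " ", $hiragana, "\n";
```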
mb_convert_kana example
mb_encode_mimeheader
Encode string for MIME header
Description
string mb_encode_mimeheader(string str [, string charset [, string transfer-encoding [, string linefeed]]])
mb_encode_mimeheader converts the string
str to an encoded-word for a header field.
It returns the converted string in ASCII encoding.
charset is the character encoding
name. The default is ISO-2022-JP.
transfer-encoding is the transfer encoding. It
should be one of "B" (Base64) or
"Q" (Quoted-Printable). The default is
"B".
linefeed is the end of line marker. The default is
"\r\n" (CRLF).
mb_encode_mimeheader example
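Encoding a display name for a mail header, and decoding it again, can be sketched as follows. This assumes a UTF-8 source file; the address user@example.com is a placeholder, and UTF-8/Base64 is chosen only for illustration.

```php
<?php
// The input string is interpreted in the internal encoding.
mb_internal_encoding("UTF-8");

$name = "日本語";

// Encode the name as a Base64 ("B") encoded-word in UTF-8.
$encoded = mb_encode_mimeheader($name, "UTF-8", "B");

// Placeholder address, for illustration only.
$addr = $encoded . " <user@example.com>";
echo $addr, "\n";

// mb_decode_mimeheader turns the encoded-word back into the name.
$decoded = mb_decode_mimeheader($encoded);
var_dump($decoded === $name); // bool(true)
```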
See also mb_decode_mimeheader.
mb_decode_mimeheader
Decode string in MIME header field
Description
string mb_decode_mimeheader(string str)
mb_decode_mimeheader decodes the encoded-word
string str in a MIME header.
It returns the decoded string in the internal character encoding.
See also mb_encode_mimeheader.
mb_convert_variables
Convert character code in variable(s)
Description
string mb_convert_variables(string to-encoding, mixed from-encoding, mixed vars [, ...])
mb_convert_variables converts the
character encoding of variables vars from
encoding from-encoding to encoding
to-encoding. It returns the character encoding
before conversion for success, &false; for failure.
mb_convert_variables joins the strings in an Array
or Object to detect the encoding, since encoding detection tends to
fail for short strings. Therefore, it is impossible to mix
encodings in a single array or object.
If from-encoding is specified as an
array or a comma separated string, it tries to detect the encoding from
from-encoding. When
from-encoding is omitted,
detect_order is used.
vars (the 3rd and later arguments) are references to the
variables to be converted. String, Array and Object are accepted.
mb_convert_variables assumes all parameters
have the same encoding.
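Converting several variables in one call can be sketched as follows, assuming a UTF-8 source file. The EUC-JP inputs are produced with mb_convert_encoding only so that the example is self-contained.

```php
<?php
// Prepare two EUC-JP values: a string and an array of strings.
$title = mb_convert_encoding("日本語", "EUC-JP", "UTF-8");
$tags  = [mb_convert_encoding("漢字", "EUC-JP", "UTF-8")];

// Convert both variables in place; they are passed by reference.
$from = mb_convert_variables("UTF-8", "EUC-JP", $title, $tags);

echo $from, "\n";    // EUC-JP (the encoding before conversion)
echo $title, "\n";   // 日本語
echo $tags[0], "\n"; // 漢字
```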
mb_convert_variables example
mb_encode_numericentity
Encode character to HTML numeric string reference
Description
string mb_encode_numericentity(string str, array convmap [, string encoding])
mb_encode_numericentity converts
specified character codes in string str
from character code to HTML numeric character reference. It
returns the converted string.
convmap is an array that specifies the code area to
convert.
encoding is the character encoding.
convmap example
mb_encode_numericentity example
See also: mb_decode_numericentity.
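Encoding and decoding with the same convmap can be sketched as follows, assuming a UTF-8 source file. Each group of four integers in convmap is read here as start code, end code, offset and mask; the range below covers all non-ASCII characters up to U+FFFF and is an arbitrary choice for the example.

```php
<?php
// start code, end code, offset, mask
$convmap = [0x80, 0xFFFF, 0, 0xFFFF];

// Characters inside the range become numeric references; ASCII is kept.
$encoded = mb_encode_numericentity("日本 PHP", $convmap, "UTF-8");
echo $encoded, "\n"; // &#26085;&#26412; PHP

// Decoding with the same convmap restores the original text.
$decoded = mb_decode_numericentity($encoded, $convmap, "UTF-8");
echo $decoded, "\n"; // 日本 PHP
```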
mb_decode_numericentity
Decode HTML numeric string reference to character
Description
string mb_decode_numericentity(string str, array convmap [, string encoding])
Converts the numeric string references in string
str within the specified block to characters. It
returns the converted string.
convmap is an array that specifies the code area to
convert.
encoding is the character encoding. If it is
omitted, the internal character encoding is used.
convmap example
See also: mb_encode_numericentity.
mb_send_mail
Send encoded mail.
Description
boolean mb_send_mail(string to, string subject, string message [, string additional_headers [, string additional_parameter]])
mb_send_mail sends email. Headers and
message are converted and encoded according to the
mb_language setting.
mb_send_mail is a wrapper
function of mail. See
mail for details.
to is the mail address to send to. Multiple
recipients can be specified by putting a comma between each
address in to. This parameter is not automatically encoded.
subject is the subject of the mail.
message is the mail message.
additional_headers is inserted at
the end of the header. This is typically used to add extra
headers. Multiple extra headers are separated with a
newline ("\n").
additional_parameter is an MTA command line
parameter. It is useful for setting the correct Return-Path
header when using sendmail.
&return.success;
See also mail,
mb_encode_mimeheader, and
mb_language.
mb_get_info
Get internal settings of mbstring
Description
string mb_get_info([string type])
&warn.experimental.func;
mb_get_info returns internal setting
parameter of mbstring.
If type isn't specified or is
"all", an array having the elements "internal_encoding",
"http_output", "http_input", "func_overload" will be returned.
If type is one of "http_output",
"http_input", "internal_encoding" or "func_overload",
the specified setting parameter will be returned.
See also mb_internal_encoding,
mb_http_output.
mb_regex_encoding
Returns current encoding for multibyte regex as string
Description
string mb_regex_encoding([string encoding])
&warn.experimental.func;
mb_regex_encoding returns the character
encoding used by multibyte regex functions.
If the optional parameter encoding is
specified, it is set to the character encoding for multibyte
regex. The default value is the internal character encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_internal_encoding,
mb_ereg.
mb_ereg
Regular expression match with multibyte support
Description
int mb_ereg(string pattern, string string [, array regs])
&warn.experimental.func;
mb_ereg executes the regular expression
match with multibyte support, and returns 1 if matches are found.
If the optional third parameter is specified, the function
returns the byte length of the matched part, and the array
regs will contain the substrings of the matched
string. The function returns 1 if it matches the empty
string. If no match is found or an error happens, &false; is
returned.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_eregi.
mb_eregi
Regular expression match ignoring case with multibyte support
Description
int mb_eregi(string pattern, string string [, array regs])
&warn.experimental.func;
mb_eregi executes the regular expression
match with multibyte support, and returns 1 if matches are found.
This function ignores case.
If the optional third parameter is specified, the function
returns the byte length of the matched part, and the array
regs will contain the substrings of the matched
string. The function returns 1 if it matches the empty
string. If no match is found or an error happens, &false; is
returned.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg.
mb_ereg_replace
Replace regular expression with multibyte support
Description
string mb_ereg_replace(string pattern, string replacement, string string [, string option])
&warn.experimental.func;
mb_ereg_replace scans
string for matches to
pattern, then replaces the matched text
with replacement and returns the result
string or &false; on error. Multibyte characters can be used in
pattern.
The matching condition can be set by the option
parameter. If i is specified for this
parameter, case is ignored. If x is
specified, white space is ignored. If m
is specified, the match is executed in multiline mode and line
breaks are matched by '.'. If p is
specified, the match is executed in POSIX mode and line breaks
are treated as normal characters. If e
is specified, the replacement string is
evaluated as a PHP expression.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_eregi_replace.
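A minimal replacement sketch, assuming a UTF-8 source file and a PHP build with multibyte regex support. The option argument is passed as a string of option letters, as current PHP releases expect.

```php
<?php
// Tell the multibyte regex functions that patterns and subjects are UTF-8.
mb_regex_encoding("UTF-8");

// Replace a multi-byte character.
$replaced = mb_ereg_replace("本", "X", "日本語");
echo $replaced, "\n"; // 日X語

// The "i" option makes the match ignore case.
$upper = mb_ereg_replace("php", "PHP", "Php と php", "i");
echo $upper, "\n"; // PHP と PHP
```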
mb_eregi_replace
Replace regular expression with multibyte support
ignoring case
Description
string mb_eregi_replace(string pattern, string replace, string string)
&warn.experimental.func;
mb_eregi_replace scans
string for matches to
pattern, then replaces the matched text
with replace and returns the result
string or &false; on error. Multibyte characters can be used in
pattern. Case is ignored.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_replace.
mb_split
Split multibyte string using regular expression
Description
array mb_split(string pattern, string string [, int limit])
&warn.experimental.func;
mb_split splits a multibyte
string using the regular expression
pattern and returns the result as an
array.
If the optional parameter limit is specified,
the string is split into at most limit
elements.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg.
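Splitting on a multi-byte delimiter, with and without limit, can be sketched as follows. This assumes a UTF-8 source file and a PHP build with multibyte regex support.

```php
<?php
// Patterns and subjects are UTF-8.
mb_regex_encoding("UTF-8");

$list = "りんご、みかん、ぶどう";

// Split on the full-width comma.
$parts = mb_split("、", $list);
print_r($parts); // りんご / みかん / ぶどう

// With limit 2, the remainder stays in the last element.
$limited = mb_split("、", $list, 2);
print_r($limited); // りんご / みかん、ぶどう
```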
mb_ereg_match
Regular expression match for multibyte string
Description
bool mb_ereg_match(string pattern, string string [, string option])
&warn.experimental.func;
mb_ereg_match returns &true; if
string matches regular expression
pattern, &false; if not.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg.
mb_ereg_search
Multibyte regular expression match for predefined multibyte string
Description
bool mb_ereg_search([string pattern [, string option]])
&warn.experimental.func;
mb_ereg_search returns &true; if the
multibyte string matches the regular expression, &false;
otherwise. The string for matching is set by
mb_ereg_search_init. If
pattern is not specified, the previous one
is used.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_init.
mb_ereg_search_pos
Return position and length of matched part of multibyte regular
expression for predefined multibyte string
Description
array mb_ereg_search_pos([string pattern [, string option]])
&warn.experimental.func;
mb_ereg_search_pos returns an array containing the
position of the matched part for a multibyte regular expression.
The first element of the array is the beginning of the matched
part, the second element is the length (bytes) of the matched part.
It returns &false; on error.
The string for the match is specified by
mb_ereg_search_init. If pattern is not specified,
the previous one will be used.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_init.
mb_ereg_search_regs
Returns the matched part of multibyte regular expression
Description
array mb_ereg_search_regs([string pattern [, string option]])
&warn.experimental.func;
mb_ereg_search_regs executes the multibyte
regular expression match, and if there is a matched part, it
returns an array containing the matched substring as the first
element, the first part grouped with brackets as the second element,
the second grouped part as the third element, and so on. It returns
&false; on error.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_init.
mb_ereg_search_init
Setup string and regular expression for multibyte regular
expression match
Description
array mb_ereg_search_init(string string [, string pattern [, string option]])
&warn.experimental.func;
mb_ereg_search_init sets
string and pattern
for a multibyte regular expression. These values are used by
mb_ereg_search,
mb_ereg_search_pos and
mb_ereg_search_regs. It returns &true; for
success, &false; for error.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_regs.
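The search functions can be combined into a loop over all matches, as sketched below. This assumes a UTF-8 source file and a PHP build with multibyte regex support; the subject string and pattern are made up for the example.

```php
<?php
// Patterns and subjects are UTF-8.
mb_regex_encoding("UTF-8");

// Register the subject string and the pattern once.
mb_ereg_search_init("日本語 日本人 日本酒", "日本(.)");

// Each call continues after the previous match; false ends the loop.
$suffixes = [];
while (($regs = mb_ereg_search_regs()) !== false) {
    // $regs[0] is the whole match, $regs[1] the first bracketed group.
    $suffixes[] = $regs[1];
}

print_r($suffixes); // 語 / 人 / 酒
```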
mb_ereg_search_getregs
Retrieve the result from the last multibyte regular expression
match
Description
array mb_ereg_search_getregs()
&warn.experimental.func;
mb_ereg_search_getregs returns an array
containing the sub-strings matched by the last
mb_ereg_search,
mb_ereg_search_pos or
mb_ereg_search_regs call. If there are
matches, the first element will have the matched sub-string, the
second element will have the first part grouped with brackets,
the third element will have the second part grouped with
brackets, and so on. It returns &false; on error.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_init.
mb_ereg_search_getpos
Returns start point for next regular expression match
Description
array mb_ereg_search_getpos()
&warn.experimental.func;
mb_ereg_search_getpos returns
the point at which the next regular expression match starts for
mb_ereg_search,
mb_ereg_search_pos and
mb_ereg_search_regs. The position is
given in bytes from the head of the string.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_setpos.
mb_ereg_search_setpos
Set start point of next regular expression match
Description
array mb_ereg_search_setpos()
&warn.experimental.func;
mb_ereg_search_setpos sets the starting
point of match for mb_ereg_search.
The internal encoding or the character encoding specified in
mb_regex_encoding will be used as character
encoding.
This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding,
mb_ereg_search_init.