Multi-Byte String Functions

Introduction

Many languages can express every character in a single byte. Other languages need far more characters, and multi-byte character codes are used to express them. mbstring was developed to handle Japanese characters, but many mbstring functions can also handle character encodings other than Japanese.

A multi-byte character encoding represents a single character with consecutive bytes. Some character encodings use shift (escape) sequences to start and end multi-byte character strings. A multi-byte character string may therefore be corrupted when it is divided or counted, unless a method that is safe for the encoding is used. This module provides multi-byte safe string functions and other utility functions such as conversion functions.

Since PHP is basically designed for ISO-8859-1, some multi-byte character encodings do not work well with PHP. It is therefore important to set mbstring.internal_encoding to a character encoding that works with PHP.

PHP4 Character Encoding Requirements

- Per-byte encoding
- Single-byte characters in the range 00h-7fh, compatible with ASCII
- Multi-byte characters that do not use bytes in the range 00h-7fh

For example, ISO-8859-*, EUC-JP and UTF-8 meet these requirements and work as internal encodings; JIS and SJIS do not. A character encoding that does not work with PHP may be converted with mbstring's HTTP input/output conversion feature.

SJIS should not be used as the internal encoding unless you are familiar with the parser/compiler, character encodings and character encoding issues.

If you use a database with PHP, it is recommended that you use the same character encoding for both the database and the internal encoding, for ease of use and better performance. PostgreSQL supports a client character encoding that differs from the backend character encoding. See the PostgreSQL manual for details.

How to Enable mbstring

mbstring is an extension module. You must enable it with the configure script.
Refer to the Install section for details. Two configure options relate to the mbstring module. The first enables the mbstring functions and is required to use them. The second enables HTTP input character encoding conversion using the mbstring conversion engine. When this feature is enabled, HTTP input character encoding may be converted to mbstring.internal_encoding automatically.

HTTP Input and Output

HTTP input/output character encoding conversion may also convert binary data. Users are expected to control character encoding conversion themselves when binary data is used for HTTP input/output.

If the enctype of an HTML form is set to multipart/form-data, mbstring does not convert the character encoding of POST data. In that case the script must convert the strings to the internal character encoding itself.

HTTP Input

There is no way to control HTTP input character conversion from a PHP script. HTTP input character conversion can only be disabled in php.ini.

When using PHP as an Apache module, it is possible to override php.ini settings per virtual host in httpd.conf or per directory with .htaccess. Refer to the Configuration section and the Apache manual for details.

HTTP Output

There are several ways to enable output character encoding conversion. One is to use php.ini; another is to call ob_start with mb_output_handler as the ob_start callback function.

For PHP3-i18n users: mbstring's output conversion differs from PHP3-i18n. Character encoding is converted using the output buffer.

Supported Character Encoding

The mbstring module currently supports the character encodings listed below. A character encoding may be specified for the encoding parameter of mbstring functions.
The following character encodings are supported by this PHP extension: UCS-4, UCS-4BE, UCS-4LE, UCS-2, UCS-2BE, UCS-2LE, UTF-32, UTF-32BE, UTF-32LE, UTF-16, UTF-16BE, UTF-16LE, UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, ISO-2022-JP, JIS, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, byte2be, byte2le, byte4be, byte4le, BASE64, 7bit, 8bit and UTF7-IMAP.

php.ini entries that accept an encoding name also accept "auto" and "pass". mbstring functions that accept an encoding name also accept "auto". If "pass" is set, no character encoding conversion is performed. If "auto" is set, it is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS". See also mb_detect_order.

"Supported character encoding" does not mean that the encoding works as the internal character encoding.

php.ini settings

- mbstring.internal_encoding defines the default internal character encoding.
- mbstring.http_input defines the default HTTP input character encoding.
- mbstring.http_output defines the default HTTP output character encoding.
- mbstring.detect_order defines the default character encoding detection order. See also mb_detect_order.
- mbstring.substitute_character defines the character to substitute for invalid character encoding.

Web browsers are supposed to use the same character encoding as the page when submitting a form, but they may not do so. See mb_http_input to detect the character encoding used by browsers.

If enctype is set to multipart/form-data in HTML forms, mbstring does not convert the character encoding of POST data. The script must convert the data itself if conversion is needed.

Although browsers are smart enough to detect the character encoding of HTML, it is better to set charset in the HTTP header. Change default_charset according to the character encoding in use.
Overload of PHP string functions by mbstring functions with multibyte support

Because almost all PHP applications are written for languages that use a single-byte character encoding, handling multibyte strings, including Japanese, causes some difficulties. Almost all PHP string functions, such as substr, do not support multibyte strings. The multibyte extension (mbstring) provides multibyte-aware counterparts for some PHP string functions (for example, mb_substr corresponds to substr).

The multibyte extension (mbstring) also supports "function overloading", which adds multibyte string functionality without code modification. With function overloading, some PHP string functions are overloaded by their multibyte counterparts. For example, mb_substr is called instead of substr if function overloading is enabled. Function overloading makes it easy to port an application that supports only single-byte encodings to a multibyte application.

mbstring.func_overload in php.ini must be set to a positive value to use function overloading. The value selects the categories of functions to overload: 1 enables overloading of the mail function, 2 of the string functions, and 4 of the regular expression functions. For example, if it is set to 7, the mail, string and regular expression functions are all overloaded. The overloaded functions are listed below.

Functions to be overloaded

value of mbstring.func_overload   original function   overloaded function
1                                 mail                mb_send_mail
2                                 strlen              mb_strlen
2                                 strpos              mb_strpos
2                                 strrpos             mb_strrpos
2                                 substr              mb_substr
4                                 ereg                mb_ereg
4                                 eregi               mb_eregi
4                                 ereg_replace        mb_ereg_replace
4                                 eregi_replace       mb_eregi_replace
4                                 split               mb_split
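To illustrate why the multibyte counterparts matter, the following sketch compares substr with mb_substr on a UTF-8 string. It assumes that the mbstring extension is loaded, that the file is saved as UTF-8, and that function overloading is not enabled; the variable names are illustrative only.

```php
<?php
// Assumptions: mbstring is loaded, this file is UTF-8, no function overloading.
$text = "日本語テキスト";

// substr() counts bytes: two bytes are only part of the first 3-byte character.
$broken = substr($text, 0, 2);

// mb_substr() counts characters: this returns the first two whole characters.
$safe = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
echo $safe, "\n";                               // 日本
```

With function overloading enabled, the plain substr call would behave like mb_substr, which is why the sketch assumes it is off.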

Basics for Japanese multi-byte characters

Most Japanese characters need more than 1 byte per character. In addition, several character encoding schemes are used in a Japanese environment: EUC-JP, Shift_JIS (SJIS) and ISO-2022-JP (JIS). As Unicode becomes popular, UTF-8 is used as well. To develop Web applications for a Japanese environment, it is important to use the appropriate character set for the task in hand, whether HTTP input/output, RDBMS or e-mail.

- Storage for a character can be up to six bytes.
- A multi-byte character is usually twice the width of a single-byte character. Wider characters are called "zen-kaku", meaning full width; narrower characters are called "han-kaku", meaning half width. "zen-kaku" characters are usually fixed width.
- Some character encodings define shift (escape) sequences for entering and exiting multi-byte character strings.
- ISO-2022-JP must be used for SMTP/NNTP.
- "i-mode" web sites are supposed to use SJIS.

References

Multi-byte character encodings and their related issues are very complex, and cannot be covered in sufficient detail here. Please refer to the following URLs and other resources for further reading.

- Unicode/UTF/UCS/etc: http://www.unicode.org/
- Japanese/Korean/Chinese character information: ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

mb_language
Set/Get current language

Description
string mb_language([string language])

mb_language sets the language. If language is omitted, it returns the current language as a string.

The language setting is used for encoding e-mail messages. Valid languages are "Japanese", "ja", "English", "en" and "uni" (UTF-8). mb_send_mail uses this setting to encode e-mail.

The encodings for each language are ISO-2022-JP/Base64 for Japanese, UTF-8/Base64 for uni, and ISO-8859-1/quoted-printable for English.

Return Value: If language is set and is valid, it returns TRUE. Otherwise, it returns FALSE. When language is omitted, it returns the language name as a string.
If no language was set previously, it returns FALSE.

See also mb_send_mail.

mb_parse_str
Parse GET/POST/COOKIE data and set global variables

Description
boolean mb_parse_str(string encoded_string [, array result])

mb_parse_str parses GET/POST/COOKIE data and sets global variables. Since PHP does not provide raw POST/COOKIE data, it can only be used for GET data for now. It parses URL-encoded data, detects the encoding, converts it to the internal encoding and stores the values in the result array or in global variables.

encoded_string: URL-encoded data.

result: Array that receives the decoded and character-encoding-converted values.

Return Value: It returns TRUE for success or FALSE for failure.

See also mb_detect_order, mb_internal_encoding.

mb_internal_encoding
Set/Get internal character encoding

Description
string mb_internal_encoding([string encoding])

mb_internal_encoding sets the internal character encoding to encoding. If the parameter is omitted, it returns the current internal encoding.

encoding is used for HTTP input character encoding conversion, HTTP output character encoding conversion and as the default character encoding for string functions defined by the mbstring module.

encoding: Character encoding name.

Return Value: If encoding is set, mb_internal_encoding returns TRUE for success, otherwise FALSE. If encoding is omitted, it returns the current character encoding name.

See also mb_http_input, mb_http_output, mb_detect_order.

mb_http_input
Detect HTTP input character encoding

Description
string mb_http_input([string type])

mb_http_input returns the result of HTTP input character encoding detection.

type: String that specifies the input type: "G" for GET, "P" for POST, "C" for COOKIE. If type is omitted, it returns the last input type processed.

Return Value: Character encoding name. If mb_http_input did not process the specified HTTP input, it returns FALSE.

See also mb_internal_encoding, mb_http_output, mb_detect_order.
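As a rough illustration of mb_internal_encoding and mb_parse_str described above, the sketch below sets the internal encoding and parses a URL-encoded query string into an array. UTF-8 and the query string are assumptions chosen for the example, and default mbstring HTTP input settings are assumed.

```php
<?php
// Assumption: mbstring is loaded; UTF-8 is chosen here only for illustration.
mb_internal_encoding('UTF-8');
$current = mb_internal_encoding();   // without a parameter, returns the current name

// URL-encoded UTF-8 bytes of a three-character Japanese word (hypothetical input).
$query = 'word=%E6%97%A5%E6%9C%AC%E8%AA%9E&lang=ja';
$ok = mb_parse_str($query, $result); // decoded values are placed in $result

echo $current, "\n";                   // UTF-8
echo mb_strlen($result['word']), "\n"; // 3, counted using the internal encoding
```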
mb_http_output
Set/Get HTTP output character encoding

Description
string mb_http_output([string encoding])

If encoding is set, mb_http_output sets the HTTP output character encoding to encoding. Output after this function is converted to encoding. mb_http_output returns TRUE for success and FALSE for failure.

If encoding is omitted, mb_http_output returns the current HTTP output character encoding.

See also mb_internal_encoding, mb_http_input, mb_detect_order.

mb_detect_order
Set/Get character encoding detection order

Description
array mb_detect_order([mixed encoding-list])

mb_detect_order sets the automatic character encoding detection order to encoding-list. It returns TRUE for success, FALSE for failure.

encoding-list is an array or a comma-separated list of character encodings. ("auto" is expanded to "ASCII, JIS, UTF-8, EUC-JP, SJIS".)

If encoding-list is omitted, it returns the current character encoding detection order as an array.

This setting affects mb_detect_encoding and mb_send_mail.

mbstring currently implements encoding detection filters for the following encodings. If a string contains a byte sequence that is invalid for one of these encodings, detection of that encoding fails.

UTF-8, UTF-7, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, JIS, ISO-2022-JP

For ISO-8859-*, mbstring always detects the string as ISO-8859-*.

For UTF-16, UTF-32, UCS2 and UCS4, encoding detection always fails.

See also mb_internal_encoding, mb_http_input, mb_http_output, mb_send_mail.

mb_substitute_character
Set/Get substitution character

Description
mixed mb_substitute_character([mixed substchar])

mb_substitute_character specifies the substitution character used when the input character encoding is invalid or when a character code does not exist in the output character encoding. Invalid characters may be substituted with NULL (no output), a string or an integer value (Unicode character code value).

This setting affects mb_detect_encoding and mb_send_mail.
substchar: Specify the Unicode value as an integer, or specify one of the following strings:

- "none": no output
- "long": output the character code value (example: U+3000, JIS+7E7E)

Return Value: If substchar is set, it returns TRUE for success, otherwise FALSE. If substchar is not set, it returns the Unicode value or "none"/"long".

mb_output_handler
Callback function that converts character encoding in the output buffer

Description
string mb_output_handler(string contents, int status)

mb_output_handler is an ob_start callback function. mb_output_handler converts characters in the output buffer from the internal character encoding to the HTTP output character encoding.

In 4.1.0 or later, this handler adds a charset HTTP header when the following conditions are met:

- Content-Type is not set by header()
- The default MIME type begins with text/
- The http_output setting is other than pass

contents: Output buffer contents.

status: Output buffer status.

Return Value: The converted string.

If you want to output binary data, such as an image, from a PHP script, you must set the output encoding to "pass" using mb_http_output.

See also ob_start.

mb_preferred_mime_name
Get MIME charset string

Description
string mb_preferred_mime_name(string encoding)

mb_preferred_mime_name returns the MIME charset string for the character encoding encoding.

mb_strlen
Get string length

Description
int mb_strlen(string str [, string encoding])

mb_strlen returns the number of characters in string str, which has character encoding encoding. A multi-byte character is counted as 1.

encoding is the character encoding of str. If encoding is omitted, the internal character encoding is used.

See also mb_internal_encoding, strlen.
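The difference between byte length and character count can be sketched as follows. The example assumes the mbstring extension is loaded and the file is saved as UTF-8; the EUC-JP copy is created only to show that the count depends on the encoding parameter matching the data.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
$utf8  = "日本語";                                        // 3 characters, 9 bytes in UTF-8
$eucjp = mb_convert_encoding($utf8, 'EUC-JP', 'UTF-8');  // same text, 6 bytes in EUC-JP

echo strlen($utf8), "\n";               // 9 (bytes)
echo mb_strlen($utf8, 'UTF-8'), "\n";   // 3 (characters)
echo strlen($eucjp), "\n";              // 6 (bytes)
echo mb_strlen($eucjp, 'EUC-JP'), "\n"; // 3 (characters)
```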
mb_strpos
Find position of first occurrence of string in a string

Description
int mb_strpos(string haystack, string needle [, int offset [, string encoding]])

mb_strpos returns the numeric position of the first occurrence of needle in the haystack string. If needle is not found, it returns FALSE.

mb_strpos performs a multi-byte safe strpos operation based on the number of characters. The needle position is counted from the beginning of haystack. The first character's position is 0, the second character's position is 1, and so on.

offset is the search offset. If it is not specified, 0 is used.

encoding is the character encoding name. If it is omitted, the internal character encoding is used.

See also mb_strrpos, mb_internal_encoding, strpos.

mb_strrpos
Find position of last occurrence of a string in a string

Description
int mb_strrpos(string haystack, string needle [, string encoding])

mb_strrpos returns the numeric position of the last occurrence of needle in the haystack string. If needle is not found, it returns FALSE.

mb_strrpos performs a multi-byte safe strrpos operation based on the number of characters. The needle position is counted from the beginning of haystack. The first character's position is 0, the second character's position is 1, and so on.

mb_strrpos accepts a string for needle, whereas strrpos accepts only a single character.

encoding is the character encoding. If it is not specified, the internal character encoding is used.

See also mb_strpos, mb_internal_encoding, strrpos.

mb_substr
Get part of string

Description
string mb_substr(string str, int start, int length [, string encoding])

mb_substr returns the portion of str specified by the start and length parameters.

mb_substr performs a multi-byte safe substr operation based on the number of characters. Position is counted from the beginning of str. The first character's position is 0, the second character's position is 1, and so on.
encoding is the character encoding. If it is omitted, the internal character encoding is used.

See also mb_strcut, mb_internal_encoding.

mb_strcut
Get part of string

Description
string mb_strcut(string str, int start, int length [, string encoding])

mb_strcut returns the portion of str specified by the start and length parameters.

mb_strcut performs an operation equivalent to mb_substr, but with a different method. If the start position falls on the second or a later byte of a multi-byte character, the result starts from the first byte of that character.

It extracts from str a string that is no longer than length and that does not end in the middle of a multi-byte character or of a shift sequence.

encoding is the character encoding. If it is not set, the internal character encoding is used.

See also mb_substr, mb_internal_encoding.

mb_strwidth
Return width of string

Description
int mb_strwidth(string str [, string encoding])

mb_strwidth returns the width of string str.

A multi-byte character is usually twice the width of a single-byte character.

encoding is the character encoding. If it is omitted, the internal encoding is used.

See also: mb_strimwidth, mb_internal_encoding.

mb_strimwidth
Get truncated string with specified width

Description
string mb_strimwidth(string str, int start, int width [, string trimmarker [, string encoding]])

mb_strimwidth truncates string str to the specified width. It returns the truncated string.

If trimmarker is set, trimmarker is appended to the return value when the string is truncated.

start is the start position offset, given as a number of characters from the beginning of the string (the first character is 0).

trimmarker is a string that is added to the end of the string when it is truncated.

encoding is the character encoding. If it is omitted, the internal encoding is used.

See also: mb_strwidth, mb_internal_encoding.
mb_convert_encoding
Convert character encoding

Description
string mb_convert_encoding(string str, string to-encoding [, mixed from-encoding])

mb_convert_encoding converts the character encoding of string str from from-encoding to to-encoding.

str: String to be converted.

from-encoding is the name of the character encoding before conversion. It can be an array or a comma-separated list string. If it is not specified, the internal encoding is used.

See also: mb_detect_order.

mb_detect_encoding
Detect character encoding

Description
string mb_detect_encoding(string str [, mixed encoding-list])

mb_detect_encoding detects the character encoding of string str. It returns the detected character encoding.

encoding-list is a list of character encodings. The encoding order may be specified by an array or a comma-separated list string. If encoding-list is omitted, detect_order is used.

See also: mb_detect_order.

mb_convert_kana
Convert one "kana" to another ("zen-kaku", "han-kaku" and more)

Description
string mb_convert_kana(string str [, string option [, mixed encoding]])

mb_convert_kana performs "han-kaku" - "zen-kaku" conversion for string str. It returns the converted string. This function is only useful for Japanese.

option is the conversion option. The default value is "KV".

encoding is the character encoding. If it is omitted, the internal character encoding is used.

Options include:

- "s": Convert "zen-kaku" space to "han-kaku" (U+3000 -> U+0020)
- "S": Convert "han-kaku" space to "zen-kaku" (U+0020 -> U+3000)
- "k": Convert "zen-kaku kata-kana" to "han-kaku kata-kana"
- "K": Convert "han-kaku kata-kana" to "zen-kaku kata-kana"
- "h": Convert "zen-kaku hira-gana" to "han-kaku kata-kana"
- "H": Convert "han-kaku kata-kana" to "zen-kaku hira-gana"
- "c": Convert "zen-kaku kata-kana" to "zen-kaku hira-gana"
- "C": Convert "zen-kaku hira-gana" to "zen-kaku kata-kana"
- "V": Collapse voiced sound notation and convert it into a single character.
Use "V" together with "K" or "H".

mb_encode_mimeheader
Encode string for MIME header

Description
string mb_encode_mimeheader(string str [, string charset [, string transfer-encoding [, string linefeed]]])

mb_encode_mimeheader converts string str to an encoded-word for a header field. It returns the converted string in ASCII encoding.

charset is the character encoding name. The default is ISO-2022-JP.

transfer-encoding is the transfer encoding. It should be either "B" (Base64) or "Q" (Quoted-Printable). The default is "B".

linefeed is the end-of-line marker. The default is "\r\n" (CRLF).

See also mb_decode_mimeheader.

mb_decode_mimeheader
Decode string in MIME header field

Description
string mb_decode_mimeheader(string str)

mb_decode_mimeheader decodes the encoded-word string str in a MIME header. It returns the decoded string in the internal character encoding.

See also mb_encode_mimeheader.

mb_convert_variables
Convert character code in variable(s)

Description
string mb_convert_variables(string to-encoding, mixed from-encoding, mixed vars [, mixed ...])

mb_convert_variables converts the character encoding of the variables vars from encoding from-encoding to encoding to-encoding. It returns the character encoding before conversion for success, FALSE for failure.

mb_convert_variables joins the strings in an array or object to detect the encoding, since encoding detection tends to fail for short strings. Therefore, it is impossible to mix encodings in a single array or object.

If from-encoding is specified as an array or a comma-separated string, it tries to detect the encoding from from-encoding. When from-encoding is omitted, detect_order is used.

vars (the third and later parameters) are references to the variables to be converted. String, array and object are accepted. mb_convert_variables assumes that all parameters have the same encoding.
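A minimal sketch of mb_convert_variables follows. It assumes the mbstring extension is loaded and the file is saved as UTF-8. The EUC-JP input is built inside the example, and a single from-encoding is passed so that no detection is involved; the variable names are hypothetical.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
// Build EUC-JP input from UTF-8 literals for the sake of the example.
$title  = mb_convert_encoding("日本語", 'EUC-JP', 'UTF-8');
$fields = ['name' => mb_convert_encoding("テスト", 'EUC-JP', 'UTF-8')];

// Convert a string and an array in place. All variables share one encoding.
$from = mb_convert_variables('UTF-8', 'EUC-JP', $title, $fields);

echo $from, "\n";           // EUC-JP
echo $title, "\n";          // 日本語
echo $fields['name'], "\n"; // テスト
```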
mb_encode_numericentity
Encode character to HTML numeric string reference

Description
string mb_encode_numericentity(string str, array convmap, string encoding)

mb_encode_numericentity converts the specified character codes in string str from character code to HTML numeric character reference. It returns the converted string.

convmap is an array that specifies the code area to convert.

encoding is the character encoding.

See also: mb_decode_numericentity.

mb_decode_numericentity
Decode HTML numeric string reference to character

Description
string mb_decode_numericentity(string str, array convmap [, string encoding])

mb_decode_numericentity converts the numeric string references in string str that fall in the specified block to characters. It returns the converted string.

convmap is an array that specifies the code area to convert.

encoding is the character encoding. If it is omitted, the internal character encoding is used.

See also: mb_encode_numericentity.

mb_send_mail
Send encoded mail

Description
boolean mb_send_mail(string to, string subject, string message, string additional_headers, string additional_parameter)

mb_send_mail sends e-mail. Headers and message are converted and encoded according to the mb_language setting. mb_send_mail is a wrapper function for mail. See mail for details.

to is the mail address to send to. Multiple recipients can be specified by putting a comma between each address in to. This parameter is not automatically encoded.

subject is the subject of the mail.

message is the mail message.

additional_headers is inserted at the end of the header. This is typically used to add extra headers. Multiple extra headers are separated with a newline ("\n").

additional_parameter is an MTA command line parameter. It is useful for setting the correct Return-Path header when using sendmail.

Returns TRUE on success or FALSE on failure.

See also mail, mb_encode_mimeheader, and mb_language.
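For the numeric entity functions described above (mb_encode_numericentity and mb_decode_numericentity), a convmap can be sketched as follows. The example assumes each convmap entry is a group of four integers (start code, end code, offset, mask) and that the file is saved as UTF-8; the range chosen here covers only U+0080 to U+00FF.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
// One convmap entry: start code, end code, offset, mask.
$convmap = [0x80, 0xff, 0, 0xff];

$encoded = mb_encode_numericentity("café", $convmap, 'UTF-8');
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');

echo $encoded, "\n"; // caf&#233;
echo $decoded, "\n"; // café
```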
mb_get_info
Get internal settings of mbstring

Description
string mb_get_info([string type])

Warning: this function is experimental.

mb_get_info returns the internal setting parameters of mbstring.

If type is not specified or is specified as "all", an array with the elements "internal_encoding", "http_output", "http_input" and "func_overload" is returned.

If type is specified as "http_output", "http_input", "internal_encoding" or "func_overload", the specified setting parameter is returned.

See also mb_internal_encoding, mb_http_output.

mb_regex_encoding
Returns current encoding for multibyte regex as string

Description
string mb_regex_encoding([string encoding])

Warning: this function is experimental.

mb_regex_encoding returns the character encoding used by the multibyte regex functions.

If the optional parameter encoding is specified, it is set as the character encoding for multibyte regex. The default value is the internal character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_internal_encoding, mb_ereg.

mb_ereg
Regular expression match with multibyte support

Description
int mb_ereg(string pattern, string string [, array regs])

Warning: this function is experimental.

mb_ereg executes a regular expression match with multibyte support, and returns 1 if matches are found. If the optional third parameter is specified, the function returns the byte length of the matched part, and the array regs will contain the substrings of the matched string. The function returns 1 if it matches the empty string. If no match is found or an error occurs, FALSE is returned.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.
See also: mb_regex_encoding, mb_eregi.

mb_eregi
Regular expression match ignoring case with multibyte support

Description
int mb_eregi(string pattern, string string [, array regs])

Warning: this function is experimental.

mb_eregi executes a regular expression match with multibyte support, and returns 1 if matches are found. This function ignores case. If the optional third parameter is specified, the function returns the byte length of the matched part, and the array regs will contain the substrings of the matched string. The function returns 1 if it matches the empty string. If no match is found or an error occurs, FALSE is returned.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg.

mb_ereg_replace
Replace regular expression with multibyte support

Description
string mb_ereg_replace(string pattern, string replacement, string string [, string option])

Warning: this function is experimental.

mb_ereg_replace scans string for matches to pattern, then replaces the matched text with replacement and returns the result string, or FALSE on error. Multibyte characters can be used in pattern.

The matching condition can be set by the option parameter:

- If i is specified, case is ignored.
- If x is specified, white space is ignored.
- If m is specified, the match is executed in multiline mode and '.' also matches line breaks.
- If p is specified, the match is executed in POSIX mode and a line break is treated as a normal character.
- If e is specified, the replacement string is evaluated as a PHP expression.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_eregi_replace.
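The matching and replacing functions above can be sketched briefly. The example assumes mbstring was built with multibyte regex support and that the file is saved as UTF-8; the patterns and strings are illustrative. Because the return value of mb_ereg on a successful match is described above as either 1 or a byte length, the sketch only checks that it is not FALSE.

```php
<?php
// Assumption: this file is saved as UTF-8; mbstring has multibyte regex support.
mb_regex_encoding('UTF-8');

// $regs[0] holds the whole match, $regs[1] and $regs[2] the bracketed groups.
$matched = mb_ereg("(本)(語)", "日本語", $regs);

// Replace a run of digits inside multibyte text.
$masked = mb_ereg_replace("[0-9]+", "N", "価格は100円");

// The "i" option makes the match ignore case.
$folded = mb_ereg_replace("abc", "x", "ABC abc", "i");

echo $regs[0], "\n"; // 本語
echo $masked, "\n";  // 価格はN円
echo $folded, "\n";  // x x
```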
mb_eregi_replace
Replace regular expression with multibyte support ignoring case

Description
string mb_eregi_replace(string pattern, string replacement, string string)

Warning: this function is experimental.

mb_eregi_replace scans string for matches to pattern, then replaces the matched text with replacement and returns the result string, or FALSE on error. Multibyte characters can be used in pattern. Case is ignored.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_replace.

mb_split
Split multibyte string using regular expression

Description
array mb_split(string pattern, string string [, int limit])

Warning: this function is experimental.

mb_split splits a multibyte string using the regular expression pattern and returns the result as an array.

If the optional parameter limit is specified, the string is split into at most limit elements.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg.

mb_ereg_match
Regular expression match for multibyte string

Description
bool mb_ereg_match(string pattern, string string, string option)

Warning: this function is experimental.

mb_ereg_match returns TRUE if string matches the regular expression pattern, FALSE if not.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg.

mb_ereg_search
Multibyte regular expression match for predefined multibyte string

Description
bool mb_ereg_search([string pattern [, string option]])

Warning: this function is experimental.

mb_ereg_search returns TRUE if the multibyte string matches the regular expression, FALSE otherwise. The string to match is set by mb_ereg_search_init. If pattern is not specified, the previous one is used.
The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_init.

mb_ereg_search_pos
Return position and length of matched part of multibyte regular expression for predefined multibyte string

Description
array mb_ereg_search_pos([string pattern [, string option]])

Warning: this function is experimental.

mb_ereg_search_pos returns an array describing the position of the matched part for a multibyte regular expression. The first element of the array is the beginning of the matched part; the second element is the length (in bytes) of the matched part. It returns FALSE on error.

The string to match is specified by mb_ereg_search_init. If pattern is not specified, the previous one is used.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_init.

mb_ereg_search_regs
Returns the matched part of multibyte regular expression

Description
array mb_ereg_search_regs(string pattern, string option)

Warning: this function is experimental.

mb_ereg_search_regs executes the multibyte regular expression match. If there is a match, it returns an array with the matched substring as the first element, the first part grouped with brackets as the second element, the second grouped part as the third element, and so on. It returns FALSE on error.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_init.

mb_ereg_search_init
Setup string and regular expression for multibyte regular expression match

Description
bool mb_ereg_search_init(string string, string pattern, string option)

Warning: this function is experimental.

mb_ereg_search_init sets the string and pattern for a multibyte regular expression.
These values are used by mb_ereg_search, mb_ereg_search_pos and mb_ereg_search_regs. It returns TRUE for success, FALSE for error.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_regs.

mb_ereg_search_getregs
Retrieve the result from the last multibyte regular expression match

Description
array mb_ereg_search_getregs()

Warning: this function is experimental.

mb_ereg_search_getregs returns an array containing the substrings matched by the last mb_ereg_search, mb_ereg_search_pos or mb_ereg_search_regs call. If there are matches, the first element holds the matched substring, the second element holds the first part grouped with brackets, the third element holds the second part grouped with brackets, and so on. It returns FALSE on error.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_init.

mb_ereg_search_getpos
Returns start point for next regular expression match

Description
int mb_ereg_search_getpos()

Warning: this function is experimental.

mb_ereg_search_getpos returns the point at which the next regular expression match starts for mb_ereg_search, mb_ereg_search_pos and mb_ereg_search_regs. The position is expressed in bytes from the head of the string.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.

This function is supported in PHP 4.2.0 or higher.

See also: mb_regex_encoding, mb_ereg_search_setpos.

mb_ereg_search_setpos
Set start point of next regular expression match

Description
array mb_ereg_search_setpos

Warning: this function is experimental.

mb_ereg_search_setpos sets the starting point of the match for mb_ereg_search.

The internal encoding or the character encoding specified in mb_regex_encoding will be used as character encoding.
This function is supported in PHP 4.2.0 or higher. See also: mb_regex_encoding, mb_ereg_search_init.
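
The mb_ereg_search family is designed for stepping through a string one match at a time. A minimal sketch, assuming mbstring was built with multibyte regex support and the file is saved as UTF-8 (the sample string and pattern are illustrative):

```php
<?php
// Assumption: this file is saved as UTF-8; mbstring has multibyte regex support.
mb_regex_encoding('UTF-8');

// Register the subject string and the pattern once.
mb_ereg_search_init("赤, 青, 緑", "[^, ]+");

// Each call continues from where the previous match ended,
// and returns FALSE when nothing further matches.
$words = [];
while (($regs = mb_ereg_search_regs()) !== false) {
    $words[] = $regs[0];
}

echo implode("|", $words), "\n"; // 赤|青|緑
```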