To cater to potentially state-dependent encodings, we have to reset the
conversion descriptor into its initial shift state to properly finish
the conversion. Furthermore, state-dependent encodings may not show
progress when comparing `in_left` before and after the conversion; we
rather have to see whether `out_left` has decreased. Also we have to
cater to the fact that the final potentially state resetting call does
not signal failure, but we still have to break respective loops
afterwards.
If the `ICONV_MIME_DECODE_CONTINUE_ON_ERROR` flag is set, parsing
should not fail, if there are illegal characters in the headers;
instead we silently ignore these like before.
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
Before the fix for bug 48289 has been applied, the algorithm to
construct a Q-encoded-word has been optimistic, i.e. try to encode as
many bytes that *may* fit in the remaining space, calculate the actual
length of the Q-encoded word, and if it's too long, try again with a
reduced size. However, the fix for the mentioned bug replaced this by
a pessimistic algorithm, which always terminates[1] the for loop[2]
during the first iteration (which renders the following 3 lines as dead
code), and as such easily produces unnecessarily short encoded-words.
Instead the proper fix for the bug would have been to make sure that
`out_size` is always decremented, if the space isn't sufficient for the
encoded-word.
[1] <https://github.com/php/php-src/blob/php-7.3.0beta3/ext/iconv/iconv.c#L1421>
[2] <https://github.com/php/php-src/blob/php-7.3.0beta3/ext/iconv/iconv.c#L1360>
Basically, the algorithm to append a converted string to an existing
`smart_str` works by increasing the `smart_str` buffer, to let `iconv`
convert characters until there is no more space, to set the new length
of the `smart_str` and to repeat until there is no more input.
Formerly, the new length calculation has been wrong, though, since we
would have to take the old `out_len` into account (`buf_growth -
old_out_len - out_len`). However, since there is no need to take the
old `out_len` into account when increasing the `smart_str` buffer, we
can simplify the fix, avoiding an additional variable.
We must not ignore erroneous characters in mime headers, but rather let
iconv_mime_decode() fail in this case, issuing the usual notice
regarding illegal characters.
We have to cater to the possibility that `=?` is not the start of an
encoded-word, but rather a literal `=?`. If a line break is found
while we're still looking for the charset, we can safely assume that
it's a literal `=?`, and act accordingly.
If we're expecting the start of an encoded word (`=?`), but instead of
the question mark get a line break (CR or LF), we must not append it to
the `pretval`.
The minimum length of an encoded-word is actually the pure encoding
overhead plus the length of the `output-charset` plus the minimum unit
of encoded text, which is 4 for B-encoding and (for simplicity) 3 for
Q-encoding. We also cater to the possibility that we need further
encoded words, which would be split by the `line-break-chars` followed
by a space character. Obviously, the former `out_charset_len + 12` is
too simplistic and wrong in the given case (where the magic number
would be 13).
These simplifications are somewhat wasteful, but iconv_mime_encode()
with Q-encoding is wasteful anyway (see bug 66828[1]), and the proper
solution to convert the whole input to the desired output charset
upfront, and applying the encoding afterwards appears too much a change
for the stable releases.
[1] <https://bugs.php.net/66828>
PHP requires integer typehints to be written "int" and does not
allow "integer" as an alias. This changes type error messages to
match the actual type name and avoids confusing messages like
"must be of the type integer, integer given".