Always simply ignore (pass through) them. Previously the behavior
depended on where the invalid character occurred, as it messed
up the state management.
This patch adds missing newlines, trims multiple redundant final
newlines into a single one, and trims redundant leading newlines in all
*.phpt sections.
According to POSIX, a line is a sequence of zero or more non-' <newline>'
characters plus a terminating '<newline>' character. [1] Files should
normally have at least one final newline character.
C89 [2] and later standards [3] mention a final newline:
"A source file that is not empty shall end in a new-line character,
which shall not be immediately preceded by a backslash character."
Although it is not mandatory for all files to have a final newline
fixed, a more consistent and homogeneous approach brings less of commit
differences issues and a better development experience in certain text
editors and IDEs.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206
[2] https://port70.net/~nsz/c/c89/c89-draft.html#2.1.1.2
[3] https://port70.net/~nsz/c/c99/n1256.html#5.1.1.2
Say the string is \377\000, basename will use mbrlen() to check whether
it's a start of a multibyte sequence. While on Linux it'll return -1 for
any char in the extended ASCII, on Windows it's returning 1. From what I
see the reason is that Windows doesn't implement UTF-8 in the CRT lib,
it's rather 16-bit Unicode or DBCS. Since extended ASCII is convertable
to Unicode directly - thus the behavior. On Linux however, it's a true
UTF-8 locale and implementation, for it \377\000 is invalid.
Maybe mbrlen needs an independent implementation for Windows supporting
UTF-8. For now I just split out this case so the most of the big basename
test doesn't fail on this one case.