Aside from a few very specific syntax errors for which detailed exceptions are
thrown, generally PHP just emits the default error messages generated by bison on syntax
error. These messages are very uninformative; they just say "Unexpected ... at line ...".
This is most problematic with constructs which can span an arbitrary number of lines, such
as blocks of code delimited by { }, 'if' conditions delimited by ( ), and so on. If a closing
delimiter is missed, the block will run for the entire remainder of the source file (which
could be thousands of lines), and then at the end, a parse error will be thrown with the
dreaded words: "Unexpected end of file".
Therefore, track the positions of opening and closing delimiters and ensure that they match
up correctly. If any mismatch or missing delimiter is detected, immediately throw a parse
error which points the user to the offending line. This is best done in the *lexer* and not
in the parser.
Thanks to Nikita Popov and George Peter Banyard for suggesting improvements.
Fixes bug #79368.
Closes GH-5364.
Closes GH-5353. From now on, PHP will have reflection information
about default values of parameters of internal functions.
Co-authored-by: Nikita Popov <nikita.ppv@gmail.com>
Around a quarter of all strings in array tokens would have a string that's one
character long (e.g. ` `, `\`, `1`)
For parsing a large number of php files,
The memory increase dropped from 378374248 to 369535688 (2.5%)
Closes GH-4753.
To clean up the mess here a bit, check for invalid octal digits
with an explicit loop instead of mixing this into the string to
number conversion.
Also clean up some type usage.
Changes:
- executable from any location (for example, project root)
- some minor common shell scripts CS fixes
- error reporting done based on the presence of the parser file
Normalization include:
- Use dnl for everything that can be ommitted when configure is built in
favor of the shell comment character # which is visible in the output.
- Line length normalized to 80 columns
- Dots for most of the one line sentences
- Macro definitions include similar pattern header comments now
This reverts commit e528762c1c.
Dmitry reports that this has a non-trivial impact on parsing
overhead, especially on 32-bit systems. As we don't have a strong
need for this change right now, I'm reverting it.
See also comments on
e528762c1c.
Locations for AST nodes are now tracked with the help of bison
location tracking. This is more accurate than what we currently do
and easier to extend with more information.
A zend_ast_loc structure is introduced, which is used for the location
stack. Currently it only holds the start lineno, but can be extended
to also hold end lineno and offset/column information in the future.
All AST constructors now accept a zend_ast_loc* as first argument, and
will use it to determine their lineno. Previously this used either the
CG(zend_lineno), or the smallest AST lineno of child nodes.
On the parser side, the location structure for a whole rule can be
obtained using the &@$ character salad.