Fix multiple operator rule in PHP lexer #1478
Is it a problem if operators like |
@julp Not really. The issue isn't about the number of tokens that are output; Rouge will merge the identical contiguous tokens together in the final output. The issue is the number of characters that the regular expression matches. The earlier rule was matching one or more operator symbols but since these symbols included |
In fact, only |
76d1958 to 60b95c6
@julp Good point about the |
Yes unless you want to handle this with a negative assertion? (eg |
@julp Agreed. Will merge this in now.
@julp Thanks for your help! Another bug fixed! :)
The PHP lexer has a rule that can match one or more characters as operators. The problem is that if an operator occurs immediately before the `?>` that closes a block of PHP, the `?` will match as an operator. This is possible with the `:` operator. This commit fixes the problem by splitting `?` out into its own rule.
The PHP lexer has a rule that can match one or more symbols as operators. Although this permits combinations that are not syntactically correct, it provides some performance improvement for code that has sequential operators.
The problem is that two of these operators, `?` and `>`, have a special meaning in PHP when they appear together as `?>`. Matching multiple operators in one pass causes the `?>` to be lexed incorrectly if it is immediately preceded by an operator. This is possible with the `:` operator. This PR fixes the rule by splitting it between operators that can appear multiple times in sequence and those that cannot.

This fixes #1362.
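To make the failure mode concrete, here is a minimal sketch in plain Ruby using `StringScanner`. It does not use Rouge's actual rule DSL, and the character classes, rule names, and the `tokenize` helper are all hypothetical, chosen only to mirror the behaviour described above: a greedy operator rule that includes `?` swallows the closing tag, while a rule set that handles `?` separately does not.

```ruby
require 'strscan'

# Hypothetical rule sets for illustration; not the lexer's real patterns.
# Rules are tried in order at the current scan position.
GREEDY_RULES = [
  [:close_tag, /\?>/],
  [:operator,  /[~!%^&*+=|:.<>\/?@-]+/]  # one or more symbols, `?` included
]

SPLIT_RULES = [
  [:close_tag, /\?>/],
  [:operator,  /[~!%^&*+=|:.<>\/@-]+/],  # same set without `?`
  [:operator,  /\?/]                     # `?` matched one character at a time
]

# Returns [rule_name, text] pairs; unmatched characters become :text tokens.
def tokenize(source, rules)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    name, text = nil
    rules.each do |rule_name, pattern|
      text = scanner.scan(pattern)
      if text
        name = rule_name
        break
      end
    end
    if text.nil?
      name = :text
      text = scanner.getch
    end
    tokens << [name, text]
  end
  tokens
end

p tokenize(":?>", GREEDY_RULES)  # => [[:operator, ":?>"]]
p tokenize(":?>", SPLIT_RULES)   # => [[:operator, ":"], [:close_tag, "?>"]]
```

With the greedy set, the scanner is positioned at `:` when the closing-tag rule is tried, so that rule fails and the operator rule consumes `:?>` in one match. With the split set, the operator run stops before `?`, which gives the closing-tag rule a chance to match at the next position.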