-
Notifications
You must be signed in to change notification settings - Fork 162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stricter string escape #619
Conversation
With this patch it is possible to give character values by using the common '\xHH' escape where H is a hexadecimal digit.
While we're here, we also make the scanning of escape sequences stricter.
This was before parsed as "+", so there should not be a functional change.
I guess writing " |
@markuspf wrote:
No problem for GAPDoc, I have removed the four I have commented on a change which I think is wrong. It seems you have missed two cases: In lines 291 and 292 in Note that the escapes Otherwise it looks ok to me. |
1d0e1c5
to
66a6b61
Compare
@frankluebeck Hmm, what do you mean "no problem for GAPDoc" -- quite clearly, with this patch, GAPDoc 1.5.1 triggers syntax errors. Or did you mean that these are easy to fix, and that in the next GAPDoc version, it won't be a problem? (Still looking forward to that new release, by the way ;-) |
There are errors in Hmm, could we perhaps do this: Keep accepting those "invalid" sequences (treating them as before), but printing a syntax warning. Then, merge that. After that, urge authors of affected packages to fix that (might be a problem for qaos, as that was last updated in 2008). Then eventually, turn it into a syntax error. |
Some more potentially affected packages / files (from some grepping through my pkg directory; this is probably incomplete):
|
Rethinking this, with the discussed examples in mind, I have changed my opinion. I have looked into the documentation of some other languages with backslash escapes. In all cases it was not documented how a not explicitly mentioned backslash-something sequence is interpreted. In contrast: GAP did document its behaviour since 30 years and as we have seen this definition is actually used in various places. So, why should we introduce a non backward compatible change for this (for which some code that is around for decades needs to be changed)? In some examples above it is even nice for aesthetic reasons that one can write strings in the same way as identifiers:
instead of the equivalent
because the corresponding identifier must be escaped anyway:
PS: I still agree that the advantage of being able to specify characters in hex notation justifies the slight backward incompatibility with respect to the rarely, if ever, used |
@frankluebeck how about (as a middle group) we should just forbid \ before a letter? Have any cases of this turned up which were not actually mistakes? |
One reason I see why we should maybe at least warn is the bugs that were found by making this change, i.e. instances of escaping that wasn't actually escaping. That said, I do agree that the current behaviour is documented and that I got a bit worried about cases such as I am not sure about how much more complicated we'd want to make the rules for escaping. |
I like the idea of reserving just ascii letters. The '\N' above is probably a typo, meant to be '\n'. Alternatively, we might use \0x instead of \x for hex strings, thus allowing a previously forbidden escape sequence. I also wouldn't mind \u or \0u meaning UTF-8 encoding (whether this makes sense in the given context or not). This is no better or worse than octal and hex characters. Would we want four or five (hex) digit unicode codes? |
I thought about proposing to disallow only non-documented backslash escapes with letters for later extensions. But then decided against it because I cannot see a big enough benefit to justify the change of a decades old documented behaviour. On the other hand I agree that this "middle version" is likely to reveal more bugs than cause annoyance. With respect to unicode I agree with a comment by @stevelinton that we should first discuss a proper concept of strings, text objects, encodings, and so on, before extending the parser. |
This pull request does essentially three things (it also contains #612, because I branched it off of that PR):
GetStr
andGetChar
\
in a string that is not allowed as per documented escape sequences"\!"
would just have been parsed as"!"
anyway."\+", "\-", "\*", "\/"
. They would have been parsed just as"+", "-", "*", "/"
anyway, but there was a convention of backslashing them for some reason.WARNING: This breaks packages, in particular GAPDoc