RFC: ES6-style unicode string escaping. #446

Merged: 2 commits into rust-lang:master on Dec 11, 2014


SimonSapin
Contributor

Rendered.

Remove \u203D and \U0001F4A9 unicode string escapes, and add ECMAScript 6-style \u{1F4A9} escapes instead.

Text: https://github.com/rust-lang/rfcs/blob/master/text/0446-es6-unicode-escapes.md
RFC PR: #446
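
As a minimal sketch of the proposed syntax, the two code points named above can be written with the brace form as follows. The function name `rfc_examples` is made up for illustration, and the example assumes a compiler in which the brace form is the supported one.

```rust
/// Returns the two characters from the summary, written with the brace syntax.
/// `rfc_examples` is a hypothetical name used only for this illustration.
fn rfc_examples() -> (char, char) {
    ('\u{203D}', '\u{1F4A9}')
}

fn main() {
    let (interrobang, pile) = rfc_examples();
    assert_eq!(interrobang as u32, 0x203D);
    assert_eq!(pile as u32, 0x1F4A9);

    // The escape also works inside string literals, mixed with ordinary text.
    let s = "What\u{203D}";
    assert_eq!(s.chars().count(), 5);
    println!("{} {} {}", interrobang, pile, s);
}
```

The same escape covers both the four-digit and the eight-digit case, so one form replaces two.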

@edef1c

edef1c commented Nov 6, 2014

👍
This also increases readability — picking out a \uabcd in a long string of text is a lot harder than picking out \u{abcd}.

@emberian
Member

emberian commented Nov 6, 2014

👍 as well.

@blaenk
Contributor

blaenk commented Nov 6, 2014

Yeah this is nice.

@reem

reem commented Nov 6, 2014

+1 from me too. Definitely an improvement.

@pnkfelix
Member

pnkfelix commented Nov 7, 2014

One drawback not mentioned in RFC: format! templates that need Unicode escapes are forced to become uglier. (This need not be the case if we adopt the alternative of keeping both the old and new syntaxes)

@pnkfelix
Member

pnkfelix commented Nov 7, 2014

(Well, what I said isn't actually quite right, since the Unicode escape is probably handled at an earlier lexical level and thus would not be seen by the fmt code)

@SimonSapin
Contributor Author

@pnkfelix I don’t understand how format! templates are different from any string. Can you expand / give an example?

@alexcrichton
Member

I think @pnkfelix may be referring to something such as:

format!("{foo} \u{AF} {bar}", foo = "", bar = "")

In this case the braces around AF are lexed in the Rust parser as a unicode escape, while the braces around foo and bar are just normal parts of the string which are then interpreted as placeholders by the formatting string syntax.
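
The distinction can be sketched with a runnable variant of that call. The helper names `render` and `braces` are hypothetical; the point is only that the escape is resolved when the literal is lexed, while the placeholders are resolved by `format!`.

```rust
/// Builds the string from the example above with non-empty arguments.
/// `render` is a hypothetical helper for this illustration.
fn render(foo: &str, bar: &str) -> String {
    // `\u{AF}` is resolved when the literal is lexed, so `format!` only sees
    // the character U+00AF; `{foo}` and `{bar}` reach it as placeholders.
    format!("{foo} \u{AF} {bar}", foo = foo, bar = bar)
}

/// Literal braces in the output still need doubling, independent of the escape.
fn braces() -> String {
    format!("{{\u{AF}}}")
}

fn main() {
    assert_eq!(render("a", "b"), "a \u{AF} b");
    assert_eq!(braces(), "{\u{AF}}");
    println!("{}", render("left", "right"));
}
```

The second helper shows the visual cost under discussion: three different meanings of braces can end up side by side in one template.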

@SimonSapin
Contributor Author

Oh, I see. That’s a good point, I’ve added it to Drawbacks.

@Gankra
Contributor

Gankra commented Nov 7, 2014

Is it possible to just use parens or some other delimiter?

@ftxqxd
Contributor

ftxqxd commented Nov 7, 2014

👍; this makes Unicode escapes much clearer and more obvious. It should be noted that using {} in escapes is not unique to ES6. Python, Perl, and probably many other languages have \N{foo bar} escapes, where foo bar is the name of any Unicode character. Perhaps Rust should have these, too: \N{no-break space} is much clearer and more self-explanatory than \u{a0}.
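
Rust has no such escape, so the idea can only be approximated at run time. The following is a rough sketch under that assumption: `named_char` is a hypothetical function, and its table is a hand-written sample of four names rather than a real Unicode name database.

```rust
/// Hypothetical lookup from a Unicode character name to the character.
/// The table is a tiny hand-written sample, not a real name database.
fn named_char(name: &str) -> Option<char> {
    // Names are matched case-insensitively, as in the `\N{no-break space}` example.
    match name.to_ascii_uppercase().as_str() {
        "NO-BREAK SPACE" => Some('\u{a0}'),
        "MACRON" => Some('\u{af}'),
        "INTERROBANG" => Some('\u{203d}'),
        "PILE OF POO" => Some('\u{1f4a9}'),
        _ => None,
    }
}

fn main() {
    assert_eq!(named_char("no-break space"), Some('\u{a0}'));
    assert_eq!(named_char("foo bar"), None);
}
```

A real implementation would resolve names at compile time, which is what a macro or a lexer-level escape would provide.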

@mdinger
Contributor

mdinger commented Nov 8, 2014

As best I can tell, the aliases used with Python Unicode escape sequences are documented here

@huonw
Member

huonw commented Nov 8, 2014

@P1start (FWIW, I actually have a macro lib that offers that.)

@SimonSapin
Contributor Author

@gankro, it would be possible, but we would lose the benefit of the ES6 precedent.

@P1start, interesting. "\N{foo}" currently causes an "unknown character escape: N" error, so adding named escapes later would be backward-compatible.

@pnkfelix
Member

We decided in the meeting today to merge this. However, we also want to revise the RFC language to ensure up front that we will land this in a manner such that there is a window of time where the old syntax is still supported; i.e.: first add the new syntax as a feature, convert our code to use it, let that settle, and then remove the old syntax.

(Part of the reason we want to follow the above procedure is that we do not want to block 1.0 on this feature; making sure we remove the old syntax last, potentially in a separate beta cycle, should help mitigate that risk.)

@SimonSapin would you be up for revising the text to accommodate the above goal, or would you like me to do it on your behalf?
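
The "convert our code" step could be sketched as a textual rewrite of old escapes into the brace form. This is an illustration under stated assumptions, not the tool actually used in rust-lang/rust: `convert_old_escapes` is a hypothetical helper, it operates on the source text inside a string literal, and it chooses to drop leading zeros.

```rust
/// Rewrites old-style `\uXXXX` and `\UXXXXXXXX` escapes in literal source text
/// to the brace form. Hypothetical helper, not the actual migration tool.
fn convert_old_escapes(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '\\' || i + 1 >= chars.len() {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Number of hex digits the old syntax requires after the marker.
        let width = match chars[i + 1] {
            'u' => 4,
            'U' => 8,
            _ => 0,
        };
        let end = i + 2 + width;
        let is_old = width > 0
            && end <= chars.len()
            && chars[i + 2..end].iter().all(|c| c.is_ascii_hexdigit());
        if is_old {
            let digits: String = chars[i + 2..end].iter().collect();
            // Drop leading zeros but keep at least one digit.
            let trimmed = digits.trim_start_matches('0');
            out.push_str("\\u{");
            out.push_str(if trimmed.is_empty() { "0" } else { trimmed });
            out.push('}');
            i = end;
        } else {
            // Any other escape (including `\\` and `\u{...}`): copy both
            // characters so an escaped backslash cannot start a new escape.
            out.push(chars[i]);
            out.push(chars[i + 1]);
            i += 2;
        }
    }
    out
}

fn main() {
    println!("{}", convert_old_escapes(r"snowman: \u2603, pile: \U0001F4A9"));
}
```

Because text already in the brace form passes through unchanged, a rewrite like this can be run repeatedly during the window in which both syntaxes are accepted.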

@SimonSapin
Contributor Author

@pnkfelix Please do the revising, as I’m not sure of the details of the language you have in mind.

Should usage of the old syntax give a warning, after the new one is added?

@pnkfelix
Member

@SimonSapin I suspect a warning would be best.

@SimonSapin
Contributor Author

@pnkfelix The first part of this has landed: rust-lang/rust#19480. Should this be merged despite the "deployment plan" not being in the text?

@pnkfelix
Member

@SimonSapin ah sorry, I will try to merge pronto.

@pnkfelix pnkfelix merged commit b310c87 into rust-lang:master Dec 11, 2014
@SimonSapin SimonSapin deleted the es6-unicode-escapes branch July 6, 2015 14:41
behnam added a commit to behnam/r12a.github.io that referenced this pull request Apr 25, 2017
In Rust, it's recommended to use short (non-zero-padded) code points
inside ES6-style escape sequences (`\u{...}`), as this shortens the
literal and is easier on the eyes in typical use, while mechanical
parsing remains fairly easy.

See examples in the Rust RFC and related discussion:
* https://github.com/rust-lang/rfcs/blob/master/text/0446-es6-unicode-escapes.md
* rust-lang/rfcs#446
* rust-lang/rust#19739
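
A small sketch of what that recommendation means in practice: zero padding inside the braces does not change the character, so the short form is purely a style choice. The function name `padded_forms` is hypothetical.

```rust
/// Returns the same code point written with different amounts of zero padding.
/// `padded_forms` is a hypothetical name used only for this illustration.
fn padded_forms() -> [char; 3] {
    // One to six hex digits are accepted inside the braces.
    ['\u{a0}', '\u{00a0}', '\u{0000A0}']
}

fn main() {
    let forms = padded_forms();
    // All three spellings denote U+00A0 NO-BREAK SPACE.
    assert!(forms.iter().all(|&c| c as u32 == 0xA0));
    println!("{:?}", forms);
}
```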
Labels
A-syntax Syntax related proposals & ideas