Tracking issue for RFC 446 - ES6-style unicode escapes #19739

Closed
pnkfelix opened this issue Dec 11, 2014 · 10 comments
Labels
A-parser Area: The parsing of Rust source code to an AST

Comments

@pnkfelix
Member

Remove \u203D and \U0001F4A9 unicode string escapes, and add ECMAScript 6-style \u{1F4A9} escapes instead.
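As a minimal sketch of what the new syntax looks like in practice (the helper function names below are hypothetical, chosen only for illustration), the two code points mentioned above can be written with braced escapes like this:

```rust
// Hypothetical helpers illustrating the ES6-style `\u{...}` escape.
// The older `\u203D` and `\U0001F4A9` forms are the ones this RFC removes.

/// U+203D INTERROBANG, written with a braced escape.
fn interrobang() -> char {
    '\u{203D}'
}

/// U+1F4A9, written with a braced escape (five hex digits).
fn pile_of_poo() -> char {
    '\u{1F4A9}'
}

/// Both escapes used inside a single string literal.
fn both() -> &'static str {
    "\u{203D}\u{1F4A9}"
}

fn main() {
    // The escapes denote exactly the code points named inside the braces.
    assert_eq!(interrobang() as u32, 0x203D);
    assert_eq!(pile_of_poo() as u32, 0x1F4A9);

    // Two characters, encoded as 3 + 4 bytes of UTF-8.
    assert_eq!(both().chars().count(), 2);
    assert_eq!(both().len(), 7);

    println!("{}", both());
}
```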

Text: https://github.com/rust-lang/rfcs/blob/master/text/0446-es6-unicode-escapes.md
RFC PR: rust-lang/rfcs#446

Migration strategy: https://github.com/rust-lang/rfcs/blob/master/text/0446-es6-unicode-escapes.md#migration-strategy

@pnkfelix
Member Author

Note that stage 1 and the foundation for stage 2 in the migration strategy have been added by PR #19480

@emk
Contributor

emk commented Dec 13, 2014

I know this is probably on your to-do list already, but I found an interesting corner-case in syntax::print, and traced it back to std::char::Char::escape_unicode: #19811

@aochagavia
Contributor

Though not mentioned in the RFC, we still have \x.. to escape hexadecimal characters. Is this explicitly intended or should it also be removed?

@steveklabnik
Member

Hexadecimal doesn't suffer from the same issues, because the full range is expressible, right?

@aochagavia
Contributor

It seems weird to me that you can do this: assert_eq!('\u{7f}', '\x7f'). But maybe it is just me.

I thought that \x should be deprecated as it is just a limited version of \u{...}.
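For readers following along, here is a small sketch of the overlap being described, as it behaves in current Rust (the function names are hypothetical). In `char` and string literals `\xHH` only covers the ASCII range, while in byte literals it covers every byte value, which is one place where `\u{...}` is not a drop-in replacement:

```rust
// Hypothetical helpers comparing `\xHH` with `\u{...}`.

/// U+007F written with the two-digit hex escape.
fn hex_escape() -> char {
    '\x7f'
}

/// The same code point written with the braced Unicode escape.
fn unicode_escape() -> char {
    '\u{7f}'
}

/// In byte literals, `\xHH` denotes a raw byte and may go up to 0xFF.
fn raw_byte() -> u8 {
    b'\xff'
}

fn main() {
    // The assertion from the comment above: both spellings are the same char.
    assert_eq!(hex_escape(), unicode_escape());

    // `\xff` is accepted in a byte literal and yields the byte 255.
    assert_eq!(raw_byte(), 0xFF);
}
```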

@aochagavia
Contributor

cc @SimonSapin what do you think?

@SimonSapin
Contributor

I suggest leaving \xHH unchanged.

The reasoning behind the {} delimiters in \u{1F4A9} is to allow a variable number of digits, without leading zeros. Unicode goes up to U+10FFFF with 6 digits, but the majority of assigned code points still need only 4 significant digits or fewer.

\xHH is fixed to two digits, but the \x00 to \x0F range is all control characters, most of which are very rarely used. The less rare ones, \x00 (nul), \x09 (tab), \x0A (newline), and \x0D (carriage return) all have shorter syntax already: \0, \t, \n, and \r, respectively. So there is no point in having \x{HH} and \x{H}.
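The two points above can be checked with a short sketch in current Rust (the helper names are hypothetical): `\u{...}` accepts anywhere from one to six hex digits, so zero-padding is optional, and the common control characters have short escapes equal to their `\xHH` spellings.

```rust
// Hypothetical helpers illustrating variable-width `\u{...}` and short escapes.

/// 'A' written with 2, 4, and 6 hex digits; leading zeros are optional.
fn padded_variants() -> [char; 3] {
    ['\u{41}', '\u{0041}', '\u{000041}']
}

/// The short escapes for nul, tab, newline, and carriage return.
fn short_escapes() -> [char; 4] {
    ['\0', '\t', '\n', '\r']
}

/// The same four control characters written as `\xHH`.
fn hex_escapes() -> [char; 4] {
    ['\x00', '\x09', '\x0A', '\x0D']
}

fn main() {
    // All three spellings denote the same character.
    assert_eq!(padded_variants(), ['A'; 3]);

    // The largest code point needs all six digits.
    assert_eq!('\u{10FFFF}', char::MAX);

    // Short escapes and two-digit hex escapes agree.
    assert_eq!(short_escapes(), hex_escapes());
}
```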

@SimonSapin
Contributor

Oh, looks like I completely misread you, sorry. Yes, removing \xHH entirely when the replacement was \u00HH was annoying, but now it might not be as bad. I don’t mind either way, but it’d probably have to be a separate RFC. See rust-lang/rfcs#326 for prior discussion, including a decision not to remove it. (But that was before \u{H+})

@kmcallister added the A-parser (Area: The parsing of Rust source code to an AST) label on Jan 17, 2015
@genbattle
Contributor

I have opened a PR here to update the documentation in relation to this change.

@alexcrichton
Member

This has been completed.

behnam added a commit to behnam/r12a.github.io that referenced this issue Apr 25, 2017
In Rust, it's recommended to use short (non-zero-padded) code-points
inside ES6-style escaping sequences (`\u{...}`), as it reduces the
length of the literal, and is easier on the eyes for average use
cases, while mechanical parsing remains fairly easy.

See examples in the Rust RFC and related discussion:
* https://github.com/rust-lang/rfcs/blob/master/text/0446-es6-unicode-escapes.md
* rust-lang/rfcs#446
* rust-lang/rust#19739