Delay string literal unescaping.
Currently string literals are unescaped twice.
- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.
- Again in `LitKind::from_token_lit`, which is mostly called when
  lowering AST to HIR, but also in a few other places during expansion.
  This one actually constructs the unescaped string. It also has error
  checking code, but the code handling the error cases is effectively
  dead (and has several bugs), because the check during lexing already
  catches all errors. (A toy sketch of the two roles follows below.)
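
Here is that toy sketch (not rustc's actual unescaping code; it handles
only a handful of escapes): `check_only` is roughly what the lexer's
pass amounts to, while `check_and_construct` is what
`LitKind::from_token_lit` needs to do.

    // Toy unescaper: accepts only \n, \t, \\ and \"; anything else is an error.
    fn unescape(src: &str) -> Result<String, String> {
        let mut out = String::new();
        let mut chars = src.chars();
        while let Some(c) = chars.next() {
            if c != '\\' {
                out.push(c);
                continue;
            }
            match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                Some(other) => return Err(format!("unknown character escape: \\{other}")),
                None => return Err("lone backslash at end of literal".to_string()),
            }
        }
        Ok(out)
    }

    // The lexer's old role: unescape purely to detect errors, discarding the result.
    fn check_only(src: &str) -> Result<(), String> {
        unescape(src).map(|_| ())
    }

    // `LitKind::from_token_lit`'s role: unescape for real, keeping the string.
    fn check_and_construct(src: &str) -> Result<String, String> {
        unescape(src)
    }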

This commit removes the checking during lexing, and fixes up
`LitKind::from_token_lit` so it properly does both checking and
construction. This is a language change: some programs now compile that
previously did not. For example, it is now possible for macros to be
passed "invalid" string literals like "\a\b\c". This is a continuation
of a trend of delaying semantic error checking of literals until after
expansion: e.g. #102944 did this for some numeric literal cases, and
the detection of NUL chars in C string literals is already delayed in
this way.
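
As a concrete illustration (a hypothetical example, not one of the
tests in this commit): the invalid escapes below were previously
rejected during lexing, but with this change the program should
compile, because the literal is discarded during expansion and so is
never unescaped.

    macro_rules! discard {
        ($($tt:tt)*) => {};
    }

    fn main() {
        // `\a`, `\b` and `\c` are not valid escapes, but the literal does
        // not survive expansion, so it is no longer checked.
        discard!("\a\b\c");
    }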

XXX: have Session::report_lit_errors?
XXX: have LitKind::from_token_lit so you don't need the .0?

Things to note:
- `LitError` has a new `EscapeError` variant.
- `LitKind::from_token_lit`'s return value changed so that it can
  produce multiple errors/warnings, and also carry lexer warnings. The
  latter case is annoying but necessary to preserve existing warning
  behaviour. (See the calling-convention sketch after these notes.)
- `report_lit_error` becomes `report_lit_errors`, in order to handle
  multiple errors in a single string literal.
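
That calling convention, sketched from the diffs below (the exact types
are rustc-internal and may differ slightly):

    // Unescaping now returns both a result and any accumulated errors or
    // warnings; these are reported in a separate step, possibly several
    // per literal.
    let (result, errs) = LitKind::from_token_lit(token_lit);
    let lit_kind = match result {
        Ok(lit_kind) => lit_kind,
        Err(()) => LitKind::Err,
    };
    report_lit_errors(&sess.parse_sess, errs, token_lit, span);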

Notes about test changes:

- `tests/rustdoc-ui/ignore-block-help.rs`: this relies on a parsing
  error occurring. The error previously present was an unescaping
  error, which is now delayed until after parsing. So the commit changes
  it to an "unterminated character literal" error, which still occurs
  during parsing.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file; otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue #62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed.

  XXX: insert a new test covering more of that

- A couple of other tests produce the same errors, just in a different
  order.
nnethercote committed Dec 11, 2023
1 parent 6f6d73b commit 4314dff
Showing 43 changed files with 488 additions and 438 deletions.
3 changes: 2 additions & 1 deletion compiler/rustc_ast/src/attr/mod.rs
@@ -240,7 +240,7 @@ impl AttrArgsEq {
match self {
AttrArgsEq::Ast(expr) => match expr.kind {
ExprKind::Lit(token_lit) => {
LitKind::from_token_lit(token_lit).ok().and_then(|lit| lit.str())
LitKind::from_token_lit(token_lit).0.ok().and_then(|lit| lit.str())
}
_ => None,
},
@@ -426,6 +426,7 @@ impl MetaItemKind {
ExprKind::Lit(token_lit) => {
// Turn failures to `None`, we'll get parse errors elsewhere.
MetaItemLit::from_token_lit(token_lit, expr.span)
.0
.ok()
.map(|lit| MetaItemKind::NameValue(lit))
}
266 changes: 186 additions & 80 deletions compiler/rustc_ast/src/util/literal.rs

Large diffs are not rendered by default.

11 changes: 5 additions & 6 deletions compiler/rustc_ast_lowering/src/expr.rs
@@ -14,7 +14,7 @@ use rustc_data_structures::stack::ensure_sufficient_stack;
use rustc_hir as hir;
use rustc_hir::def::{DefKind, Res};
use rustc_middle::span_bug;
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::source_map::{respan, Spanned};
use rustc_span::symbol::{kw, sym, Ident, Symbol};
use rustc_span::DUMMY_SP;
@@ -119,13 +119,12 @@ impl<'hir> LoweringContext<'_, 'hir> {
hir::ExprKind::Unary(op, ohs)
}
ExprKind::Lit(token_lit) => {
let lit_kind = match LitKind::from_token_lit(*token_lit) {
let (result, errs) = LitKind::from_token_lit(*token_lit);
let lit_kind = match result {
Ok(lit_kind) => lit_kind,
Err(err) => {
report_lit_error(&self.tcx.sess.parse_sess, err, *token_lit, e.span);
LitKind::Err
}
Err(()) => LitKind::Err,
};
report_lit_errors(&self.tcx.sess.parse_sess, errs, *token_lit, e.span);
let lit = self.arena.alloc(respan(self.lower_span(e.span), lit_kind));
hir::ExprKind::Lit(lit)
}
4 changes: 2 additions & 2 deletions compiler/rustc_ast_lowering/src/format.rs
@@ -127,11 +127,11 @@ fn inline_literals(mut fmt: Cow<'_, FormatArgs>) -> Cow<'_, FormatArgs> {
&& let ExprKind::Lit(lit) = arg.kind
{
if let token::LitKind::Str | token::LitKind::StrRaw(_) = lit.kind
&& let Ok(LitKind::Str(s, _)) = LitKind::from_token_lit(lit)
&& let Ok(LitKind::Str(s, _)) = LitKind::from_token_lit(lit).0
{
literal = Some(s);
} else if let token::LitKind::Integer = lit.kind
&& let Ok(LitKind::Int(n, _)) = LitKind::from_token_lit(lit)
&& let Ok(LitKind::Int(n, _)) = LitKind::from_token_lit(lit).0
{
literal = Some(Symbol::intern(&n.to_string()));
}
2 changes: 1 addition & 1 deletion compiler/rustc_ast_lowering/src/lib.rs
@@ -948,7 +948,7 @@ impl<'a, 'hir> LoweringContext<'a, 'hir> {
// In valid code the value always ends up as a single literal. Otherwise, a dummy
// literal suffices because the error is handled elsewhere.
let lit = if let ExprKind::Lit(token_lit) = expr.kind
&& let Ok(lit) = MetaItemLit::from_token_lit(token_lit, expr.span)
&& let Ok(lit) = MetaItemLit::from_token_lit(token_lit, expr.span).0
{
lit
} else {
65 changes: 33 additions & 32 deletions compiler/rustc_builtin_macros/src/concat.rs
@@ -1,7 +1,7 @@
use rustc_ast as ast;
use rustc_ast::tokenstream::TokenStream;
use rustc_expand::base::{self, DummyResult};
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::symbol::Symbol;

use crate::errors;
@@ -19,48 +19,49 @@ pub fn expand_concat(
let mut has_errors = false;
for e in es {
match e.kind {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
Ok(ast::LitKind::Str(s, _) | ast::LitKind::Float(s, _)) => {
accumulator.push_str(s.as_str());
}
Ok(ast::LitKind::Char(c)) => {
accumulator.push(c);
}
Ok(ast::LitKind::Int(i, _)) => {
accumulator.push_str(&i.to_string());
}
Ok(ast::LitKind::Bool(b)) => {
accumulator.push_str(&b.to_string());
}
Ok(ast::LitKind::CStr(..)) => {
cx.emit_err(errors::ConcatCStrLit { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Byte(..) | ast::LitKind::ByteStr(..)) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Err) => {
has_errors = true;
}
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, e.span);
has_errors = true;
ast::ExprKind::Lit(token_lit) => {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::Str(s, _) | ast::LitKind::Float(s, _)) => {
accumulator.push_str(s.as_str());
}
Ok(ast::LitKind::Char(c)) => {
accumulator.push(c);
}
Ok(ast::LitKind::Int(i, _)) => {
accumulator.push_str(&i.to_string());
}
Ok(ast::LitKind::Bool(b)) => {
accumulator.push_str(&b.to_string());
}
Ok(ast::LitKind::CStr(..)) => {
cx.emit_err(errors::ConcatCStrLit { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Byte(..) | ast::LitKind::ByteStr(..)) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Err) | Err(()) => {
has_errors = true;
}
}
},
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, e.span);
}
// We also want to allow negative numeric literals.
ast::ExprKind::Unary(ast::UnOp::Neg, ref expr)
if let ast::ExprKind::Lit(token_lit) = expr.kind =>
{
match ast::LitKind::from_token_lit(token_lit) {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::Int(i, _)) => accumulator.push_str(&format!("-{i}")),
Ok(ast::LitKind::Float(f, _)) => accumulator.push_str(&format!("-{f}")),
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, e.span);
Err(()) => {
has_errors = true;
}
_ => missing_literal.push(e.span),
}
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, e.span);
}
ast::ExprKind::IncludedBytes(..) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
16 changes: 8 additions & 8 deletions compiler/rustc_builtin_macros/src/concat_bytes.rs
@@ -1,7 +1,7 @@
use rustc_ast as ast;
use rustc_ast::{ptr::P, tokenstream::TokenStream};
use rustc_expand::base::{self, DummyResult};
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::Span;

use crate::errors;
@@ -17,7 +17,8 @@ fn invalid_type_err(
ConcatBytesInvalid, ConcatBytesInvalidSuggestion, ConcatBytesNonU8, ConcatBytesOob,
};
let snippet = cx.sess.source_map().span_to_snippet(span).ok();
match ast::LitKind::from_token_lit(token_lit) {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::CStr(_, _)) => {
// Avoid ambiguity in handling of terminal `NUL` by refusing to
// concatenate C string literals as bytes.
@@ -60,10 +61,9 @@ fn invalid_type_err(
cx.emit_err(ConcatBytesNonU8 { span });
}
Ok(ast::LitKind::ByteStr(..) | ast::LitKind::Byte(_)) => unreachable!(),
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, span);
}
Err(()) => {}
}
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, span);
}

fn handle_array_element(
@@ -80,7 +80,7 @@ fn handle_array_element(
*has_errors = true;
None
}
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit).0 {
Ok(ast::LitKind::Int(
val,
ast::LitIntType::Unsuffixed | ast::LitIntType::Unsigned(ast::UintTy::U8),
@@ -141,7 +141,7 @@ pub fn expand_concat_bytes(
ast::ExprKind::Repeat(expr, count) => {
if let ast::ExprKind::Lit(token_lit) = count.value.kind
&& let Ok(ast::LitKind::Int(count_val, _)) =
ast::LitKind::from_token_lit(token_lit)
ast::LitKind::from_token_lit(token_lit).0
{
if let Some(elem) =
handle_array_element(cx, &mut has_errors, &mut missing_literals, expr)
@@ -154,7 +154,7 @@
cx.emit_err(errors::ConcatBytesBadRepeat { span: count.value.span });
}
}
&ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
&ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit).0 {
Ok(ast::LitKind::Byte(val)) => {
accumulator.push(val);
}
42 changes: 22 additions & 20 deletions compiler/rustc_expand/src/base.rs
@@ -1235,26 +1235,28 @@ pub fn expr_to_spanned_string<'a>(
let expr = cx.expander().fully_expand_fragment(AstFragment::Expr(expr)).make_expr();

Err(match expr.kind {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
Ok(ast::LitKind::Str(s, style)) => return Ok((s, style, expr.span)),
Ok(ast::LitKind::ByteStr(..)) => {
let mut err = cx.struct_span_err(expr.span, err_msg);
let span = expr.span.shrink_to_lo();
err.span_suggestion(
span.with_hi(span.lo() + BytePos(1)),
"consider removing the leading `b`",
"",
Applicability::MaybeIncorrect,
);
Some((err, true))
}
Ok(ast::LitKind::Err) => None,
Err(err) => {
parser::report_lit_error(&cx.sess.parse_sess, err, token_lit, expr.span);
None
}
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
},
ast::ExprKind::Lit(token_lit) => {
let (lit_kind, errs) = ast::LitKind::from_token_lit(token_lit);
let res = match lit_kind {
Ok(ast::LitKind::Str(s, style)) => return Ok((s, style, expr.span)),
Ok(ast::LitKind::ByteStr(..)) => {
let mut err = cx.struct_span_err(expr.span, err_msg);
let span = expr.span.shrink_to_lo();
err.span_suggestion(
span.with_hi(span.lo() + BytePos(1)),
"consider removing the leading `b`",
"",
Applicability::MaybeIncorrect,
);
Some((err, true))
}
Ok(ast::LitKind::Err) => None,
Err(()) => None,
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
};
parser::report_lit_errors(&cx.sess.parse_sess, errs, token_lit, expr.span);
res
}
ast::ExprKind::Err => None,
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
})
2 changes: 1 addition & 1 deletion compiler/rustc_expand/src/mbe/metavar_expr.rs
@@ -119,7 +119,7 @@ fn parse_depth<'sess>(
.span_diagnostic
.struct_span_err(span, "meta-variable expression depth must be a literal"));
};
if let Ok(lit_kind) = LitKind::from_token_lit(*lit)
if let Ok(lit_kind) = LitKind::from_token_lit(*lit).0
&& let LitKind::Int(n_u128, LitIntType::Unsuffixed) = lit_kind
&& let Ok(n_usize) = usize::try_from(n_u128)
{
2 changes: 1 addition & 1 deletion compiler/rustc_lexer/src/unescape.rs
@@ -347,7 +347,7 @@
// them in the range computation.
while let Some(c) = chars.next() {
let start = src.len() - chars.as_str().len() - c.len_utf8();
let res = match c {
let res: Result<T, EscapeError> = match c {
'\\' => {
match chars.clone().next() {
Some('\n') => {