Delay string literal unescaping.
Currently string literals are unescaped twice.
- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`.
  This one just checks for errors.
- Again in `LitKind::from_token_lit`, which is mostly called when
  lowering AST to HIR, but also in a few other places during expansion.
  This one actually constructs the unescaped string. It also has error
  checking code, but the code handling the error cases is effectively
  dead (and has several bugs), because the check during lexing already
  catches all errors. (A toy sketch of the two roles follows below.)
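
Here is that toy sketch (not rustc's actual unescaping code; it handles
only a handful of escapes): `check_only` is roughly what the lexer's
pass amounts to, while `check_and_construct` is what
`LitKind::from_token_lit` needs to do.

    // Toy unescaper: accepts only \n, \t, \\ and \"; anything else is an error.
    fn unescape(src: &str) -> Result<String, String> {
        let mut out = String::new();
        let mut chars = src.chars();
        while let Some(c) = chars.next() {
            if c != '\\' {
                out.push(c);
                continue;
            }
            match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('\\') => out.push('\\'),
                Some('"') => out.push('"'),
                Some(other) => return Err(format!("unknown character escape: \\{other}")),
                None => return Err("lone backslash at end of literal".to_string()),
            }
        }
        Ok(out)
    }

    // The lexer's old role: unescape purely to detect errors, discarding the result.
    fn check_only(src: &str) -> Result<(), String> {
        unescape(src).map(|_| ())
    }

    // `LitKind::from_token_lit`'s role: unescape for real, keeping the string.
    fn check_and_construct(src: &str) -> Result<String, String> {
        unescape(src)
    }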

This commit removes the checking during lexing, and fixes up
`LitKind::from_token_lit` so it properly does both checking and
construction. This is a language change: some programs now compile that
previously did not. For example, it is now possible for macros to be
passed "invalid" string literals like "\a\b\c". This is a continuation
of a trend of delaying semantic error checking of literals until after
expansion: e.g. #102944 did this for some numeric literal cases, and
the detection of NUL chars in C string literals is already delayed in
this way.
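
As a concrete illustration (a hypothetical example, not one of the
tests in this commit): the invalid escapes below were previously
rejected during lexing, but with this change the program should
compile, because the literal is discarded during expansion and so is
never unescaped.

    macro_rules! discard {
        ($($tt:tt)*) => {};
    }

    fn main() {
        // `\a`, `\b` and `\c` are not valid escapes, but the literal does
        // not survive expansion, so it is no longer checked.
        discard!("\a\b\c");
    }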

XXX: have Session::report_lit_errors?
XXX: have LitKind::from_token_lit so you don't need the .0?

Things to note:
- `LitError` has a new `EscapeError` variant.
- `LitKind::from_token_lit`'s return value changed so that it can
  produce multiple errors/warnings, and also carry lexer warnings. The
  latter case is annoying but necessary to preserve existing warning
  behaviour. (See the calling-convention sketch after these notes.)
- `report_lit_error` becomes `report_lit_errors`, in order to handle
  multiple errors in a single string literal.
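
That calling convention, sketched from the diffs below (the exact types
are rustc-internal and may differ slightly):

    // Unescaping now returns both a result and any accumulated errors or
    // warnings; these are reported in a separate step, possibly several
    // per literal.
    let (result, errs) = LitKind::from_token_lit(token_lit);
    let lit_kind = match result {
        Ok(lit_kind) => lit_kind,
        Err(()) => LitKind::Err,
    };
    report_lit_errors(&sess.parse_sess, errs, token_lit, span);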

Notes about test changes:

- `tests/rustdoc-ui/ignore-block-help.rs`: this relies on a parsing
  error occurring. The error previously present was an unescaping
  error, which is now delayed until after parsing. So the commit changes
  it to an "unterminated character literal" error, which still occurs
  during parsing.

- Several tests had unescaping errors combined with unterminated literal
  errors. The former are now delayed but the latter remain as lexing
  errors. So the unterminated literal part needed to be split into a
  separate test file; otherwise compilation would end before the other
  errors were reported.

- issue-62913.rs: The structure and output changed a bit. Issue #62913
  was about an ICE due to an unterminated string literal, so the new
  version should be good enough.

- literals-are-validated-before-expansion.rs: this tests exactly the
  behaviour that has been changed, and so was removed.

  XXX: insert a new test covering more of that

- A couple of other tests produce the same errors, just in a different
  order.
nnethercote committed Dec 11, 2023
1 parent 6f6d73b commit 4314dff
Showing 43 changed files with 488 additions and 438 deletions.
3 changes: 2 additions & 1 deletion compiler/rustc_ast/src/attr/mod.rs
@@ -240,7 +240,7 @@ impl AttrArgsEq {
match self {
AttrArgsEq::Ast(expr) => match expr.kind {
ExprKind::Lit(token_lit) => {
LitKind::from_token_lit(token_lit).ok().and_then(|lit| lit.str())
LitKind::from_token_lit(token_lit).0.ok().and_then(|lit| lit.str())
}
_ => None,
},
@@ -426,6 +426,7 @@ impl MetaItemKind {
ExprKind::Lit(token_lit) => {
// Turn failures to `None`, we'll get parse errors elsewhere.
MetaItemLit::from_token_lit(token_lit, expr.span)
.0
.ok()
.map(|lit| MetaItemKind::NameValue(lit))
}
266 changes: 186 additions & 80 deletions compiler/rustc_ast/src/util/literal.rs

Large diffs are not rendered by default.

11 changes: 5 additions & 6 deletions compiler/rustc_ast_lowering/src/expr.rs
@@ -14,7 +14,7 @@ use rustc_data_structures::stack::ensure_sufficient_stack;
use rustc_hir as hir;
use rustc_hir::def::{DefKind, Res};
use rustc_middle::span_bug;
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::source_map::{respan, Spanned};
use rustc_span::symbol::{kw, sym, Ident, Symbol};
use rustc_span::DUMMY_SP;
@@ -119,13 +119,12 @@ impl<'hir> LoweringContext<'_, 'hir> {
hir::ExprKind::Unary(op, ohs)
}
ExprKind::Lit(token_lit) => {
let lit_kind = match LitKind::from_token_lit(*token_lit) {
let (result, errs) = LitKind::from_token_lit(*token_lit);
let lit_kind = match result {
Ok(lit_kind) => lit_kind,
Err(err) => {
report_lit_error(&self.tcx.sess.parse_sess, err, *token_lit, e.span);
LitKind::Err
}
Err(()) => LitKind::Err,
};
report_lit_errors(&self.tcx.sess.parse_sess, errs, *token_lit, e.span);
let lit = self.arena.alloc(respan(self.lower_span(e.span), lit_kind));
hir::ExprKind::Lit(lit)
}
4 changes: 2 additions & 2 deletions compiler/rustc_ast_lowering/src/format.rs
@@ -127,11 +127,11 @@ fn inline_literals(mut fmt: Cow<'_, FormatArgs>) -> Cow<'_, FormatArgs> {
&& let ExprKind::Lit(lit) = arg.kind
{
if let token::LitKind::Str | token::LitKind::StrRaw(_) = lit.kind
&& let Ok(LitKind::Str(s, _)) = LitKind::from_token_lit(lit)
&& let Ok(LitKind::Str(s, _)) = LitKind::from_token_lit(lit).0
{
literal = Some(s);
} else if let token::LitKind::Integer = lit.kind
&& let Ok(LitKind::Int(n, _)) = LitKind::from_token_lit(lit)
&& let Ok(LitKind::Int(n, _)) = LitKind::from_token_lit(lit).0
{
literal = Some(Symbol::intern(&n.to_string()));
}
2 changes: 1 addition & 1 deletion compiler/rustc_ast_lowering/src/lib.rs
@@ -948,7 +948,7 @@ impl<'a, 'hir> LoweringContext<'a, 'hir> {
// In valid code the value always ends up as a single literal. Otherwise, a dummy
// literal suffices because the error is handled elsewhere.
let lit = if let ExprKind::Lit(token_lit) = expr.kind
&& let Ok(lit) = MetaItemLit::from_token_lit(token_lit, expr.span)
&& let Ok(lit) = MetaItemLit::from_token_lit(token_lit, expr.span).0
{
lit
} else {
65 changes: 33 additions & 32 deletions compiler/rustc_builtin_macros/src/concat.rs
@@ -1,7 +1,7 @@
use rustc_ast as ast;
use rustc_ast::tokenstream::TokenStream;
use rustc_expand::base::{self, DummyResult};
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::symbol::Symbol;

use crate::errors;
@@ -19,48 +19,49 @@ pub fn expand_concat(
let mut has_errors = false;
for e in es {
match e.kind {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
Ok(ast::LitKind::Str(s, _) | ast::LitKind::Float(s, _)) => {
accumulator.push_str(s.as_str());
}
Ok(ast::LitKind::Char(c)) => {
accumulator.push(c);
}
Ok(ast::LitKind::Int(i, _)) => {
accumulator.push_str(&i.to_string());
}
Ok(ast::LitKind::Bool(b)) => {
accumulator.push_str(&b.to_string());
}
Ok(ast::LitKind::CStr(..)) => {
cx.emit_err(errors::ConcatCStrLit { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Byte(..) | ast::LitKind::ByteStr(..)) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Err) => {
has_errors = true;
}
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, e.span);
has_errors = true;
ast::ExprKind::Lit(token_lit) => {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::Str(s, _) | ast::LitKind::Float(s, _)) => {
accumulator.push_str(s.as_str());
}
Ok(ast::LitKind::Char(c)) => {
accumulator.push(c);
}
Ok(ast::LitKind::Int(i, _)) => {
accumulator.push_str(&i.to_string());
}
Ok(ast::LitKind::Bool(b)) => {
accumulator.push_str(&b.to_string());
}
Ok(ast::LitKind::CStr(..)) => {
cx.emit_err(errors::ConcatCStrLit { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Byte(..) | ast::LitKind::ByteStr(..)) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
has_errors = true;
}
Ok(ast::LitKind::Err) | Err(()) => {
has_errors = true;
}
}
},
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, e.span);
}
// We also want to allow negative numeric literals.
ast::ExprKind::Unary(ast::UnOp::Neg, ref expr)
if let ast::ExprKind::Lit(token_lit) = expr.kind =>
{
match ast::LitKind::from_token_lit(token_lit) {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::Int(i, _)) => accumulator.push_str(&format!("-{i}")),
Ok(ast::LitKind::Float(f, _)) => accumulator.push_str(&format!("-{f}")),
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, e.span);
Err(()) => {
has_errors = true;
}
_ => missing_literal.push(e.span),
}
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, e.span);
}
ast::ExprKind::IncludedBytes(..) => {
cx.emit_err(errors::ConcatBytestr { span: e.span });
16 changes: 8 additions & 8 deletions compiler/rustc_builtin_macros/src/concat_bytes.rs
@@ -1,7 +1,7 @@
use rustc_ast as ast;
use rustc_ast::{ptr::P, tokenstream::TokenStream};
use rustc_expand::base::{self, DummyResult};
use rustc_parse::parser::report_lit_error;
use rustc_parse::parser::report_lit_errors;
use rustc_span::Span;

use crate::errors;
@@ -17,7 +17,8 @@ fn invalid_type_err(
ConcatBytesInvalid, ConcatBytesInvalidSuggestion, ConcatBytesNonU8, ConcatBytesOob,
};
let snippet = cx.sess.source_map().span_to_snippet(span).ok();
match ast::LitKind::from_token_lit(token_lit) {
let (res, errs) = ast::LitKind::from_token_lit(token_lit);
match res {
Ok(ast::LitKind::CStr(_, _)) => {
// Avoid ambiguity in handling of terminal `NUL` by refusing to
// concatenate C string literals as bytes.
@@ -60,10 +61,9 @@ fn invalid_type_err(
cx.emit_err(ConcatBytesNonU8 { span });
}
Ok(ast::LitKind::ByteStr(..) | ast::LitKind::Byte(_)) => unreachable!(),
Err(err) => {
report_lit_error(&cx.sess.parse_sess, err, token_lit, span);
}
Err(()) => {}
}
report_lit_errors(&cx.sess.parse_sess, errs, token_lit, span);
}

fn handle_array_element(
@@ -80,7 +80,7 @@ fn handle_array_element(
*has_errors = true;
None
}
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit).0 {
Ok(ast::LitKind::Int(
val,
ast::LitIntType::Unsuffixed | ast::LitIntType::Unsigned(ast::UintTy::U8),
@@ -141,7 +141,7 @@ pub fn expand_concat_bytes(
ast::ExprKind::Repeat(expr, count) => {
if let ast::ExprKind::Lit(token_lit) = count.value.kind
&& let Ok(ast::LitKind::Int(count_val, _)) =
ast::LitKind::from_token_lit(token_lit)
ast::LitKind::from_token_lit(token_lit).0
{
if let Some(elem) =
handle_array_element(cx, &mut has_errors, &mut missing_literals, expr)
@@ -154,7 +154,7 @@
cx.emit_err(errors::ConcatBytesBadRepeat { span: count.value.span });
}
}
&ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
&ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit).0 {
Ok(ast::LitKind::Byte(val)) => {
accumulator.push(val);
}
42 changes: 22 additions & 20 deletions compiler/rustc_expand/src/base.rs
@@ -1235,26 +1235,28 @@ pub fn expr_to_spanned_string<'a>(
let expr = cx.expander().fully_expand_fragment(AstFragment::Expr(expr)).make_expr();

Err(match expr.kind {
ast::ExprKind::Lit(token_lit) => match ast::LitKind::from_token_lit(token_lit) {
Ok(ast::LitKind::Str(s, style)) => return Ok((s, style, expr.span)),
Ok(ast::LitKind::ByteStr(..)) => {
let mut err = cx.struct_span_err(expr.span, err_msg);
let span = expr.span.shrink_to_lo();
err.span_suggestion(
span.with_hi(span.lo() + BytePos(1)),
"consider removing the leading `b`",
"",
Applicability::MaybeIncorrect,
);
Some((err, true))
}
Ok(ast::LitKind::Err) => None,
Err(err) => {
parser::report_lit_error(&cx.sess.parse_sess, err, token_lit, expr.span);
None
}
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
},
ast::ExprKind::Lit(token_lit) => {
let (lit_kind, errs) = ast::LitKind::from_token_lit(token_lit);
let res = match lit_kind {
Ok(ast::LitKind::Str(s, style)) => return Ok((s, style, expr.span)),
Ok(ast::LitKind::ByteStr(..)) => {
let mut err = cx.struct_span_err(expr.span, err_msg);
let span = expr.span.shrink_to_lo();
err.span_suggestion(
span.with_hi(span.lo() + BytePos(1)),
"consider removing the leading `b`",
"",
Applicability::MaybeIncorrect,
);
Some((err, true))
}
Ok(ast::LitKind::Err) => None,
Err(()) => None,
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
};
parser::report_lit_errors(&cx.sess.parse_sess, errs, token_lit, expr.span);
res
}
ast::ExprKind::Err => None,
_ => Some((cx.struct_span_err(expr.span, err_msg), false)),
})
2 changes: 1 addition & 1 deletion compiler/rustc_expand/src/mbe/metavar_expr.rs
@@ -119,7 +119,7 @@ fn parse_depth<'sess>(
.span_diagnostic
.struct_span_err(span, "meta-variable expression depth must be a literal"));
};
if let Ok(lit_kind) = LitKind::from_token_lit(*lit)
if let Ok(lit_kind) = LitKind::from_token_lit(*lit).0
&& let LitKind::Int(n_u128, LitIntType::Unsuffixed) = lit_kind
&& let Ok(n_usize) = usize::try_from(n_u128)
{
2 changes: 1 addition & 1 deletion compiler/rustc_lexer/src/unescape.rs
@@ -347,7 +347,7 @@
// them in the range computation.
while let Some(c) = chars.next() {
let start = src.len() - chars.as_str().len() - c.len_utf8();
let res = match c {
let res: Result<T, EscapeError> = match c {
'\\' => {
match chars.clone().next() {
Some('\n') => {