-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider unterminated f-strings in FStringRanges
#8154
Conversation
Current dependencies on/for this PR:
This comment was auto-generated by Graphite. |
7880903
to
c755335
Compare
if !self.start_locations.is_empty() { | ||
debug!( | ||
"Unterminated f-strings detected at: {:?}", | ||
self.start_locations | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this isn't really required because we'd be providing diagnostics for the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonnder if it would be useful to store the range with a missing and position instead of omitting it entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To extend on this: It might even be more correct to simply assume that everything after the fstring start token is inside the fstring if the end token is missing. However, what the correct interpretation is might depend on the specific usage. E.g. adding noqa comments might be a bad idea for unclosed fstring. Rules testing if the range is part of a multiline string could get away by assuming the string goes to the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of the same in one of my proposed solution.
For rules, they're only being used to check W605
(invalid escape sequence) and ISC
(implicit string concatenation) which will be fine. For the former, the fix will be created (see below) while for later the fix won't be created.
-f"\.png{
+rf"\.png{
The NoQA logic will need to be looked into though. We might have to somehow mark these ranges as "incomplete from source code point of view".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also fine with adding the entire range, but just omitting the range of an unterminated string seems wrong to me (I rather overestimate than underestimate the f-string range)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably avoid adding NoQA directives then otherwise it won't be lexed as a comment and the --add-noqa
will keep on adding them.
f"\.png
# noqa: W605
# noqa: W605
# ...
I'm not sure how to store this info. One solution could be to update the signature from BTreeMap<TextSize, TextRange>
to BTreeMap<TextSize, FStringRange>
where:
struct FStringRange {
range: TextRange,
// Is this f-string complete i.e., does it have both the start and end tokens?
complete: bool
}
Or, while computing the NoQA mapping we can ignore these ranges by somehow checking if they're for a complete f-string or not.
@charliermarsh Any thoughts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we did change the type to TreeMap<TextSize, FStringRange>
, where would we then use that information to avoid adding NOQA?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Earlier I thought that ignoring such ranges here would work:
ruff/crates/ruff_linter/src/directives.rs
Lines 154 to 171 in 29fb86e
// For nested f-strings, we expect `noqa` directives on the last line of the | |
// outermost f-string. The last f-string range will be used to skip over | |
// the inner f-strings. | |
let mut last_fstring_range: TextRange = TextRange::default(); | |
for fstring_range in indexer.fstring_ranges().values() { | |
if !locator.contains_line_break(*fstring_range) { | |
continue; | |
} | |
if last_fstring_range.contains_range(*fstring_range) { | |
continue; | |
} | |
let new_range = TextRange::new( | |
locator.line_start(fstring_range.start()), | |
fstring_range.end(), | |
); | |
fstring_mappings.push(new_range); | |
last_fstring_range = new_range; | |
} |
But, I don't think so as then, by default, Ruff will add the directive at the end of the line.
I think we can update this function to return a Option<TextSize>
instead:
ruff/crates/ruff_linter/src/noqa.rs
Lines 746 to 762 in 29fb86e
pub(crate) fn resolve(&self, offset: TextSize) -> TextSize { | |
let index = self.ranges.binary_search_by(|range| { | |
if range.end() < offset { | |
std::cmp::Ordering::Less | |
} else if range.contains(offset) { | |
std::cmp::Ordering::Equal | |
} else { | |
std::cmp::Ordering::Greater | |
} | |
}); | |
if let Ok(index) = index { | |
self.ranges[index].end() | |
} else { | |
offset | |
} | |
} |
which will be used here and continue
on None
:
ruff/crates/ruff_linter/src/noqa.rs
Line 527 in 29fb86e
let noqa_offset = noqa_line_for.resolve(diagnostic.start()); |
Note that even if we somehow support this, it'll be impossible to ignore any violation on that line manually. For example,
# Unterminated f-string
f"\.png{x} # noqa: W605
Here, the noqa
comment is part of the f-string and is not a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan you described here makes sense to me. Thanks for walking me through it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've a local branch with a possible implementation for this but there are a few complexities involved. I'll proceed with merging this and open a new PR with that change instead.
PR Check ResultsEcosystem✅ ecosystem check detected no changes. |
Shouldn't unterminated f-strings be caught in the parser? |
Yes, they're being caught. The issue which was highlighted in the linked issue is a debug assertion present in constructing the f-string ranges in the indexer. And, the indexer uses the flattened tokens for that purpose which ignores the error tokens. For the following code snippet, f"{123} We would've panicked but only in the debug build which is what this PR fixes. Actually, the release build could panic as well with: f"\.png${ And, running with: $ pipx run "ruff==0.1.1" check --isolated --no-cache --select=W605,E999 ~/playground/ruff/fstring.py --fix
error: Panicked while linting /Users/dhruv/playground/ruff/fstring.py: This indicates a bug in Ruff. If you could open an issue at:
/~https://github.com/astral-sh/ruff/issues/new?title=%5BLinter%20panic%5D
...with the relevant file contents, the `pyproject.toml` settings, and the following stack trace, we'd be very appreciative!
panicked at 'called `Option::unwrap()` on a `None` value', crates/ruff_linter/src/rules/pycodestyle/rules/invalid_escape_sequence.rs:175:18
Backtrace: 0: _rust_eh_personality
1: _main
2: _rust_eh_personality
3: _rust_eh_personality
4: _rust_eh_personality
5: _rust_eh_personality
6: __rjem_je_witnesses_cleanup
7: __rjem_je_witnesses_cleanup
8: _main
9: _main
10: _main
11: _main
12: _main
13: _main
14: _main
15: _main
16: start
17: start
18: _main |
(Tok::String { .. }, Tok::FStringStart) => ( | ||
*a_range, | ||
indexer.fstring_ranges().innermost(b_range.start()).unwrap(), | ||
), | ||
(Tok::FStringEnd, Tok::String { .. }) => ( | ||
indexer.fstring_ranges().innermost(a_range.start()).unwrap(), | ||
*b_range, | ||
), | ||
(Tok::FStringEnd, Tok::FStringStart) => ( | ||
indexer.fstring_ranges().innermost(a_range.start()).unwrap(), | ||
indexer.fstring_ranges().innermost(b_range.start()).unwrap(), | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These unwrap's aren't safe at all as I thought earlier. Ruff can panic on certain edge cases. It's better to account for the None
case.
Sorry, i missed that the index is built during parsing, not after |
I believe the Indexer is built before parsing, but not as part of parsing itself. We need to build it regardless of whether the source code is valid, because we do support linting invalid programs (at least for the token-based rules). |
1e980fd
to
29fb86e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer if we express the fact that unterminated f-strings are not stored in the return type of innermost
rather than just a note but are fine moving forward with this solution.
a7c0258
to
5398c07
Compare
FStringRanges
Summary
This PR removes the
debug_assertion
in theIndexer
to allow unterminated f-strings. This is mainly a fix in the development build which now matches the release build.The fix is simple: remove the
debug_assertion
which means that the there could beFStringStart
and possiblyFStringMiddle
tokens without a corresponding f-string range in theIndexer
. This means that the code requesting for the f-string index need to account for theNone
case, making the code safer.This also updates the code which queries the
FStringRanges
to account for theNone
case. This will happen when theFStringStart
/FStringMiddle
tokens are present but theFStringEnd
token isn't which means that theIndexer
won't contain the range for that f-string.Test Plan
cargo test
Taking the following code as an example:
f"{123}
This only emits a
FStringStart
token, but noFStringMiddle
orFStringEnd
tokens.And,
This emits a
FStringStart
andFStringMiddle
token, but noFStringEnd
token.fixes: #8065