Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial formatter implementation #2883

Merged
merged 16 commits into from
Feb 15, 2023
Merged

Add initial formatter implementation #2883

merged 16 commits into from
Feb 15, 2023

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Feb 14, 2023

Summary

This PR contains the code for the autoformatter proof-of-concept.

Crate structure

The primary formatting hook is the fmt function in crates/ruff_python_formatter/src/lib.rs.

The current formatter approach is outlined in crates/ruff_python_formatter/src/lib.rs, and is structured as follows:

  • Tokenize the code using the RustPython lexer.
  • In crates/ruff_python_formatter/src/trivia.rs, extract a variety of trivia tokens from the token stream. These include comments, trailing commas, and empty lines.
  • Generate the AST via the RustPython parser.
  • In crates/ruff_python_formatter/src/cst.rs, convert the AST to a CST structure. As of now, the CST is nearly identical to the AST, except that every node gets a trivia vector. But we might want to modify it further.
  • In crates/ruff_python_formatter/src/attachment.rs, attach each trivia token to the corresponding CST node. The logic for this is mostly in decorate_trivia and is ported almost directly from Prettier (given each token, find its preceding, following, and enclosing nodes, then attach the token to the appropriate node in a second pass).
  • In crates/ruff_python_formatter/src/newlines.rs, normalize newlines to match Black’s preferences. This involves traversing the CST and inserting or removing TriviaToken values as we go.
  • Call format! on the CST, which delegates to type-specific formatter implementations (e.g., crates/ruff_python_formatter/src/format/stmt.rs for Stmt nodes, and similar for Expr nodes; the others are trivial). Those type-specific implementations delegate to kind-specific functions (e.g., format_func_def).

Testing and iteration

The formatter is being developed against the Black test suite, which was copied over in-full to crates/ruff_python_formatter/resources/test/fixtures/black.

The Black fixtures had to be modified to create [insta](/~https://github.com/mitsuhiko/insta)-compatible snapshots, which now exist in the repo.

My approach thus far has been to try and improve coverage by tackling fixtures one-by-one.

What works, and what doesn’t

  • Most nodes are supported at a basic level (though there are a few stragglers at time of writing, like StmtKind::Try).
  • Newlines are properly preserved in most cases.
  • Magic trailing commas are properly preserved in some (but not all) cases.
  • Trivial leading and trailing standalone comments mostly work (although maybe not at the end of a file).
  • Inline comments, and comments within expressions, often don’t work -- they work in a few cases, but it’s one-off right now. (We’re probably associating them with the “right” nodes more often than we are actually rendering them in the right place.)
  • We don’t properly normalize string quotes. (At present, we just repeat any constants verbatim.)
  • We’re mishandling a bunch of wrapping cases (if we treat Black as the reference implementation). Here are a few examples (demonstrating Black's stable behavior):
# In some cases, if the end expression is "self-closing" (functions,
# lists, dictionaries, sets, subscript accesses, and any length-two
# boolean operations that end in these elments), Black
# will wrap like this...
if some_expression and f(
    b,
    c,
    d,
):
    pass

# ...whereas we do this:
if (
    some_expression
    and f(
        b,
        c,
        d,
    )
):
    pass

# If function arguments can fit on a single line, then Black will
# format them like this, rather than exploding them vertically.
if f(
    a, b, c, d, e, f, g, ...
):
    pass
  • We don’t properly preserve parentheses in all cases. Black preserves parentheses in some but not all cases.

@@ -0,0 +1,31 @@
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meant to remove most of these -- dropped one commit. Hold on...

@charliermarsh
Copy link
Member Author

\cc @MichaReiser

@charliermarsh charliermarsh force-pushed the charlie/fork branch 4 times, most recently from 1a62ae7 to 9f83f01 Compare February 14, 2023 04:31
@charliermarsh
Copy link
Member Author

Need to find a way to ignore the "unused" snapshot files, which are intentionally unused.

@not-my-profile
Copy link
Contributor

Why are they intentionally unused? I assume you could change the file extension.

@charliermarsh
Copy link
Member Author

Why are they intentionally unused? I assume you could change the file extension.

They're copied over from Black, so they represent the "desired" output (essentially, the spec). The formatter fails on a bunch of them right now, so only those that it knowingly passes are used, and we'll increase that set over time.

@charliermarsh charliermarsh force-pushed the charlie/formatter-iv branch 3 times, most recently from 6dc9fa3 to 00835f8 Compare February 15, 2023 03:51
Base automatically changed from charlie/formatter-iv to main February 15, 2023 03:54
charliermarsh and others added 8 commits February 14, 2023 22:54
This PR makes the formatter code more idiomatic by:

* Unifying sequential `write!` calls into a single `write`
* Replace `format_args![item]` with `item` (`format_args` is to format multiple arguments). You can think of `format_args` as a `Format` implementation for tuples of arbitrary size
* Return an object that implements `Format` for `join_names` so that it can be used as part of the DSL `write!(f, [space, join_names(names)])`
* Use `space` instead of `text(" ")`
…ize` (#2)

Remove the `rome_rowan` dependency from `ruff_fmt` and use `rome_text_size` directly.

This is just an aesthetic cleanup that does not improve compile time because `ruff_fmt` depends on `rome_formatter` which depends on `rome_rowan`.
Adds a smoke test to `ruff_fmt` to debug formatter changes.

This PR changes the return value of `fmt` to `Result<Formatted>` to allow printing the formatted IR in the quick test. The motivation behind returning `Formatted` is that the IR represents the formatted source code. Printing it only changes its representation from the IR to a string.
It's nice to be able to track tests that are "almost" passing, modulo string normalization, so this PR adds a dedicated section for snippets that fit that description.
Removes all the other fixtures, runs the conversion script over all remaining cases, and removes the `.bak` files.
@charliermarsh charliermarsh enabled auto-merge (squash) February 15, 2023 04:03
@charliermarsh charliermarsh merged commit ca49b00 into main Feb 15, 2023
@charliermarsh charliermarsh deleted the charlie/fork branch February 15, 2023 04:06
@not-my-profile
Copy link
Contributor

ruff_formatter and ruff_python_formatter

I find these crate names to be very confusing ... what is the difference?

@charliermarsh
Copy link
Member Author

The former is language-agnostic formatting infrastructure. The latter is a Python source code formatter.

@not-my-profile
Copy link
Contributor

Ah right looking at the rome crates I see that there are several language-specific _formatter crates, all depending on rome_formatter. Do you think there is another language ruff could end up being able to format?

@MichaReiser
Copy link
Member

Ah right looking at the rome crates I see that there are several language-specific _formatter crates, all depending on rome_formatter. Do you think there is another language ruff could end up being able to format?

I would find it interesting if ruff could format toml and JSON files, raw SQL statements in backend code, graphql queries, etc. to have the same experience as with Prettier. It doesn't just format your JavaScript code, but also other languages that are commonly embedded into JavaScript

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants