Optimize format_args placeholders without options: Display::simple_fmt #104525

Closed
wants to merge 3 commits into from

m-ou-se
Member

@m-ou-se m-ou-se commented Nov 17, 2022

This is part of #99012

format_args!("{}", "hello") pulls in the entire Display for str implementation, including the code needed to support formatting options like padding (e.g. "{:^50}"). That code is unnecessary for the basic case of formatting with no options ("{}").

This adds a new method to the format trait: Display::simple_fmt. Its default implementation forwards to the regular Display::fmt method, but it is overridden in the Display impls of String and str to call write_str directly, which avoids pulling in any code related to padding.
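The forwarding-plus-override idea can be sketched outside of core with a stand-in trait. Everything below is an assumption for illustration: SketchDisplay, its width parameter, and the render_* helpers are hypothetical names, and the real Display::fmt takes a Formatter rather than an explicit width.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for a format trait (not the real `core::fmt::Display`).
trait SketchDisplay {
    /// Full formatting path: honours an optional minimum width.
    fn fmt(&self, out: &mut dyn Write, width: Option<usize>) -> fmt::Result;

    /// Path for placeholders without options; forwards to `fmt` by default.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        self.fmt(out, None)
    }
}

impl SketchDisplay for str {
    fn fmt(&self, out: &mut dyn Write, width: Option<usize>) -> fmt::Result {
        // Padding logic: the part the option-free path wants to avoid.
        let len = self.chars().count();
        out.write_str(self)?;
        for _ in len..width.unwrap_or(0) {
            out.write_char(' ')?;
        }
        Ok(())
    }

    // Override: write the string directly, never touching the padding code.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_str(self)
    }
}

impl SketchDisplay for u8 {
    // No `simple_fmt` override here, so the forwarding default is used.
    fn fmt(&self, out: &mut dyn Write, width: Option<usize>) -> fmt::Result {
        let digits = self.to_string();
        SketchDisplay::fmt(digits.as_str(), out, width)
    }
}

fn render_simple<T: SketchDisplay + ?Sized>(value: &T) -> String {
    let mut out = String::new();
    value.simple_fmt(&mut out).expect("writing to a String cannot fail");
    out
}

fn render_padded<T: SketchDisplay + ?Sized>(value: &T, width: usize) -> String {
    let mut out = String::new();
    value.fmt(&mut out, Some(width)).expect("writing to a String cannot fail");
    out
}

fn main() {
    println!("[{}]", render_simple("hello"));
    println!("[{}]", render_padded("hi", 4));
    println!("[{}]", render_simple(&7u8));
}
```

In this sketch, a program that only ever calls render_simple on str never references the padding loop, which is the effect the PR is after.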

@m-ou-se m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `core::fmt` labels Nov 17, 2022
@m-ou-se m-ou-se self-assigned this Nov 17, 2022
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 17, 2022
@m-ou-se
Member Author

m-ou-se commented Nov 17, 2022

This basic test program shows great results with this change:

#![feature(rustc_private)]
#![feature(lang_items)]
#![feature(start)]
#![no_std]

extern crate libc;

use core::fmt::{self, Write};

struct Stdout;

impl Write for Stdout {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        unsafe { libc::write(1, s.as_ptr().cast(), s.len()) };
        Ok(())
    }
}

#[start]
fn main(_: isize, _: *const *const u8) -> isize {
    let s = "world";
    writeln!(Stdout, "Hello, {}!", s).is_err() as isize
}

#[lang = "eh_personality"]
extern "C" fn eh_personality() {}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { libc::abort() };
}

Before: A .text section of 6175 bytes

After: A .text section of 2666 bytes (A 57% reduction! ✨)

Most of the difference is that the binary no longer includes core::fmt::Formatter::pad (1064 bytes) and core::str::count::do_count_chars (1663 bytes).

@m-ou-se
Member Author

m-ou-se commented Nov 17, 2022

For binaries that use string formatting with options anywhere, there won't be any benefit in binary size. There might still be a tiny tiny performance gain, although that's also unlikely.

But for small (e.g. embedded) programs, this can make a great difference, as the test program above shows.
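As a small illustration of the two kinds of placeholder being distinguished, the sketch below formats the same string once without options and once with a centering width. The helper names plain and centered are made up for this example.

```rust
/// Placeholder without options: the string is written as-is.
fn plain(s: &str) -> String {
    format!("{}", s)
}

/// Placeholder with options: centered in a field of width 9,
/// which requires the padding logic.
fn centered(s: &str) -> String {
    format!("{:^9}", s)
}

fn main() {
    println!("[{}]", plain("hello"));
    println!("[{}]", centered("hello"));
}
```

A single use of something like centered anywhere in a program is enough to keep the padding code in the binary, which is why the size benefit only appears when every placeholder is option-free.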

@Amanieu
Member

Amanieu commented Nov 17, 2022

Wouldn't it be simpler to have a default method on Display instead of a separate trait? This could then be extended to work with other traits such as Debug and UpperHex.

@m-ou-se
Member Author

m-ou-se commented Nov 28, 2022

Wouldn't it be simpler to have a default method on Display instead of a separate trait? This could then be extended to work with other traits such as Debug and UpperHex.

Yes, definitely. (Depending on your definition of "simpler".) Doing it as a default method on the traits doesn't require specialization and is a better solution in various ways, but does require a bit more work to implement, as it requires more changes to the format_args!() macro and ArgumentV1 type, so I didn't do that in my draft implementation.

Now that we've verified this PR can provide a big improvement in some cases, I'll update it to do it the better way. :)

@m-ou-se m-ou-se mentioned this pull request Nov 29, 2022
6 tasks
@m-ou-se
Member Author

m-ou-se commented Nov 29, 2022

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 29, 2022
@bors
Contributor

bors commented Nov 29, 2022

⌛ Trying commit 3d68c2139e8fb9a37960638feae986f02c4721a7 with merge c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf...

@m-ou-se
Member Author

m-ou-se commented Nov 29, 2022

It's now implemented for all format traits (Display, Debug, Binary, Octal, and so on), but that does increase the size of the vtable for those traits. Let's wait for the test results to see if that has any significant impact.

@m-ou-se
Member Author

m-ou-se commented Nov 29, 2022

@aDotInTheVoid this PR is failing on [rustdoc-json] src/test/rustdoc-json/traits/uses_extern_trait.rs, which says "FIXME(adotinthevoid): Theses shouldn't be here":

// FIXME(adotinthevoid): Theses shouldn't be here

What is that test for?

@bors
Contributor

bors commented Nov 29, 2022

☀️ Try build successful - checks-actions
Build commit: c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf (c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf)

@aDotInTheVoid
Member

aDotInTheVoid commented Nov 29, 2022

TLDR: The test is fragile, and assumes Debug only has one item, #105063 fixes this, sorry.

What is that test for?

The test is checking that when rustdoc-json adds a foreign trait to the index, it also adds the methods of the trait. The fixme is because long term we don't want to add foreign traits to the index, and have the index only have local items.

The test is failing because it tries to get the fmt method from the Debug trait, and then checks that there is a method with the name fmt. But the way it gets the fmt method is by assuming that the Debug trait only has one method. When it encounters two methods, it hits an assertion and fails.

@rust-timer
Collaborator

Finished benchmarking commit (c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.0% | [0.2%, 2.4%] | 58 |
| Regressions ❌ (secondary) | 1.5% | [0.3%, 2.4%] | 36 |
| Improvements ✅ (primary) | -0.3% | [-1.0%, -0.2%] | 24 |
| Improvements ✅ (secondary) | -0.8% | [-2.7%, -0.2%] | 32 |
| All ❌✅ (primary) | 0.6% | [-1.0%, 2.4%] | 82 |

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.2% | [1.6%, 3.0%] | 5 |
| Regressions ❌ (secondary) | 3.1% | [1.3%, 5.1%] | 10 |
| Improvements ✅ (primary) | -3.4% | [-4.7%, -2.1%] | 2 |
| Improvements ✅ (secondary) | -1.6% | [-1.6%, -1.6%] | 1 |
| All ❌✅ (primary) | 0.6% | [-4.7%, 3.0%] | 7 |

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 1.7% | [1.0%, 2.0%] | 13 |
| Regressions ❌ (secondary) | 1.6% | [1.0%, 2.0%] | 17 |
| Improvements ✅ (primary) | -1.0% | [-1.2%, -0.8%] | 2 |
| Improvements ✅ (secondary) | -2.3% | [-2.7%, -2.1%] | 6 |
| All ❌✅ (primary) | 1.3% | [-1.2%, 2.0%] | 15 |

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 29, 2022
@m-ou-se
Member Author

m-ou-se commented Nov 30, 2022

TLDR: The test is fragile, and assumes Debug only has one item, #105063 fixes this, sorry.

Thanks!

@m-ou-se m-ou-se changed the title from 'Optimize format_args!("{}") for str and String.' to 'Optimize format_args placeholders without options: {Display, Debug, ..}::simple_fmt' Nov 30, 2022
@m-ou-se
Member Author

m-ou-se commented Nov 30, 2022

Regressions

There are quite a few ~1% regressions in compilation time. They're not that significant, but it would be nice to fix them.

My theory is that it takes LLVM a somewhat significant amount of time to basically optimize the default simple_fmt implementations to: simple_fmt = fmt, to use the same function (pointer) for both.

It would be nice if we had a feature for this in the language, so we can literally write = Self::fmt; as the default implementation, to make it an alias from the start. But we currently don't have that.

Maybe someone from the compiler team has some good ideas here. :)

@Amanieu
Member

Amanieu commented Nov 30, 2022

I think this could be done as a MIR optimization when one function forwards directly to another. In that case we could just emit an LLVM symbol alias instead of emitting an inline function.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 30, 2022
…, r=notriddle

Rustdoc Json Tests: Don't assume that core::fmt::Debug will always have one item.

See rust-lang#104525 (comment) and rust-lang#104525 (comment) for motivation.

This still assumes that `fmt` is the first method, but that's a lot less brittle than assuming it will be the only method.

Sadly, we can't use an aux crate to insulate the tests from core changes, because core is special, so all we can do is try not to depend on things that may change.
@nnethercote
Contributor

The binary size results show lots of 1-2% regressions, which is unfortunate for this PR which is all about reducing binary sizes :(

@m-ou-se m-ou-se removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Apr 24, 2023
@nnethercote
Contributor

Much better perf results now :) Still some sub-1% regressions in binary size. Not a showstopper, but a brief investigation would be good to see if they can be easily avoided.

@m-ou-se
Member Author

m-ou-se commented Apr 25, 2023

It's a bit unnecessary that the vtable used by &dyn Display now contains both fmt and simple_fmt. If it contains fmt anyway, there's no need for simple_fmt to be included as well. Adding where Self: Sized is the usual way of excluding something from the vtable, but that won't work in this case.
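For readers unfamiliar with the where Self: Sized mechanism mentioned here, the following is a minimal sketch of how it keeps a method out of the vtable. The Render trait and its method names are hypothetical, and the sketch only shows the general mechanism; it does not attempt to show why it is unsuitable for simple_fmt.

```rust
/// Hypothetical trait used only for illustration.
trait Render {
    /// Always part of the vtable, so callable through `&dyn Render`.
    fn full(&self) -> String;

    /// Excluded from the vtable: callable only when the concrete type is known.
    fn simple(&self) -> String
    where
        Self: Sized,
    {
        self.full()
    }
}

struct Greeting;

impl Render for Greeting {
    fn full(&self) -> String {
        "hello".to_string()
    }
}

fn through_dyn(value: &dyn Render) -> String {
    // `value.simple()` would be rejected by the compiler here,
    // because `dyn Render` is not `Sized`.
    value.full()
}

fn main() {
    let g = Greeting;
    println!("{}", g.simple());
    println!("{}", through_dyn(&g));
}
```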

Putting it in a separate trait (so, SimpleDisplay::fmt instead of Display::simple_fmt) could avoid that, but that adds other complexity and requires specialization, making it less flexible.

@m-ou-se m-ou-se marked this pull request as ready for review April 27, 2023 11:11
@m-ou-se
Member Author

m-ou-se commented Apr 27, 2023

r? compiler

@rustbot rustbot assigned petrochenkov and unassigned m-ou-se Apr 27, 2023
@m-ou-se m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Apr 27, 2023
@petrochenkov
Contributor

Putting it in a separate trait (so, SimpleDisplay::fmt instead of Display::simple_fmt) could avoid that, but that adds other complexity and requires specialization, making it less flexible.

How much complexity are we talking about?
And what is the flexibility needed for?

The optimization slightly pessimizes the common case to make a difference in very specific cases like #104525 (comment).
If the complexity is not large then maybe it makes sense to not pessimize the common case.
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 27, 2023
@Kobzol
Contributor

Kobzol commented Jul 7, 2023

Are there any situations where simple_fmt does anything other than call write_str with a dynamically produced string (e.g. String/&str) or a 'static string (e.g. bool, enums with only fieldless variants)?

I wonder if we could instead add fn as_str(&self) -> Option<&str> to Display, similar to what Arguments already exposes.

With as_str, we could better optimize preallocation of the result string when using format! or to_string, because we would know the sizes of placeholders that implement as_str. This would not be possible with simple_fmt, I think.

The lowering would then have to match on as_str for simple placeholders, and either call write_str if it returns Some, or call the normal formatting machinery if it returns None.
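A rough sketch of how such an as_str method might be used for both preallocation and the write_str fast path follows. MaybeStr and concat are hypothetical names; as_str is not a method of the real Display trait, and the real lowering would happen inside format_args!() rather than in a helper function like this one.

```rust
use std::fmt::{Display, Write};

/// Hypothetical extension trait; `as_str` does not exist on the real `Display`.
trait MaybeStr: Display {
    fn as_str(&self) -> Option<&str> {
        None
    }
}

impl MaybeStr for &str {
    fn as_str(&self) -> Option<&str> {
        Some(*self)
    }
}

impl MaybeStr for String {
    fn as_str(&self) -> Option<&str> {
        // Resolves to the inherent `String::as_str`.
        Some(self.as_str())
    }
}

// Integers keep the default and go through the normal formatting machinery.
impl MaybeStr for u32 {}

/// Concatenates option-free placeholders, preallocating for the known parts.
fn concat(parts: &[&dyn MaybeStr]) -> String {
    // Sum the lengths of every part whose string form is known up front.
    let known: usize = parts.iter().filter_map(|p| p.as_str()).map(str::len).sum();
    let mut out = String::with_capacity(known);
    for part in parts {
        match part.as_str() {
            Some(s) => out.push_str(s),
            None => write!(out, "{}", part).expect("writing to a String cannot fail"),
        }
    }
    out
}

fn main() {
    let name = String::from("world");
    println!("{}", concat(&[&"Hello, ", &name, &" #", &42u32]));
}
```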

@bjorn3
Member

bjorn3 commented Jul 7, 2023

.as_str() isn't possible for most types. Only strings can implement it.

The lowering would then have to match on as_str for simple placeholders, and either call write_str if it returns Some, or call the normal formatting machinery if it returns None.

That will make binary bloat even worse as the if condition can't be optimized away due to the formatting machinery using dynamic dispatch everywhere.

@Kobzol
Contributor

Kobzol commented Jul 7, 2023

.as_str() isn't possible for most types. Only strings can implement it.

Not just strings, but also e.g. bools or enums with only fieldless variants - because the returned lifetime of Option<&str> can also be 'static (AFAIK). My guess is that a similar set of types that can implement simple_fmt can also implement as_str - that's why I would be interested to see situations where simple_fmt != write_str/as_str.
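To make the 'static case concrete, here is a small sketch of how a fieldless enum or a bool could hand out a 'static string. The Level enum and both functions are hypothetical and only illustrate the lifetime argument.

```rust
/// Hypothetical fieldless enum.
#[derive(Clone, Copy)]
enum Level {
    Info,
    Warn,
    Error,
}

impl Level {
    /// Every variant maps to a string literal, so the result can be `'static`.
    fn as_str(&self) -> Option<&'static str> {
        Some(match self {
            Level::Info => "info",
            Level::Warn => "warn",
            Level::Error => "error",
        })
    }
}

/// The same idea for `bool`, written as a free function for this sketch.
fn bool_as_str(b: bool) -> Option<&'static str> {
    Some(if b { "true" } else { "false" })
}

fn main() {
    for level in [Level::Info, Level::Warn, Level::Error] {
        println!("{:?}", level.as_str());
    }
    println!("{:?}", bool_as_str(true));
}
```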

But even if this was only applicable to strings, I think that it is an important case to optimize. I have seen many situations where strings are formatted, and now that's sadly suboptimal. The flattening/inlining of format_args has helped, but still it's enough to have a string in a variable (or in a const), and then it has to go through the whole fmt machinery needlessly.

That will make binary bloat even worse as the if condition can't be optimized away due to the formatting machinery using dynamic dispatch everywhere.

Yeah, optimizing the branch would be problematic with dynamic dispatch. But I guess that this could be solved with calling a function through virtual dispatch that would do the condition inside, so that the condition is only codegened once. It would mean double indirection, but that would also happen for types having the default implementation of simple_fmt.

Something like

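use core::fmt::{self, Display, Formatter};

// Sketch only: `as_str` is the method proposed above and does not exist on `Display` today.
fn simple_fmt(obj: &dyn Display, fmt: &mut Formatter<'_>) -> fmt::Result {
    match obj.as_str() {
        Some(s) => fmt.write_str(s),
        None => obj.fmt(fmt),
    }
}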

@bjorn3
Member

bjorn3 commented Jul 7, 2023

But I guess that this could be solved with calling a function through virtual dispatch that would do the condition inside, so that the condition is only codegened once.

That still doesn't allow eliminating the big Display::fmt implementations entirely for embedded systems where there are no uses of non-simple formatting, right?

@Kobzol
Contributor

Kobzol commented Jul 7, 2023

That still doesn't allow eliminating the big Display::fmt implementations entirely for embedded systems where there are no uses of non-simple formatting, right?

Probably not, and that is also not the point of the proposed as_str method, my motivation was to better optimize the runtime performance of formatting (amongst other things by better preallocating strings when doing format!/to_string).

Regarding binary size, I don't have much experience with Rust embedded, so this is just an uninformed opinion, but I think that the code size wins (although they are great!) here might not translate that well into real world programs. If having fmt in the binary or not having fmt in the binary is a fundamental difference for an embedded program, then it seems quite brittle to base this on an optimization that basically breaks once you use {:?} or format something other than a string anywhere in the program. But maybe I'm wrong and there are use-cases for only formatting strings with {} everywhere in embedded programs, I'm not sure :)

@bjorn3
Member

bjorn3 commented Jul 7, 2023

amongst other things by better preallocating strings when doing format!/to_string

Maybe add a size_hint method for that?

@Kobzol
Contributor

Kobzol commented Jul 8, 2023

Nice, that would also work, and could be even more general, e.g. integers could return their log10 number of digits.
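A sketch of what such a hint could compute for integers is shown below. decimal_len is a hypothetical helper, not an existing or proposed API; it uses the standard checked_ilog10 method to count decimal digits.

```rust
/// Hypothetical helper: number of decimal digits needed to print `n`.
fn decimal_len(n: u64) -> usize {
    // `checked_ilog10` returns `None` for 0, which still prints as one digit.
    n.checked_ilog10().map_or(1, |log| log as usize + 1)
}

fn main() {
    let n = 12345u64;
    // Preallocate exactly as many bytes as the digits will need.
    let mut out = String::with_capacity(decimal_len(n));
    out.push_str(&n.to_string());
    println!("{} has {} digits", out, decimal_len(n));
}
```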

@JohnCSimon
Member

@m-ou-se
ping from triage - can you post your status on this PR? There hasn't been an update in a few months. Thanks!

@Dylan-DPC
Member

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

@Dylan-DPC Dylan-DPC closed this Mar 12, 2024
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 12, 2024
@m-ou-se m-ou-se reopened this Jul 1, 2024
@m-ou-se m-ou-se closed this Jul 3, 2024
Labels
A-fmt Area: `core::fmt` S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs Relevant to the library team, which will review and decide on the PR/issue.