Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove #[inline(always)] #129

Closed
therealprof opened this issue Jul 16, 2017 · 7 comments
Closed

Remove #[inline(always)] #129

therealprof opened this issue Jul 16, 2017 · 7 comments

Comments

@therealprof
Copy link
Contributor

Since I absolutely love clippy, I also wanted to run it on my embedded experiments but essentially failed to do so because I ran into tons of:

error: you have declared `#[inline(always)]` on `bit`. This is usually a bad idea

warnings and so I decided to look into whether that provides any benefit to the generated code at all and surprise it does not provide any benefit at all:

When compiling in release mode (with lto on) the only difference is that main is not inlined into reset_handler which is actually a positive since it makes debugging a whole lot easier if you have a trivially named main function you can put your breakpoint on.

In debug mode it has a slightly larger impact of around 20% but that seems to be mostly coming other skipped inlining opportunities rather any of the functions marked als always inline not being inline anymore which can probably offset by adjusting the inlining target, if necessary.

But most importantly to me, this allows me to run clippy just fine:

warning: equality checks against true are unnecessary
  --> examples/flash_systick_hsi48.rs:30:16
   |
30 |             if rcc.cr2.read().hsi48rdy().bit() == true {
   |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try simplifying it as shown: `rcc.cr2.read().hsi48rdy().bit()`
   |
   = note: #[warn(bool_comparison)] on by default
   = help: for further information visit /~https://github.com/Manishearth/rust-clippy/wiki#bool_comparison

Woohoo!

@pftbest
Copy link
Contributor

pftbest commented Jul 16, 2017

I'd rather have a guarantee that compiler will always optimize bit access. Maybe there is some way to disable this diagnostic in clippy?

@therealprof
Copy link
Contributor Author

Well, I was looking for a way but it seems that can only be done by config gating in the source (== Meh) or when integrating clippy into the crate itself (== Double Meh); part of the beauty of clippy is that one can simply run "[c|x]argo clippy" and receive some interesting insight into the code, sometimes even with nice optimisation suggestions, without any non-trivial setup...

The guarantee that you're mentioning does not exist anyways, it's just a compiler hint so the compiler might decide to ignore it anyway. And as you might know: premature optimisation is the root of all evil...

@japaric
Copy link
Member

japaric commented Jul 16, 2017

@therealprof

essentially failed to do so because I ran into tons of:

Is that from running clippy on the device crate or on a crate that depends on the device crate? If the later, why is clippy analyzing a dependency? That seems odd.

main is not inlined into reset_handler which is actually a positive since it makes debugging a whole lot easier if you have a trivially named main function you can put your breakpoint on.

Just want to point out that the unmangled "main" symbol is not your program main but a rustc generated stub with no source code information. That stub will eventually call your program main (I think it may go through the start lang item first) but you still have to step through some assembly before you get there.

In debug mode it has a slightly larger impact of around 20%

Which way is that? Removing #[inline(always)] makes the program smaller by 20%? Or does it make it larger?

@pftbest

I'd rather have a guarantee that compiler will always optimize bit access

Having a Cargo feature for optimizing dependencies even when compiling with just cargo build (cf. rust-lang/rust-roadmap-2017#1 (comment)) would help with that ... a bit. Even with this feature generic functions (device crates have a bunch of these because the API uses closures) won't get properly optimized because of how the compiler works: generic functions in libraries are converted into metadata so they don't get optimized until the last moment, i.e. when compiling the "top" crate, so they are directly affected by the compilation profile (dev vs release)


I don't recall if #[inline(always)] made this worse but compiling device crates without optimization sometimes generates so much code that the program no longer fits in the device Flash memory, i.e. sometimes your program .text+.rodata is larger than 64KB. When that happens I change the dev profile optimization level to 1; that fixes the binary size problem but makes the program undebuggable: you can't print stack variables because they have been "optimized away" (I don't recall the exact phrasing the debugger uses).

Just pointing out that unoptimized register access when using cargo build is a real problem that bloats binary size and affects the debugging experience. A proper fix may require some sort of optimization hinting in rustc. I think some of the ideas in #115 may help here as well. And if removing #[inline(always)] helps with the binary size problem then I'd be happy to remove it.

@therealprof
Copy link
Contributor Author

Is that from running clippy on the device crate or on a crate that depends on the device crate? If the later, why is clippy analyzing a dependency? That seems odd.

I'm compiling examples in the device crate.

Which way is that? Removing #[inline(always)] makes the program smaller by 20%? Or does it make it larger?

It makes the debugging info larger by about 20%, mostly due to more included detail. I didn't note any changes to the executed code at all.

And if removing #[inline(always)] helps with the binary size problem then I'd be happy to remove it.

No, unfortunately it does not, but it also doesn't make it worse. However the amount of debug info created by rustc is absolutely horrific. I've had cases where applications even failed to compile in dev mode because:

... relocation truncated to fit: R_ARM_THM_JUMP11 against `DEFAULT_HANDLER'

There're only two options to tackle this in my opinion:

  • Improve the rust compiler to stop this nonsense
  • Ensure that generated rust files (like the ones produces by svd2rust) are as lean as possible

@therealprof
Copy link
Contributor Author

therealprof commented Jul 21, 2017

So I was just revisiting this again and did a more thorough test of the 3 different options:

  1. [#inline(always)]
  2. [#inline]
  3. without explicit inline, as suggested before

I was specifically looking at generated code size and size of auxiliary files, and here are the results in order for debug:

35540	target.inline_always/thumbv6m-none-eabi/debug/deps
17452	target.inline_always/thumbv6m-none-eabi/debug/examples
34908	target.inline/thumbv6m-none-eabi/debug/deps
16384	target.inline/thumbv6m-none-eabi/debug/examples
43492	target/thumbv6m-none-eabi/debug/deps
54548	target/thumbv6m-none-eabi/debug/examples[1]
32656	target.inline_always/thumbv6m-none-eabi/release/deps
1848	target.inline_always/thumbv6m-none-eabi/release/examples
31988	target.inline/thumbv6m-none-eabi/release/deps
1848	target.inline/thumbv6m-none-eabi/release/examples
41968	target/thumbv6m-none-eabi/release/deps
1856	target/thumbv6m-none-eabi/release/examples

So in a nutshell, [#inline] creates identical release binaries to [#inline(always)] and slightly better than without while still retaining the ability to run clippy without stunts.

For debug [#inline] creates the smallest binaries and the least metadata (it also compiles way faster than without inlining.

Edit: [1] The results are actually not quite as bad but one example fails to compile leaving a 23MB temporary file around.

@therealprof
Copy link
Contributor Author

Funny, someone else(tm) had the same idea for std recently: rust-lang/rust#43367 ;)

@japaric
Copy link
Member

japaric commented Sep 20, 2017

Done in #141

@japaric japaric closed this as completed Sep 20, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants