Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce fmt's rodata usage by unzipping String and Argument arrays #16662

Merged
merged 3 commits into from
Sep 10, 2014

Conversation

pczarn
Copy link
Contributor

@pczarn pczarn commented Aug 21, 2014

Based on an observation that strings and arguments are always interleaved, thanks to #15832. Additionally optimize invocations where formatting parameters are unspecified for all arguments, e.g. "{} {:?} {:x}", by emptying the __STATIC_FMTARGS array. Next, Arguments::new replaces an empty slice with None so that passing empty __STATIC_FMTARGS generates slightly less machine code when Arguments::new is inlined. Furthermore, formatting itself treats these cases separately without making redundant copies of formatting parameters.

All in all, this adds a single mov instruction per write! in most cases. That's why code size has increased.

@brson
Copy link
Contributor

brson commented Aug 21, 2014

cc @alexcrichton

@brson
Copy link
Contributor

brson commented Aug 21, 2014

Awesome wins!

@pczarn
Copy link
Contributor Author

pczarn commented Aug 22, 2014

Example

Expansion of old println!("x is: {} foobar", 5);:

fn main() {
    let x = 5i;
    match (&x,) {
        (__arg0,) => {
            static __STATIC_FMTSTR: [Piece, ..3] = [
                String("x is: "),
                Argument(Argument {
                    position: ArgumentNext,
                    format: FormatSpec {
                        fill: ' ',
                        align: AlignUnknown,
                        flags: 0u,
                        precision: CountImplied,
                        width: CountImplied,
                    },
                },
                String(" foobar"),
            ];
            let __args_vec = &[argument(secret_string, __arg0)];
            let __args = unsafe { Arguments::new(__STATIC_FMTSTR, __args_vec) };

            println_args(&__args)
        }
    };
}

With this PR it becomes:

fn main() {
    let x = 5i;
    match (&x,) {
        (__arg0,) => {
            static __STATIC_FMTSTR: [&'static str, ..2u] = [
                "x is: ",
                " foobar"
            ];
            static __STATIC_FMTARGS: [rt::Argument<'static>, ..0u] = []; // optimized out
            let __args_vec = &[argument(secret_show, __arg0)];
            let __args = unsafe {
                Arguments::new(__STATIC_FMTSTR, __STATIC_FMTARGS, __args_vec)
            };
            println_args(&__args)
        }
    };
}

Wins

before
   text    data     bss     dec   filename
  15017   43168       8     58193 x86_64-unknown-linux-gnu/stage3/lib/libarena-4e7c5e5c.so
  66210   71260       4    137474 x86_64-unknown-linux-gnu/stage3/lib/libdebug-4e7c5e5c.so
  31659    3754       4     35417 x86_64-unknown-linux-gnu/stage3/lib/libflate-4e7c5e5c.so
  20138   45502       8     65648 x86_64-unknown-linux-gnu/stage3/lib/libfmt_macros-4e7c5e5c.so
  76981   83547       4    160532 x86_64-unknown-linux-gnu/stage3/lib/libgetopts-4e7c5e5c.so
  10269   44437       8     54714 x86_64-unknown-linux-gnu/stage3/lib/libgraphviz-4e7c5e5c.so
  33546   40371     200     74117 x86_64-unknown-linux-gnu/stage3/lib/liblog-4e7c5e5c.so
 341427  166384      24    507835 x86_64-unknown-linux-gnu/stage3/lib/libnative-4e7c5e5c.so
  47783  207122       4    254909 x86_64-unknown-linux-gnu/stage3/lib/librbml-4e7c5e5c.so
13324534 6303187     368 19628089 x86_64-unknown-linux-gnu/stage3/lib/librustc-4e7c5e5c.so
 236885  111322       4    348211 x86_64-unknown-linux-gnu/stage3/lib/librustc_back-4e7c5e5c.so
20742937 1035619   70464 21849020 x86_64-unknown-linux-gnu/stage3/lib/librustc_llvm-4e7c5e5c.so
1182102  508505    5280   1695887 x86_64-unknown-linux-gnu/stage3/lib/librustrt-4e7c5e5c.so
 242924  690433       4    933361 x86_64-unknown-linux-gnu/stage3/lib/libserialize-4e7c5e5c.so
 834141 1656905     344   2491390 x86_64-unknown-linux-gnu/stage3/lib/libstd-4e7c5e5c.so
 151137  671324       4    822465 x86_64-unknown-linux-gnu/stage3/lib/libsync-4e7c5e5c.so
3322627 3880814       8   7203449 x86_64-unknown-linux-gnu/stage3/lib/libsyntax-4e7c5e5c.so
 226532  107796       4    334332 x86_64-unknown-linux-gnu/stage3/lib/libterm-4e7c5e5c.so
  46988   66785       4    113777 x86_64-unknown-linux-gnu/stage3/lib/libtime-4e7c5e5c.so

after
   text    data     bss     dec      filename
  15472   43201       4      58677   x86_64-unknown-linux-gnu/stage3/lib/libarena-4e7c5e5c.so
  66250   68591       8     134849   x86_64-unknown-linux-gnu/stage3/lib/libdebug-4e7c5e5c.so
  31659    3752       8      35419   x86_64-unknown-linux-gnu/stage3/lib/libflate-4e7c5e5c.so
  20322   43650       4      63976   x86_64-unknown-linux-gnu/stage3/lib/libfmt_macros-4e7c5e5c.so
  78261   78171       4     156436   x86_64-unknown-linux-gnu/stage3/lib/libgetopts-4e7c5e5c.so
  10221   44019       4      54244   x86_64-unknown-linux-gnu/stage3/lib/libgraphviz-4e7c5e5c.so
  34034   33703     200      67937   x86_64-unknown-linux-gnu/stage3/lib/liblog-4e7c5e5c.so
 345751  153198      24     498973   x86_64-unknown-linux-gnu/stage3/lib/libnative-4e7c5e5c.so
  48403  196039       8     244450   x86_64-unknown-linux-gnu/stage3/lib/librbml-4e7c5e5c.so
13491702  5253560     368 18745630   x86_64-unknown-linux-gnu/stage3/lib/librustc-4e7c5e5c.so
 239161   99363       4     338528   x86_64-unknown-linux-gnu/stage3/lib/librustc_back-4e7c5e5c.so
20742865  1035050   70464 21848379   x86_64-unknown-linux-gnu/stage3/lib/librustc_llvm-4e7c5e5c.so
1186130  480401    5280    1671811   x86_64-unknown-linux-gnu/stage3/lib/librustrt-4e7c5e5c.so
 245928  668157       8     914093   x86_64-unknown-linux-gnu/stage3/lib/libserialize-4e7c5e5c.so
 838093 1597231     344    2435668   x86_64-unknown-linux-gnu/stage3/lib/libstd-4e7c5e5c.so
 152177  653764       4     805945   x86_64-unknown-linux-gnu/stage3/lib/libsync-4e7c5e5c.so
3349091 3563279       8    6912378   x86_64-unknown-linux-gnu/stage3/lib/libsyntax-4e7c5e5c.so
 229488  101380       4     330872   x86_64-unknown-linux-gnu/stage3/lib/libterm-4e7c5e5c.so
  48156   54777       4     102937   x86_64-unknown-linux-gnu/stage3/lib/libtime-4e7c5e5c.so

Difference (negative is better)

   text      data    bss     dec   filename
    455        33     -4       484 libarena-4e7c5e5c.so
     40     -2669      4     -2625 libdebug-4e7c5e5c.so
      0        -2      4         2 libflate-4e7c5e5c.so
    184     -1852     -4     -1672 libfmt_macros-4e7c5e5c.so
   1280     -5376      0     -4096 libgetopts-4e7c5e5c.so
    -48      -418     -4      -470 libgraphviz-4e7c5e5c.so
    488     -6668      0     -6180 liblog-4e7c5e5c.so
   4324    -13186      0     -8862 libnative-4e7c5e5c.so
    620    -11083      4    -10459 librbml-4e7c5e5c.so
  167168 -1049627      0   -882459 librustc-4e7c5e5c.so
   2276    -11959      0     -9683 librustc_back-4e7c5e5c.so
    -72      -569      0      -641 librustc_llvm-4e7c5e5c.so
   4028    -28104      0    -24076 librustrt-4e7c5e5c.so
   3004    -22276      4    -19268 libserialize-4e7c5e5c.so
   3952    -59674      0    -55722 libstd-4e7c5e5c.so
   1040    -17560      0    -16520 libsync-4e7c5e5c.so
  26464 -  317535      0   -291071 libsyntax-4e7c5e5c.so
   2956     -6416      0     -3460 libterm-4e7c5e5c.so
   1168    -12008      0    -10840 libtime-4e7c5e5c.so

    -1356 liballoc-4e7c5e5c.rlib
     1540 libarena-4e7c5e5c.rlib
      160 libarena-4e7c5e5c.so
   -74062 libcollections-4e7c5e5c.rlib
        0 libcompiler-rt.a
  -448244 libcore-4e7c5e5c.rlib
   -58910 libdebug-4e7c5e5c.rlib
      294 libdebug-4e7c5e5c.so
      120 libflate-4e7c5e5c.rlib
       -8 libflate-4e7c5e5c.so
   -58534 libgetopts-4e7c5e5c.rlib
    -4443 libgetopts-4e7c5e5c.so
    -1992 libglob-4e7c5e5c.rlib
     -767 libglob-4e7c5e5c.so
    -4686 libgraphviz-4e7c5e5c.rlib
     -384 libgraphviz-4e7c5e5c.so
   -39866 libgreen-4e7c5e5c.rlib
    -4547 libgreen-4e7c5e5c.so
        0 liblibc-4e7c5e5c.rlib
   -66714 liblog-4e7c5e5c.rlib
    -9066 liblog-4e7c5e5c.so
        0 libmorestack.a
  -140108 libnative-4e7c5e5c.rlib
   -10987 libnative-4e7c5e5c.so
   -56826 libnum-4e7c5e5c.rlib
    -2697 libnum-4e7c5e5c.so
   -31722 librand-4e7c5e5c.rlib
  -146418 librbml-4e7c5e5c.rlib
   -10939 librbml-4e7c5e5c.so
  -261958 libregex-4e7c5e5c.rlib
   -22534 libregex-4e7c5e5c.so
        0 librlibc-4e7c5e5c.rlib
  -187366 librustrt-4e7c5e5c.rlib
   -23967 librustrt-4e7c5e5c.so
  -291182 librustuv-4e7c5e5c.rlib
   -19400 librustuv-4e7c5e5c.so
    -4722 libsemver-4e7c5e5c.rlib
     2388 libsemver-4e7c5e5c.so
  -284034 libserialize-4e7c5e5c.rlib
   -19235 libserialize-4e7c5e5c.so
  -655282 libstd-4e7c5e5c.rlib
   -56009 libstd-4e7c5e5c.so
  -328036 libsync-4e7c5e5c.rlib
   -16513 libsync-4e7c5e5c.so
   -74328 libterm-4e7c5e5c.rlib
    -3352 libterm-4e7c5e5c.so
  -198478 libtest-4e7c5e5c.rlib
   -16490 libtest-4e7c5e5c.so
  -148164 libtime-4e7c5e5c.rlib
   -10166 libtime-4e7c5e5c.so
    -1050 libunicode-4e7c5e5c.rlib
   -38478 liburl-4e7c5e5c.rlib
    -5015 liburl-4e7c5e5c.so
   -35978 libuuid-4e7c5e5c.rlib
    -3100 libuuid-4e7c5e5c.so
 -3873611 total

DSTs could further reduce code bloat, in a simple way and without harmful indirection:

struct Pieces<'a> {
    args: &'a [Argument],
    strings: [&'a str]
}

static __STATIC_PIECES: Pieces = // ...
let __args = Arguments::new(__STATIC_PIECES, __args_vec);
// ...

I'm wondering whether to use vtables or store secret_* fn pointers in static arrays, too, if possible.

try!(formatter.run(arg));
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, have you measured a runtime improvement with these two paths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not yet. By the way, these unwraps seem wrong, because formatting is used from fail!.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without a clear difference between them, could we unify the two paths? Part of the contract of calling this function is that Arguments is valid where the number of string pieces is very close to the number of argument pieces. This is always valid when generated by the syntax extension, so unwrap should be ok.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One iterates over arguments, the other over placeholder specs.

As for the improvement:

#[bench]
fn show_hashmap(b: &mut Bencher) {
    let mut map = HashMap::new();
    for i in range(100u, 500) { map.insert(i, i + 1); }
    let mut buf = [0, ..4096];
    b.iter(|| {
        let mut w = BufWriter::new(buf);
        (write!(w, "{}", map)).unwrap()
    })
}
before
test show_hashmap ... bench:     49661 ns/iter (+/- 6403)
after
test show_hashmap ... bench:     46245 ns/iter (+/- 6816)

@alexcrichton
Copy link
Member

Awesome wins @pczarn, nice work!

@huonw
Copy link
Member

huonw commented Aug 23, 2014

always interleaved

Really? Even format!("{}{}", ...)?

@pczarn
Copy link
Contributor Author

pczarn commented Aug 23, 2014

@huonw, in these cases string pieces can be empty: ["", ""]

Edit: or __STATIC_FMTSTR could be None, but I've measured that this opportunity for optimization is rare. I'll reconsider it after DST.

@@ -54,6 +54,8 @@ struct Context<'a, 'b> {

/// Collection of the compiled `rt::Piece` structures
pieces: Vec<Gc<ast::Expr>>,
str_pieces: Vec<Gc<ast::Expr>>,
all_pieces_simple: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add doc comments for these fields? (e.g. explaining what 'simple' means here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@huonw
Copy link
Member

huonw commented Aug 23, 2014

in these cases string pieces can be empty: ["", ""]

Can be? or are? I'm asking because this (adjacent formatting) seems like something that breaks your stated assumption and thus means this code is broken. E.g. println!("foo{}{}baz", 'X', 'Y') should print fooXYbaz, not fooXbazY.

@pczarn
Copy link
Contributor Author

pczarn commented Aug 23, 2014

They are, your example will print ["foo", "X", "", "Y", "baz"]. Writers can handle zero-length writes.

@huonw
Copy link
Member

huonw commented Aug 23, 2014

Ok, I don't quite see where the change to include the empty "" happened in your code. The current formatting code doesn't include them, e.g. the example I gave above expands to

            #[inline]
            #[allow(dead_code)]
            static __STATIC_FMTSTR: [::std::fmt::rt::Piece<'static>, ..4u] =
                [::std::fmt::rt::String("foo"),
                 ::std::fmt::rt::Argument(::std::fmt::rt::Argument{ ... }),
                 ::std::fmt::rt::Argument(::std::fmt::rt::Argument{ ... }),
                 ::std::fmt::rt::String("baz")];
            let __args_vec =
                &[::std::fmt::argument(::std::fmt::secret_show, __arg0),
                  ::std::fmt::argument(::std::fmt::secret_show, __arg1)];
            let __args =
                unsafe {
                    ::std::fmt::Arguments::new(__STATIC_FMTSTR, __args_vec)
                };
            ::std::io::stdio::println_args(&__args)

secret_lower_hex::<uint>(&(*self as uint), f)
let res = secret_lower_hex::<uint>(&(*self as uint), f);
f.flags = 0;
res
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change worries me and makes me realize that this is a breaking change due to the differing semantics. It didn't look like there was much of a performance improvement over just calling fmt.arg, could there perhaps be a static instance of Argument representing the default formatting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should I make a static instance or rather an impl of Default for Argument? Another thing that worries me is the change of the signature of public-facing Arguments::new.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way is fine, and it's ok to not worry about Arguments::new as it shouldn't be used/public anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with a static

@alexcrichton
Copy link
Member

Looks great, thanks @pczarn! Just a few more minor comments, and then I think this is good to go with some squashing of the intermediate commits.

@pczarn pczarn force-pushed the format-fmtstr-opt branch from 0b8cd02 to 11afef3 Compare August 25, 2014 13:34
@pczarn
Copy link
Contributor Author

pczarn commented Aug 25, 2014

This is good to go

Format specs are ignored and not stored in case they're all default.
Restore default formatting parameters during iteration.
Pass `None` instead of empty slices of format specs to take advantage
of non-nullable pointer optimization.

Generate a call to one of two functions of `fmt::Argument`.
bors added a commit that referenced this pull request Sep 9, 2014
Based on an observation that strings and arguments are always interleaved, thanks to #15832. Additionally optimize invocations where formatting parameters are unspecified for all arguments, e.g. `"{} {:?} {:x}"`, by emptying the `__STATIC_FMTARGS` array. Next, `Arguments::new` replaces an empty slice with `None` so that passing empty `__STATIC_FMTARGS` generates slightly less machine code when `Arguments::new` is inlined. Furthermore, formatting itself treats these cases separately without making redundant copies of formatting parameters.

All in all, this adds a single mov instruction per `write!` in most cases. That's why code size has increased.
@bors bors closed this Sep 10, 2014
@bors bors merged commit fcf88b8 into rust-lang:master Sep 10, 2014
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request Mar 10, 2024
Add test explorer

This PR implements the vscode testing api similar to rust-lang#14589, this time using a set of lsp extensions in order to make it useful for clients other than vscode, and make the vscode client side logic simpler (its now around ~100 line of TS code)

Fix rust-lang#3601
matthiaskrgr pushed a commit to matthiaskrgr/rust that referenced this pull request Mar 10, 2024
Some minor changes in the test explorer lsp extension

followup rust-lang#16662

cc `@nemethf` `@ShuiRuTian`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants