Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split symbol interner into static unsynchronized and dynamic synchronized parts #79425

Closed

Conversation

sivadeilra
Copy link

Draft PR for investigation of improving Symbol lookups

I've been working on understanding parallelism in Rust, and I noticed one opportunity for improvement. That's in how we handle interning strings as Symbols. I decided to avoid placing all "static" symbols (symbols known at compile time) in the Interner table, and instead to just assign them static indices.

The advantage of this is that you can avoid taking a lock for resolution of static symbols. I measured, and for many Rust crates, the number of symbol lookups that can be resolved statically is between 45% and 75%. So, this could significantly reduce the amount of lock contention in Symbol::intern, if/when we ever get the compiler moved to parallel builds.

I've run ./x.py test src/tests/ui, and everything passes. But there's still some cleanup left to do, which is why I'm creating this as a draft PR.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @lcnr (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 25, 2020
@Mark-Simulacrum
Copy link
Member

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented Nov 25, 2020

⌛ Trying commit 1b87f079d416da1754abbd3b833e132aff5d0c7e with merge 84086bee5c514f7e2d2836aa88625bd50be1108c...

@joshtriplett
Copy link
Member

@Mark-Simulacrum CI isn't quite passing yet; I was going to queue it as soon as CI passed.

@panstromek
Copy link
Contributor

Just to note, there was a PR in this area with similar goals but different approach (packing short strings into a pointer). It wasn't merged in the end, but I thought it's worth mentioning.
#74554

@Mark-Simulacrum
Copy link
Member

The try build failed (Option 'document-private-items' given more than once), FWIW, so perf won't run until that finishes successfully - the error is pretty odd, feels like maybe an implementation bug?

@petrochenkov petrochenkov self-assigned this Nov 26, 2020
@petrochenkov
Copy link
Contributor

#74554 wasn't merged because now we access the string representation of Symbols rarely enough for it to not make much impact (at least in non-parallel case).

@jyn514
Copy link
Member

jyn514 commented Nov 27, 2020

The try build failed (Option 'document-private-items' given more than once), FWIW, so perf won't run until that finishes successfully - the error is pretty odd, feels like maybe an implementation bug?

cc #73936

@jyn514 jyn514 added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler labels Nov 27, 2020
@jyn514 jyn514 changed the title User/ardavis/better syms Assign static indices to Symbols known at compile time Nov 27, 2020
@petrochenkov petrochenkov changed the title Assign static indices to Symbols known at compile time [WIP] Split symbol interner into static unsynchronized and dynamic synchronized parts Nov 28, 2020
@petrochenkov
Copy link
Contributor

@sivadeilra
I took some liberty to rename the PR to what I think it actually does.
Feel free to rename it again to something better.

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 28, 2020

First impressions:

  • This will need to be split into, like, dozen half-dozen of more focused PRs.
  • Each part will need to go through perf, some of them probably wont be landed due to being on the wrong side of the benefits vs complexity trade-off.
  • It would be great to use whole crates.io as the training set. Right now some of the static symbols look very domain-specific.
  • I'm not sure the script doing the training run and producing the common symbol set belongs to rustc sources (especially if it's going to analyze crates.io).

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 28, 2020
@petrochenkov
Copy link
Contributor

cc @nnethercote
(I think you'll be interested in this.)

@petrochenkov
Copy link
Contributor

petrochenkov commented Nov 28, 2020

This will need to be split into, like, dozen of more focused PRs.

For initial PRs I'd suggest:

  • Split interner into two parts for already existing groups of symbols (kw and sym).
  • Add "common" symbols using statistics collected from crates.io (but not the script using for their collection).
  • No divide and conquer (a simple HashMap instead), no single-byte optimization.

This way we get all optimizations associated with

  • reduced locking
  • somewhat reduced dynamic allocations (which shouldn't matter much because symbols use an arena with bump allocator)

while keeping status quo in other aspects like HashMap construction and search.

@bors

This comment has been minimized.

@sivadeilra sivadeilra force-pushed the user/ardavis/better_syms branch from e1fe2c0 to f393eeb Compare December 3, 2020 18:49
@sivadeilra sivadeilra changed the title [WIP] Split symbol interner into static unsynchronized and dynamic synchronized parts Split symbol interner into static unsynchronized and dynamic synchronized parts Dec 3, 2020
@petrochenkov
Copy link
Contributor

I'll try to do some experiments with single-threaded performance on our standard benchmark in #80420.

@petrochenkov petrochenkov added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 27, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Dec 28, 2020
…sper

rustc_span: Remove `Symbol::with`

A subset of rust-lang#79425 that is a pure refactoring.
@petrochenkov
Copy link
Contributor

Ok, I think I know how to mitigate the regressions.

  • Use FxHashMap<&'static str, Symbol> for both static and dynamic tables.
  • Use HashMap::raw_entry to avoid hashing the string twice (example: 8809118).
  • Create the static table dynamically, but "before main" using ctor so it doesn't require synchronization.

Comments:

  • Basic interning algorithm for reference:
     if let Some(symbol) = static_table.get(string) {
     	symbol
     } else {
     	dynamic_table.get(string)
     }
  • Our standard benchmark works with a single-threaded compiler on Linux, together with initial-exec TLS model used by the compiler it means that access to the dynamic table is cheap, so we can assume that we are basically just searching two hash tables consequently (with some large probability, e.g. 30-50%), this is unsurprisingly slower than searching one table.
  • How to make these two searches faster? Find a common part in them and perform it once. If these two tables use a common hash table type (including a common hasher), then the common part is string hashing. The Raw Entry API allows to perform hashing only once (example: 8809118).
  • Besides that phf::Map happens to be slower than regular FxHashMap for search ([WIP] rustc_span: Symbol perf experiments #80420 (comment)), so it would be faster to use FxHashMap even without deduplicated hashing.
  • ctor is pretty un-Rusty, but we are using it with an eye on replacing it with compile-time initialization when const_heap (Tracking Issue for const_heap #79597) and const_trait_impl (Tracking issue for RFC 2632, impl const Trait for Ty and ~const (tilde const) syntax (const_trait_impl) #67792) work well enough. (If this was C++, using a global variable with a constructor would be a no-brainer here.)
  • The results with this strategy are good enough compared to other variants ([WIP] rustc_span: Symbol perf experiments #80420 (comment)), with couple of outliers like deeply-nested-async-* or clap-rs-doc. I didn't investigate why the outliers happen.

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Jan 1, 2021
@JohnCSimon
Copy link
Member

JohnCSimon commented Jan 19, 2021

@sivadeilra Ping from triage can you please address the merge conflict and post the status of this PR?

@sivadeilra
Copy link
Author

@sivadeilra Ping from triage can you please address the merge conflict and post the status of this PR?

Sure, will do.

This improves performance of Symbol interning in several ways.
The main motivation of this work is to prepare rustc for
efficient parallel builds.

The Symbol lookup table (mapping from strings to Symbol numbers)
is now split into two separate tables: a static table and a
dynamic table. The static table contains strings that are known
to rustc, including all keywords and all `sym::foo` symbols.
The dynamic table contains strings that rustc discovers while
compiling, such as "my_super_obscure_function_name".

Since the static table is known at compile time (that is, when
rustc itself is being compiled), this table can be stored
entirely in static data structures. We use the `phf` crate to
generate this table; `phf` generates perfect hash functions.

This allows rustc to perform Symbol lookups for static symbols
without any multithreaded synchronization, or accessing any
dynamic data whatsoever.

I measured the percentage of static symbol lookups in many
common Rust crates, including rust/compiler, rust/library,
servo, rand, quote, syn, rust-analyzer, rayon, and rsvg.
Among these crates, between 35% and 55% of all symbol lookups
were resolved using the static lookup table.
@sivadeilra sivadeilra force-pushed the user/ardavis/better_syms branch from a4a2fc9 to 6fa8898 Compare January 20, 2021 15:01
@rust-log-analyzer
Copy link
Collaborator

The job x86_64-gnu-llvm-9 failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

---- [ui] ui/panic-handler/weak-lang-item.rs stdout ----
diff of stderr:

10 LL | extern crate core as other_core;
12 
12 
- error: language item required, but not found: `eh_personality`
- 
15 error: `#[panic_handler]` function required, but not found
+ 
+ error: language item required, but not found: `eh_personality`
17 error: aborting due to 3 previous errors
18 



The actual stderr differed from the expected stderr.
Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/panic-handler/weak-lang-item/weak-lang-item.stderr
To update references, rerun the tests and pass the `--bless` flag
To only update this specific test, also pass `--test-args panic-handler/weak-lang-item.rs`
error: 1 errors occurred comparing output.
status: exit code: 1
status: exit code: 1
command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/panic-handler/weak-lang-item.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-Zdeduplicate-diagnostics=no" "-Zemit-future-incompat-report" "--emit" "metadata" "-C" "prefer-dynamic" "--out-dir" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/panic-handler/weak-lang-item" "-A" "unused" "-Crpath" "-O" "-Cdebuginfo=0" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/panic-handler/weak-lang-item/auxiliary"
------------------------------------------

------------------------------------------
stderr:
stderr:
------------------------------------------
error[E0259]: the name `core` is defined multiple times
   |
LL | extern crate core;
LL | extern crate core;
   | ^^^^^^^^^^^^^^^^^^ `core` reimported here
   |
   = note: `core` must be defined only once in the type namespace of this module
help: you can use `as` to change the binding name of the import
   |
LL | extern crate core as other_core;


error: `#[panic_handler]` function required, but not found

error: language item required, but not found: `eh_personality`
error: aborting due to 3 previous errors

For more information about this error, try `rustc --explain E0259`.

---

Some tests failed in compiletest suite=ui mode=ui host=x86_64-unknown-linux-gnu target=x86_64-unknown-linux-gnu


command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--suite" "ui" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-9/bin/FileCheck" "--nodejs" "/usr/bin/node" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python3" "--lldb-python" "/usr/bin/python3" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "9.0.0" "--llvm-components" "aarch64 aarch64asmparser aarch64codegen aarch64desc aarch64disassembler aarch64info aarch64utils aggressiveinstcombine all all-targets amdgpu amdgpuasmparser amdgpucodegen amdgpudesc amdgpudisassembler amdgpuinfo amdgpuutils analysis arm armasmparser armcodegen armdesc armdisassembler arminfo armutils asmparser asmprinter avr avrasmparser avrcodegen avrdesc avrdisassembler avrinfo binaryformat bitreader bitstreamreader bitwriter bpf bpfasmparser bpfcodegen bpfdesc bpfdisassembler bpfinfo codegen core coroutines coverage debuginfocodeview debuginfodwarf debuginfogsym debuginfomsf debuginfopdb demangle dlltooldriver engine executionengine fuzzmutate globalisel hexagon hexagonasmparser hexagoncodegen hexagondesc hexagondisassembler hexagoninfo instcombine instrumentation interpreter ipo irreader jitlink lanai lanaiasmparser lanaicodegen lanaidesc lanaidisassembler lanaiinfo libdriver lineeditor linker lto mc mca mcdisassembler mcjit mcparser mips mipsasmparser mipscodegen mipsdesc mipsdisassembler mipsinfo mirparser msp430 msp430asmparser msp430codegen msp430desc msp430disassembler msp430info native nativecodegen nvptx nvptxcodegen nvptxdesc nvptxinfo objcarcopts object objectyaml option orcjit passes perfjitevents powerpc powerpcasmparser powerpccodegen powerpcdesc powerpcdisassembler powerpcinfo profiledata remarks riscv riscvasmparser riscvcodegen riscvdesc riscvdisassembler riscvinfo riscvutils runtimedyld scalaropts selectiondag sparc sparcasmparser sparccodegen sparcdesc sparcdisassembler sparcinfo support symbolize systemz systemzasmparser systemzcodegen systemzdesc systemzdisassembler systemzinfo tablegen target textapi transformutils vectorize webassembly webassemblyasmparser webassemblycodegen webassemblydesc webassemblydisassembler webassemblyinfo windowsmanifest x86 x86asmparser x86codegen x86desc x86disassembler x86info x86utils xcore xcorecodegen xcoredesc xcoredisassembler xcoreinfo xray" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"


failed to run: /checkout/obj/build/bootstrap/debug/bootstrap --stage 2 test --exclude src/tools/tidy
Build completed unsuccessfully in 0:14:58

@sivadeilra
Copy link
Author

I think this design direction has merit, but it currently has one cost that is not acceptable. For single-threaded execution, it does penalize symbol lookups, because for many symbols it will query two hash tables, not just one. This cost could be reduced somewhat, but it's an inherent part of the design, so it can't be eliminated.

Since one of the big sources of symbol lookups is importing metadata from dependent crates, I've been planning on doing an experiment, where we serialize symbols differently depending on whether they are "static" (known) or dynamic. For static symbols, we can simply write the symbol index, and we don't even need to write the string. During deserialization, we would simply use the symbol index, without any need to check any interning tables. I think it's possible that this would have a measurable performance benefit, and that the benefit might outweigh the cost of the double-lookup.

@JohnCSimon, would you prefer that I abandon this PR, or change it to "draft", while I work on that experiment? It seems that this PR cannot be accepted in its current form, which I think is fine. Either that, or I could use #[cfg(feature = "static_symbols")] to gate this feature. That way, it could be committed (and so the PR wouldn't bit-rot while other changes go in), without any cost to others, while I work on this design.

@Dylan-DPC-zz Dylan-DPC-zz added S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Feb 14, 2021
@Dylan-DPC-zz Dylan-DPC-zz marked this pull request as draft February 14, 2021 15:49
@Dylan-DPC-zz
Copy link

@sivadeilra i would recommend closing this, working on the experiment and based on that you can reopen or create a new PR with these changes. Since the experiment will take some it would ensure that the PR doesnt stay open and rot with conflicts

@Dylan-DPC-zz Dylan-DPC-zz added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Feb 14, 2021
@sivadeilra
Copy link
Author

@Dylan-DPC Agreed. Closing this now.

@sivadeilra sivadeilra closed this Feb 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-compiletime Issue: Problems and improvements with respect to compile times. S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-parallel Working group: Parallelizing the compiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.