Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make fold standalone. #72139

Merged
merged 3 commits into from
May 20, 2020
Merged

Make fold standalone. #72139

merged 3 commits into from
May 20, 2020

Conversation

nnethercote
Copy link
Contributor

fold is currently implemented via try_fold, but implementing it
directly results in slightly less LLVM IR being generated, speeding up
compilation of some benchmarks.

r? @cuviper

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 12, 2020
@nnethercote
Copy link
Contributor Author

Lots of iterator functions (e.g. any, all, find, find_map, position) are implemented in terms of try_fold, and fold is currently done that way as well. As discussed in #64572, this is good for run-time performance because lots of iterators implement try_fold. But lots of iterators also implement fold, so it seems fold can be implemented more directly, as this PR does.

If necessary, I could also add fold implementations to all the iterators that don't implement it but do implement try_fold. (Cycle, StepBy, TakeWhile, etc.)

More generally, try_fold shows up as the single biggest source of LLVM IR generation, especially for clap-rs. There are many instantiations, from all, any, find, etc. And each of those actually has four instantiations per occurrence, in the following size order:

  • core::iter::traits::iterator::Iterator::try_fold
  • core::iter::traits::iterator::Iterator::any::check::{{closure}}
  • core::iter::traits::iterator::Iterator::any
  • core::iter::traits::iterator::Iterator::any::check

It would be nice to make those smaller, but I'm struggling to see how to do it while still using try_fold.

cc @bluss

@nnethercote
Copy link
Contributor Author

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion

@bors
Copy link
Contributor

bors commented May 12, 2020

⌛ Trying commit 4f8acfc491360a39675b1beadcfa977f5b780b9d with merge 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11...

@andjo403
Copy link
Contributor

do not know if it is possible to do this change as it is documented to use try_fold

@RalfJung
Copy link
Member

@andjo403 it says "Most of the other (forward) methods have default implementations in terms of this one". That's not actually saying that any particular method indeed uses try_fold.

But if we go with this PR, a similar remark should likely be added to fold (and/or to the trait root so it is not lost in one of more than a dozen method docs).

@bors
Copy link
Contributor

bors commented May 12, 2020

☀️ Try build successful - checks-actions, checks-azure
Build commit: 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11 (84327c568fbc9f25d682e9c2f0a3b8a0971d1c11)

@rust-timer
Copy link
Collaborator

Queued 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11 with parent 09c817e, future comparison URL.

@bluss
Copy link
Member

bluss commented May 12, 2020

The vision is that try_fold is the "universal" Iterator method that all traversals can be implemented in terms of. In practice we have these extra functions and enums that cause the code size to grow.

"Fortunately" try_fold is unstable to implement, because of the Try trait. The vision thus hasn't even fully arrived to stable Rust.

To me seems like it would be acceptable to break the link between fold and try_fold; then we have two roots of traversals - fold for whole traversal and try_fold for short-circuiting traversal. I'd love to have that documented in that case, but maybe we don't want to commit to it.

As a practical note, it is much easier for users to implement fold correctly rather than try_fold, exactly because try_fold should stop short and leave the iterator resumable, so there are more pitfalls in the implementation.


The methods any, all, find, they are short-circuiting so try_fold is their natural root implementation, and that still needs a different solution. As written on the linked issue, what would happen if we special case any/all/find on the slice iterator, that could save a lot?

Could there be some wild specialization marker that we could use on collection iterators in general to opt them out of using try_fold? That sounds like a pretty crazy idea that probably doesn't work.

fn any(&mut self, f: impl FnMut(Self::Item) -> bool) -> bool {
    IteratorSpecializations::any(self, f)
}

/* Default IteratorSpecializations::any uses try_fold */
/* libcore/alloc iterators marked as non-segmented use simple loop? */

(I don't know how to do this specialization without the unstable specialization marker traits, but we can certainly demonstrate here the difference in IR generated for the slice iterator in the two implementations - playground link)

Reading the debug IR isn't so fun. No function is really innocent. Even raw pointer methods like is_null, offset or NonNull::new_unchecked are verbose.

@cuviper
Copy link
Member

cuviper commented May 12, 2020

I agree, I think it's OK to make this change specifically because Try is still unstable. If the performance run shows benefit, which I expect it will, I'm all for it.

Please give the same treatment to rfold.

If necessary, I could also add fold implementations to all the iterators that don't implement it but do implement try_fold. (Cycle, StepBy, TakeWhile, etc.)

We should audit those that currently have only try_fold, but fold doesn't always make sense.

  • Cycle is infinite, so it doesn't really make sense to fold it. I guess the implementation can loop on folding the base iterator, which may help its performance, but you probably don't want to be here in the first place.
  • StepBy seems ok to fold -- it just optimizes the first_take check.
  • For things like TakeWhile that need to exit early, we'd probably just want fold to make sure it does still go to try_fold of the inner iterator.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit 84327c568fbc9f25d682e9c2f0a3b8a0971d1c11, comparison URL.

@cuviper
Copy link
Member

cuviper commented May 12, 2020

Perf looks like a solid win, except for ripgrep-opt incr-patched: println at +6.7%.

@nnethercote
Copy link
Contributor Author

In case it's useful, here is the full list of libstd iterators that implement try_fold but don't implement fold: Cycle, StepBy, TakeWhile, MapWhile, Take, Scan, ResultShunt, RangeInclusive.

An alternative option to this PR is to go in the other direction: remove all the fold implementations and let them fall back to the default, which calls on to try_fold. That would make fold like any, all, etc. This would reduce the amount of code in libstd at the cost of potentially increased compile times.

As written on the linked issue, what would happen if we special case any/all/find on the slice iterator, that could save a lot?

Could there be some wild specialization marker that we could use on collection iterators in general to opt them out of using try_fold?

These are both interesting ideas. I will investigate them. Thanks!

@cuviper
Copy link
Member

cuviper commented May 12, 2020

In case it's useful, here is the full list of libstd iterators that implement try_fold but don't implement fold: Cycle, StepBy, TakeWhile, MapWhile, Take, Scan, ResultShunt, RangeInclusive.

RangeInclusive may be a good one for fold. The others which terminate early should probably keep forwarding fold to their own try_fold -- might want a helper fn for that, basically preserving the forwarding code that you're changing here.

An alternative option to this PR is to go in the other direction: remove all the fold implementations and let them fall back to the default, which calls on to try_fold. That would make fold like any, all, etc. This would reduce the amount of code in libstd at the cost of potentially increased compile times.

I think that would be actively harmful for adapters, since users can't implement try_fold yet. We should still try to call a user's fold where it makes sense.

@nnethercote
Copy link
Contributor Author

I just analyzed Iterator. Here is a summary.

Summary of Iterator methods
- '*' indicates important methods (as judged by me via profiling, see below)
- 'XXX' indicates possible changes

// No default implementation
* next

// Default implementation that doesn't rely on anything else
* size_hint
- nth
- by_ref

// Default implementation uses next
* try_fold
- cmp_by
- partial_cmp_by
- eq_by
- is_sorted_by

// Default implementation uses fold
* for_each
- count
- last
- fold_first

// Default implementation uses try_fold
* fold (XXX: should be implemented directly?)
* position -> try
* any (could use try_for_each)
* find (could use try_for_each)
- all (could use try_for_each)
- find_map (could use try_for_each)
- try_for_each

// Default implementation uses try_rfold
- rposition -> try_rfold

// Default implementation uses for_each (XXX: should use fold?)
- partition -> for_each
- unzip -> for_each

// Default implementation uses try_for_each (XXX: should use try_fold?)
- try_find

// Other
- partition_in_place -> find + rfind
- is_partitioned -> all + any
- max -> max_by
- min -> min_by
- max_by_key -> map + max_by
- min_by_key -> map + min_by
- max_by -> fold_first + fold
- min_by -> fold_first + fold
- cmp -> cmp_by
- partial_cmp -> partial_cmp_by
- eq -> eq_by
- ne -> eq
- lt -> partial_cmp
- le -> partial_cmp
- gt -> partial_cmp
- ge -> partial_cmp
- is_sorted -> is_sorted_by
- is_sorted_by_key -> map

// Uses an auxiliary adapter (some of those use fold, etc.)
- step_by -> StepBy
- chain -> Chain
- zip -> Zip
- map -> Map
- filter -> Filter
- filter_map -> FilterMap
- enumerate -> Enumerate
- peekable -> Peekable
- skip_while -> SkipWhile
- take_while -> TakeWhile
- map_while -> MapWhile
- skip -> Skip
- take -> Take
- scan -> Scan
- flat_map -> FlatMap
- flatten -> Flatten
- fuse -> Fuse
- inspect -> Inspect
- collect -> FromIterator
- rev -> Rev
- copied -> Copied
- cloned -> Cloned
- cycle -> Cycle
- sum -> Sum
- product -> Product

I also did some rough profiling of the rustc-perf benchmarks. The following shows the number of lines of LLVM IR associated with some of the methods, which gives an of their relative importances in relation to compile times.

749102 TOTAL (for all Iterator methods)
254224 try_fold
121461 next
83047 fold
45494 find
42836 size_hint
34061 any
25876 for_each
8578 position
4887 all
4554 find_map
3832 try_rfold
96 rfold
0 try_for_each

@bluss
Copy link
Member

bluss commented May 12, 2020

It can be fraught to change which method is called in default implementation. Not that there is any problem in the current PR with that, but if we begin talking about switching around default implementations, it's a factor to think about.

For example: If we would change any's default to use all, then that creates an infinite loop in those user iterators that for some reason implement all by calling any(!).

@nnethercote
Copy link
Contributor Author

what would happen if we special case any/all/find on the slice iterator, that could save a lot?

I've done this in #72166, it's simple and gets a lot of the potential benefit. I will return to fold and this PR tomorrow.

@nnethercote
Copy link
Contributor Author

nnethercote commented May 14, 2020

I have updated the code. I've given rfold the same treatment, added a comment to fold, and added missing fold implementations to iterators that have a try_fold.

@nnethercote
Copy link
Contributor Author

I did some local measurements and couldn't replicate the ripgrep regression. Maybe it was just a one-off. If this gets r+ I will do another perf run.

@andjo403
Copy link
Contributor

have you tested the perf together with #72166 is there any changes with this if that PR is merged?

@nnethercote
Copy link
Contributor Author

#72166 is mostly orthogonal to this.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but there are a few that also need rfold now that we've changed that. I noted those where you added fold, and Skip is one more.

src/libcore/iter/range.rs Show resolved Hide resolved
src/libcore/iter/adapters/mod.rs Show resolved Hide resolved
src/libcore/iter/adapters/mod.rs Show resolved Hide resolved
`fold` is currently implemented via `try_fold`, but implementing it
directly results in slightly less LLVM IR being generated, speeding up
compilation of some benchmarks.

(And likewise for `rfold`.)

The commit adds `fold` implementations to all the iterators that lack
one but do have a `try_fold` implementation. Most of these just call the
`try_fold` implementation directly.
Many default iterator methods use `try_fold` or `fold`, and these ones
can too.
@nnethercote
Copy link
Contributor Author

@cuviper: I have added the requested rfolds.

@rust-highfive
Copy link
Collaborator

The job mingw-check of your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
##[section]Starting: Linux mingw-check
##[section]Starting: Initialize job
Agent name: 'Azure Pipelines 3'
Agent machine name: 'fv-az578'
Current agent version: '2.168.2'
##[group]Operating System
16.04.6
LTS
LTS
##[endgroup]
##[group]Virtual Environment
Environment: ubuntu-16.04
Version: 20200430.2
Included Software: /~https://github.com/actions/virtual-environments/blob/ubuntu16/20200430.2/images/linux/Ubuntu1604-README.md
##[endgroup]
Agent running as: 'vsts'
Prepare build directory.
Set build variables.
Download all required tasks.
Download all required tasks.
Downloading task: Bash (3.163.2)
Checking job knob settings.
   Knob: AgentToolsDirectory = /opt/hostedtoolcache Source: ${AGENT_TOOLSDIRECTORY} 
   Knob: AgentPerflog = /home/vsts/perflog Source: ${VSTS_AGENT_PERFLOG} 
Start tracking orphan processes.
##[section]Finishing: Initialize job
##[section]Starting: Configure Job Name
==============================================================================
---
========================== Starting Command Output ===========================
[command]/bin/bash --noprofile --norc /home/vsts/work/_temp/aa52f2bd-985c-436f-8b6e-ff835d042696.sh

##[section]Finishing: Disable git automatic line ending conversion
##[section]Starting: Checkout rust-lang/rust@refs/pull/72139/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
---
##[command]git remote add origin /~https://github.com/rust-lang/rust
##[command]git config gc.auto 0
##[command]git config --get-all http./~https://github.com/rust-lang/rust.extraheader
##[command]git config --get-all http.proxy
##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/72139/merge:refs/remotes/pull/72139/merge
---
 ---> 3adb0605cc65
Step 6/7 : ENV RUN_CHECK_WITH_PARALLEL_QUERIES 1
 ---> Using cache
 ---> 28dbc326cb7f
Step 7/7 : ENV SCRIPT python3 ../x.py test src/tools/expand-yaml-anchors &&            python3 ../x.py check --target=i686-pc-windows-gnu --host=i686-pc-windows-gnu &&            python3 ../x.py build --stage 0 src/tools/build-manifest &&            python3 ../x.py test --stage 0 src/tools/compiletest &&            python3 ../x.py test src/tools/tidy &&            python3 ../x.py doc --stage 0 src/libstd &&            /scripts/validate-toolstate.sh
 ---> 537a01811900
Successfully built 537a01811900
Successfully tagged rust-ci:latest
Built container sha256:537a018119009dc218456238dec90b5530050db1e2a1e166550c218003f6159d
---
   Compiling autocfg v0.1.7
error: variable does not need to be mutable
   --> src/libcore/iter/range.rs:765:39
    |
765 |     fn rfold<B, F>(mut self, init: B, mut f: F) -> B
    |                                       |
    |                                       help: remove this `mut`
    |
    = note: `-D unused-mut` implied by `-D warnings`
---
  local time: Mon May 18 04:52:47 UTC 2020
  network time: Mon, 18 May 2020 04:52:47 GMT
== end clock drift check ==

##[error]Bash exited with code '1'.
##[section]Finishing: Run build
##[section]Starting: Checkout rust-lang/rust@refs/pull/72139/merge to s
Task         : Get sources
Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.
Version      : 1.0.0
Author       : Microsoft
Author       : Microsoft
Help         : [More Information](https://go.microsoft.com/fwlink/?LinkId=798199)
==============================================================================
Cleaning any cached credential from repository: rust-lang/rust (GitHub)
##[section]Finishing: Checkout rust-lang/rust@refs/pull/72139/merge to s
Cleaning up task key
Start cleaning up orphan processes.
Terminate orphan process: pid (4790) (python)
##[section]Finishing: Finalize Job

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @rust-lang/infra. (Feature Requests)

@cuviper
Copy link
Member

cuviper commented May 20, 2020

@bors r+

@bors
Copy link
Contributor

bors commented May 20, 2020

📌 Commit 959bd48 has been approved by cuviper

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2020
@bors
Copy link
Contributor

bors commented May 20, 2020

⌛ Testing commit 959bd48 with merge 1a814c344dd1588457199109c0722ff66765a295...

@bors
Copy link
Contributor

bors commented May 20, 2020

💔 Test failed - checks-azure

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels May 20, 2020
@RalfJung
Copy link
Member

msys2 failure
@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 20, 2020
bors added a commit to rust-lang-ci/rust that referenced this pull request May 20, 2020
Rollup of 6 pull requests

Successful merges:

 - rust-lang#71863 (Suggest fixes and add error recovery for `use foo::self`)
 - rust-lang#72139 (Make `fold` standalone.)
 - rust-lang#72275 (Continue lowering for unsupported async generator instead of returning an error.)
 - rust-lang#72361 (split_inclusive: add tracking issue number (72360))
 - rust-lang#72364 (Remove unused dependencies)
 - rust-lang#72366 (Adjust the zero check in `RawVec::grow`.)

Failed merges:

r? @ghost
@nnethercote
Copy link
Contributor Author

@bors rollup=never

Because this affects performance.

@bors bors merged commit 5c52f9f into rust-lang:master May 20, 2020
@nnethercote nnethercote deleted the standalone-fold branch May 21, 2020 01:47
@nnethercote
Copy link
Contributor Author

How did this land when some of the tests failed?

@RalfJung
Copy link
Member

It landed as part of a rollup that was created before you did rollup=never: #72378. All tests passed there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants