
Implemented a priority executor for bevy_task #2167

Closed
wants to merge 25 commits into from

Conversation

IGreyGooI

A PriorityExecutor is implemented on top of async_executor::Executor, and bevy_task is modified to use PriorityExecutor instead. None of the TaskPool API is modified (but an API change should be expected). IoTaskPool and AsyncComputeTaskPool across all crates are replaced by ComputeTaskPool for testing purposes.

Summary of the behavior of PriorityExecutor:

  1. PriorityExecutor.run() has two states:
    a. Running Priority::FinishWithinFrame tasks
    b. Running Priority::AcrossFrame and Priority::IO tasks
  2. State (a) transitions to (b) automatically after all dispatched Priority::FinishWithinFrame tasks have finished.
  3. State (b) transitions to (a) after an event is received and the worker thread is not occupied by a task.
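
The two-state behavior above can be sketched as a minimal std-only model. This is illustrative only: ExecutorState, the field names, and the single-threaded tick()/on_event() shape are assumptions for exposition, not the PR's actual API.

```rust
use std::collections::VecDeque;

// Illustrative priorities from the description above; names are hypothetical.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Priority {
    FinishWithinFrame, // served in state (a)
    AcrossFrame,       // served in state (b)
    Io,                // served in state (b)
}

#[derive(Debug, PartialEq)]
enum ExecutorState {
    RunningFrameTasks,      // state (a)
    RunningBackgroundTasks, // state (b)
}

struct PriorityExecutor {
    state: ExecutorState,
    frame_tasks: VecDeque<&'static str>,      // Priority::FinishWithinFrame
    background_tasks: VecDeque<&'static str>, // Priority::AcrossFrame / Priority::IO
}

impl PriorityExecutor {
    fn tick(&mut self) -> Option<&'static str> {
        match self.state {
            ExecutorState::RunningFrameTasks => match self.frame_tasks.pop_front() {
                Some(task) => Some(task),
                None => {
                    // Rule 2: all FinishWithinFrame tasks done -> move to (b).
                    self.state = ExecutorState::RunningBackgroundTasks;
                    self.background_tasks.pop_front()
                }
            },
            ExecutorState::RunningBackgroundTasks => self.background_tasks.pop_front(),
        }
    }

    // Rule 3: on an event (e.g. a new frame), an idle worker returns to (a).
    fn on_event(&mut self, worker_idle: bool) {
        if worker_idle {
            self.state = ExecutorState::RunningFrameTasks;
        }
    }
}
```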

This is meant to be a solution to #1907. However, more tests are needed to ensure that:

  1. PriorityExecutor executes all tasks of the highest priority before starting any lower-priority task.
  2. PriorityExecutor exits the runtime properly.

Passed:

  1. cargo run -p ci
  2. cargo test --all-targets --workspace

IGreyGooI added 20 commits May 12, 2021 18:07
…xecutor

# Conflicts:
#	crates/bevy_tasks/Cargo.toml
@IGreyGooI IGreyGooI changed the title Implemented a Priority executor for bevy_task Implemented a priority executor for bevy_task May 14, 2021
@alice-i-cecile alice-i-cecile added core C-Feature A new feature, making something new possible labels May 14, 2021
@NathanSWard
Contributor

It would be nice to see benchmarks on the previous implementation vs this new PriorityExecutor.

@NathanSWard
Contributor

NathanSWard commented May 14, 2021

Also would love to see some PriorityExecutor specific tests in a #[cfg(test)] module :)
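
For illustration, such a test could assert the ordering guarantee against a small model. PriorityQueueModel below is a hypothetical stand-in (lower key = higher priority), not the PR's PriorityExecutor API.

```rust
use std::collections::BTreeMap;

// Hypothetical model of priority dispatch: lower key = higher priority.
struct PriorityQueueModel {
    queues: BTreeMap<u8, Vec<&'static str>>,
}

impl PriorityQueueModel {
    fn spawn(&mut self, priority: u8, name: &'static str) {
        self.queues.entry(priority).or_default().push(name);
    }

    // Drain everything, highest priority first (BTreeMap iterates by key order).
    fn drain_in_order(&mut self) -> Vec<&'static str> {
        let mut out = Vec::new();
        for queue in self.queues.values_mut() {
            out.append(queue);
        }
        out
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn highest_priority_runs_first() {
        let mut model = PriorityQueueModel { queues: BTreeMap::new() };
        model.spawn(2, "io");
        model.spawn(0, "frame");
        model.spawn(1, "across_frame");
        assert_eq!(model.drain_in_order(), vec!["frame", "across_frame", "io"]);
    }
}
```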

GreyGoo and others added 2 commits May 15, 2021 19:39
Co-authored-by: Nathan Ward <43621845+NathanSWard@users.noreply.github.com>
Co-authored-by: Nathan Ward <43621845+NathanSWard@users.noreply.github.com>
@IGreyGooI
Author

IGreyGooI commented May 15, 2021

@NathanSWard Regarding the benchmark:
I ran some benchmarks. Benching taskpool.scope(), it appears that performance with fewer workers (4 or 8) has regressed by 5% to 20%, while with a larger worker pool (16 or 32 workers) performance is unchanged. I suspect the main difference is that async_executor::Executor::run() uses a Runner with a local cache as well as a variety of methods for finding work, whereas PriorityExecutor::run() only uses async_executor::Executor::try_tick(), which finds a task by popping the global queue.
I will try to replace async_executor::Executor::try_tick() with a Runner to see whether the situation improves with fewer worker threads.

The benchmark is uploaded at /~https://github.com/IGreyGooI/bevy_task_pool_branchmark
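
The contention difference described above can be illustrated with a std-only sketch; neither function here is the async_executor API, and the batch size of 4 is an arbitrary assumption.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

type Queue = Arc<Mutex<VecDeque<u32>>>;

// try_tick-style: every single task pop takes the shared-queue lock.
fn try_tick_style(global: &Queue) -> Option<u32> {
    global.lock().unwrap().pop_front()
}

// Runner-style: refill a thread-local batch under one lock, then drain it
// locally with no further contention on the global queue.
fn runner_style(local: &mut VecDeque<u32>, global: &Queue) -> Option<u32> {
    if local.is_empty() {
        let mut global = global.lock().unwrap();
        for _ in 0..4 {
            match global.pop_front() {
                Some(task) => local.push_back(task),
                None => break,
            }
        }
    }
    local.pop_front()
}
```

With few workers, the per-pop locking of the try_tick-style loop plausibly dominates, which would match the regression seen at 4 and 8 workers.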

@IGreyGooI
Author


Runner is not exposed by async_executor, and no other cheap workaround could be found.
Until we have our own executor, this could still be a net performance improvement, on the assumption that the additional threads utilized by solving #1907 gain more performance than is lost by using async_executor::Executor::try_tick() instead of async_executor::Executor::run().

Further improvement could be made by implementing a modified version of async_executor, or via another workaround.

@mtsr
Contributor

mtsr commented May 15, 2021

You could test the effect by using a patched async_executor with Runner exposed. Just to see if it's worth bringing it up with the author.

@NathanSWard
Contributor

Until we have our own executor, this could still be a net performance improvement, on the assumption that the additional threads utilized by solving #1907 gain more performance than is lost by using async_executor::Executor::try_tick() instead of async_executor::Executor::run().

Yep, that makes sense. It would be nice to see some diagnostics showing that all available cores are being used based on the config (and that this is actually a performance improvement).
However, I don't know if we should bring in a change that fixes some of the cases but regresses performance in others.
It may be worth creating a separate issue about bevy having its own executor that directly supports priority scheduling, and then pursuing that path instead.
However, I'm also open to other solutions 😄

@IGreyGooI
Author

I am referring to the benchmarks under bevy/benches/bevy_tasks. For the benchmarks under overhead_par_iter/threads, which also evaluate taskpool.scope(), I checked my commit against current main, and no significant improvement or regression is noticeable (the difference is close to the noise of running the same benchmark twice).

Could someone else run cargo bench iter inside bevy/benches with the two different commits to confirm?

GreyGoo and others added 2 commits May 16, 2021 17:50
Co-authored-by: Nathan Ward <43621845+NathanSWard@users.noreply.github.com>
@IGreyGooI
Author

Here is the benchmark result from running cd ./benches && cargo bench iter on 73f4a9d:
benchmark_bevy_main_73f4a9d.log

@IGreyGooI
Author

Here is the benchmark result from running cd ./benches && cargo bench iter on 01b46f3:
benchmark_bevy_priority_executor_01b46f3.log, compared against benchmark_bevy_main_73f4a9d.log

@mockersf mockersf added the S-Pre-Relicense This PR was made before Bevy added the Apache license. Cannot be merged or used for other work label Jul 16, 2021
@alice-i-cecile alice-i-cecile added A-Tasks Tools for parallel and async work S-Needs-Review and removed A-Core labels Sep 22, 2021
@alice-i-cecile
Member

@IGreyGooI, can you comment in #2373? We've since relicensed to MIT + Apache, and I'd like to either ensure that this can be picked up or close it out.

@alice-i-cecile alice-i-cecile added the S-Adopt-Me The original PR author has no intent to complete this work. Pick me up! label May 16, 2022
@hymm
Contributor

hymm commented May 16, 2022

This is potentially superseded by #4740 too.

Labels
A-Tasks: Tools for parallel and async work
C-Feature: A new feature, making something new possible
S-Adopt-Me: The original PR author has no intent to complete this work. Pick me up!
S-Pre-Relicense: This PR was made before Bevy added the Apache license. Cannot be merged or used for other work

7 participants