-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking Issue for #![feature(available_parallelism)]
#74479
Comments
A fix for the linux undercounting exists in if unsafe { libc::sched_getaffinity(0, mem::size_of::<libc::cpu_set_t>(), &mut set) } == 0 {
let mut count: u32 = 0;
for i in 0..libc::CPU_SETSIZE as usize {
if unsafe { libc::CPU_ISSET(i, &set) } {
count += 1
}
}
count as usize
} else {
// `libc::_SC_NPROCESSORS_ONLN` code we're currently using
} cc/ @luser |
@yoshuawuyts that would be insufficient due to cgroup CPU quotas which gets your entire cgroup frozen until the next time slice when the quota is used up. docker and other containerization services makes heavy use of that for resource allocation. see https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt |
…lnay Add std::thread::available_concurrency This PR adds a counterpart to [C++'s `std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) to Rust, tracking issue rust-lang#74479. cc/ @rust-lang/libs ## Motivation Being able to know how many hardware threads a platform supports is a core part of building multi-threaded code. In C++ 11 this has become available through the [`std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) API. Currently in Rust most of the ecosystem depends on the [`num_cpus` crate](https://docs.rs/num_cpus/1.13.0/num_cpus/) ([no.35 in top 500 crates](https://docs.google.com/spreadsheets/d/1wwahRMHG3buvnfHjmPQFU4Kyfq15oTwbfsuZpwHUKc4/edit#gid=1253069234)) to provide this functionality. This PR proposes an API to provide access to the number of hardware threads available on a given platform. __edit (2020-07-24):__ The purpose of this PR is to provide a hint for how many threads to spawn to saturate the processor. There's value in introducing APIs for NUMA and Windows processor groups, but those are intentionally out of scope for this PR. See: rust-lang#74480 (comment). ## Naming Discussing the naming of the API on Zulip surfaced two options: - `std::thread::hardware_concurrency` - `std::thread::hardware_threads` Both options seemed acceptable, but overall people seem to gravitate the most towards `hardware_threads`. Additionally @jonas-schievink pointed out that the "hardware threads" terminology is well-established and is used in among other the [RISC-V specification](https://riscv.org/specifications/isa-spec-pdf/) (page 20): > A component is termed a core if it contains an independent instruction fetch unit. A RISC-V-compatible core might support multiple RISC-V-compatible __hardware threads__, or harts, through multithreading. It's also worth noting that [the original paper introducing C++'s `std::thread` submodule](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html) unfortunately doesn't feature any discussion on the naming of `hardware_concurrency`, so we can't use that to help inform our decision here. ## Return type An important consideration @joshtriplett brought up is that we don't want to default to `1` for platforms where the number of available threads cannot be retrieved. Instead we want to inform the users of the fact that we don't know and allow them to handle that case. Which is why this PR uses `Option<NonZeroUsize>` as its return type, where `None` is returned on platforms where we don't know the number of hardware threads available. The reasoning for `NonZeroUsize` vs `usize` is that if the number of threads for a platform are known, they'll always be at least 1. As evidenced by the example the `NonZero*` family of APIs may currently not be the most ergonomic to use, but improving the ergonomics of them is something that I think we can address separately. ## Implementation @Mark-Simulacrum pointed out that most of the code we wanted to expose here was already available under `libtest`. So this PR mostly moves the internal code of libtest into a public API.
Add std::thread::available_concurrency This PR adds a counterpart to [C++'s `std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) to Rust, tracking issue rust-lang#74479. cc/ `@rust-lang/libs` ## Motivation Being able to know how many hardware threads a platform supports is a core part of building multi-threaded code. In C++ 11 this has become available through the [`std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) API. Currently in Rust most of the ecosystem depends on the [`num_cpus` crate](https://docs.rs/num_cpus/1.13.0/num_cpus/) ([no.35 in top 500 crates](https://docs.google.com/spreadsheets/d/1wwahRMHG3buvnfHjmPQFU4Kyfq15oTwbfsuZpwHUKc4/edit#gid=1253069234)) to provide this functionality. This PR proposes an API to provide access to the number of hardware threads available on a given platform. __edit (2020-07-24):__ The purpose of this PR is to provide a hint for how many threads to spawn to saturate the processor. There's value in introducing APIs for NUMA and Windows processor groups, but those are intentionally out of scope for this PR. See: rust-lang#74480 (comment). ## Naming Discussing the naming of the API on Zulip surfaced two options: - `std::thread::hardware_concurrency` - `std::thread::hardware_threads` Both options seemed acceptable, but overall people seem to gravitate the most towards `hardware_threads`. Additionally `@jonas-schievink` pointed out that the "hardware threads" terminology is well-established and is used in among other the [RISC-V specification](https://riscv.org/specifications/isa-spec-pdf/) (page 20): > A component is termed a core if it contains an independent instruction fetch unit. A RISC-V-compatible core might support multiple RISC-V-compatible __hardware threads__, or harts, through multithreading. It's also worth noting that [the original paper introducing C++'s `std::thread` submodule](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html) unfortunately doesn't feature any discussion on the naming of `hardware_concurrency`, so we can't use that to help inform our decision here. ## Return type An important consideration `@joshtriplett` brought up is that we don't want to default to `1` for platforms where the number of available threads cannot be retrieved. Instead we want to inform the users of the fact that we don't know and allow them to handle that case. Which is why this PR uses `Option<NonZeroUsize>` as its return type, where `None` is returned on platforms where we don't know the number of hardware threads available. The reasoning for `NonZeroUsize` vs `usize` is that if the number of threads for a platform are known, they'll always be at least 1. As evidenced by the example the `NonZero*` family of APIs may currently not be the most ergonomic to use, but improving the ergonomics of them is something that I think we can address separately. ## Implementation `@Mark-Simulacrum` pointed out that most of the code we wanted to expose here was already available under `libtest`. So this PR mostly moves the internal code of libtest into a public API.
I would suggest that taking into account Linux a) cpusets (#74479 (comment)) and b) cgroup cpu bandwidth limits (#74479 (comment)) are considered blockers of stabilization of available_concurrency. Rationale: the change b) in num_cpus was a source of significant CPU efficiency gains and potential memory savings in cloud-based environments. If available_concurrency is stabilized, then it is expected people will migrate to it from This is complicated by the fact that associating "Kubernetes CPU limits" with "cgroups" and https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt is non-trivial as there are abstraction layers between them. (so people may not be aware of this even in the presence of documentation on available_concurrency) The fix can also be to include some |
@strohel I definitely agree we should support cpusets before stabilizing: we may currently over-report concurrency in containers which is something that should be fixed. But I'm less sure about cgroup cpu bandwidth limits. If I'm understanding #74479 (comment) this would cause the reported number of CPUs to fluctuate during the runtime of the program. I believe this may cause issues when used in examples like the following: use std::thread;
fn main() -> io::Result<()> {
// perform init work here (connect to db, warm caches, etc.)
let max_threads = thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
let pool = ThreadPool::init(max_threads);
// use pool for the duration of the program
} If the boot sequence of a program is CPU-intensive, by the time the thread pool is initialized the CPU quota may have been expended. Which means that the thread pool may set a limit lower than the max supported concurrency for the program, causing an overall under-utilization of resources. It would seem that the right use of a bandwidth-aware API would be to periodically update the concurrency. Not to statically initialize it once since that risks under-utilization if the quota is at a low. Did I interpret this accurately? edit: On closer reading I think I may have indeed misinterpreted. The |
@yoshuawuyts I see you've just edited the comment; the clarification is correct. It is true the one can change CPU quota of a running program, but the same AFAIK holds about cpu sets. Linux even supports runtime physical CPU hotplugging. That's a non-issue, the API can do nothing about it (beyond mentioning in the docs). Caching of the value is problematic for this reason. |
Perhaps my post was not clear. There is a quota-configuration which you can read from sysfs (as quota per interval and interval length) and the runtime depletion of the quota. The depletion is dynamic, but the configuration does not change unless someone explicitly reconfigures them. |
I'm deeply against including this in Rust's core API. Whilst conceptually it seems helpful and even useful, the technique used for Linux isn't just broken with regard to containers, it's broken in a number of other situations as well. It turns out doing this well is extremely hard. I've gone through several iterations of my own to get it 'right'. First up Whilst wanting a count of cpus is a common request, naively just spinning up threads for all cpus, or all cpus minus 1, say, is never going to work unless one can ensure a dedicated OS instance to run on. |
@strohel - the API can do something about CPU hotplugging. It's quite discoverable. |
…nTitor Move `available_concurrency` implementation to `sys` This splits out the platform-specific implementation of `available_concurrency` to the corresponding platforms under `sys`. No changes are made to the implementation. Tidy didn't lint against this code being originally added outside of `sys` because of a bug (see rust-lang#84677), this PR also reverts the exclusion that was introduced in that bugfix. Tracking issue of `available_concurrency`: rust-lang#74479
…nTitor Move `available_concurrency` implementation to `sys` This splits out the platform-specific implementation of `available_concurrency` to the corresponding platforms under `sys`. No changes are made to the implementation. Tidy didn't lint against this code being originally added outside of `sys` because of a bug (see rust-lang#84677), this PR also reverts the exclusion that was introduced in that bugfix. Tracking issue of `available_concurrency`: rust-lang#74479
Is there any reason not to call this |
@ibraheemdev you can find a conversation that lead to the current naming here: #74480 (comment). The short version on "why not name it [1]: We borrowed the word "concurrency" from C++'s |
I submitted #89310 to implement |
@joshtriplett I still think stabilizing this is a really bad idea; see my comment above on |
@raphaelcohn wrote:
Rust projects need some portable mechanism to determine a default number of threads to run, even if that's an approximation. (Projects currently use the The purpose of the If you feel there's some additional nuance this function could be detecting to provide a better default, I'd be happy to review further pull requests that detect (for instance) additional aspects of cgroup configuration. But I do believe we should encourage people to use an API abstraction like this, to provide a reasonable default. |
@joshtriplett I couldn't disagree more. Portability by compromise is a poor goal; most Rust developers trust the standard libraries to produce performant code (something I myself used to believe). However, I'm not going to hold this up. If this change goes in, then at the very least, could we please caution those that use it that is a simplistic solution and highlight that the underlying implementation on Linux is brittle. Lastly: "if I feel". This is clumsy expressed and leads to a demonstration of power, not equality; clearly I do feel this way. It's clear that you want to make this change, that you have the power to make this change, and you have a majority view. I have no interest in engaging further. |
Follow up to #74480 (comment) I had a chat today with @sivadeilra today on how to approach the 64-thread limit on Windows, and whether we could detect more than 64 threads on a system by default. They mentioned that in order to do that we need to change the ambient state of a program, which a lookup function probably shouldn't do. One option @sivadeilra mentioned we could explore is adding an API via Follow up to #74480 (comment) While looking through the discussion on the original PR I also realized that we weren't tracking #74480 (comment). It mentions that on Windows might be able to limit the amount of concurrency available to a program via process affinity masks / job objects. @sivadeilra do you have a sense for how we could support that? Would getting those values also interact with the ambient state? |
To support this with a data point: earlier I benchmarked a simple actix-web based microservice. Overestimating the number of worker threads (from 1 to 4) resulted in ~50% increase of CPU time per request. I can re-run the benchmark with up-to-date versions and different parameters if requested. |
🔔 This is now entering its final comment period, as per the review above. 🔔 |
#91057 updates the documentation in anticipation of cgroups support. |
The final comment period, with a disposition to merge, as per the review above, is now complete. As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed. This will be merged soon. |
…allelism, r=joshtriplett Implement stabilization of `#[feature(available_parallelism)]` Stabilized in rust-lang#74479 (comment). Closes rust-lang#74479. Thanks! cc/ `@rust-lang/libs-api`
How was I, as a platform maintainer, supposed to know that this platform-specific functionality was added and an implementation would be needed for the platform I maintain, prior to reading about this in the release notes? |
@jethrogb I'm sorry to hear you feel you should've been notified about this, but weren't. Perhaps there's indeed room for improvement here. The best place to discuss this would likely be on the libs team stream on the rust-lang Zulip. Additionally, some existing ways of tracking project changes such as these are:
I hope you find this helpful! |
We've been discussing this in the libs team meetings recently, specifically the relationship between the libs teams and various platform maintainers and what kind of support we should be providing. The most recent discussion was around specific platforms and making sure that, among other things, every platform we support has a ping group so we can notify them of relevant PRs. We didn't discuss issues that affect all platforms but it seems like this is a natural extension of that conversation, and that we'd likely also need a ping group to notify all maintainers of tier3 and tier2 platforms whenever new APIs are introduced that need to be implemented by platform maintainers. I've gone ahead and nominated this for discussion in our meeting next week. If you have any other suggestions or feedback on these plans please note them in the zulip stream for our upcoming meeting: https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting-2022-03-02, and if you're able to please feel welcome to attend the meeting and participate in the discussion when it comes up. |
Use cgroup quotas for calculating `available_parallelism` Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process. Limitations * only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup` * procfs must be available * the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too Manually tested via ``` // spawn a new cgroup scope for the current user $ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS // quota.rs #![feature(available_parallelism)] fn main() { println!("{:?}", std::thread::available_parallelism()); // prints Ok(3) } ``` strace: ``` sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32 openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3 statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address) statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36 read(3, "", 92) = 0 close(3) = 0 statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "300000 100000\n", 20) = 14 read(3, "", 6) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "max 100000\n", 20) = 11 read(3, "", 9) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40 ``` r? `@joshtriplett` cc `@yoshuawuyts` Tracking issue and previous discussion: rust-lang#74479
Use cgroup quotas for calculating `available_parallelism` Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process. Limitations * only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup` * procfs must be available * the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too Manually tested via ``` // spawn a new cgroup scope for the current user $ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS // quota.rs #![feature(available_parallelism)] fn main() { println!("{:?}", std::thread::available_parallelism()); // prints Ok(3) } ``` strace: ``` sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32 openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3 statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address) statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36 read(3, "", 92) = 0 close(3) = 0 statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "300000 100000\n", 20) = 14 read(3, "", 6) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "max 100000\n", 20) = 11 read(3, "", 9) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40 ``` r? `````@joshtriplett````` cc `````@yoshuawuyts````` Tracking issue and previous discussion: rust-lang#74479
Use cgroup quotas for calculating `available_parallelism` Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process. Limitations * only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup` * procfs must be available * the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too Manually tested via ``` // spawn a new cgroup scope for the current user $ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS // quota.rs #![feature(available_parallelism)] fn main() { println!("{:?}", std::thread::available_parallelism()); // prints Ok(3) } ``` strace: ``` sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32 openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3 statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address) statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36 read(3, "", 92) = 0 close(3) = 0 statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "300000 100000\n", 20) = 14 read(3, "", 6) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "max 100000\n", 20) = 11 read(3, "", 9) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40 ``` r? ``````@joshtriplett`````` cc ``````@yoshuawuyts`````` Tracking issue and previous discussion: rust-lang#74479
Use cgroup quotas for calculating `available_parallelism` Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process. Limitations * only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup` * procfs must be available * the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too Manually tested via ``` // spawn a new cgroup scope for the current user $ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS // quota.rs #![feature(available_parallelism)] fn main() { println!("{:?}", std::thread::available_parallelism()); // prints Ok(3) } ``` strace: ``` sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32 openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3 statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address) statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36 read(3, "", 92) = 0 close(3) = 0 statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "300000 100000\n", 20) = 14 read(3, "", 6) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3 statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0 lseek(3, 0, SEEK_CUR) = 0 read(3, "max 100000\n", 20) = 11 read(3, "", 9) = 0 close(3) = 0 openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory) sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40 ``` r? ```````@joshtriplett``````` cc ```````@yoshuawuyts``````` Tracking issue and previous discussion: rust-lang#74479
The feature gate for the issue is
#![feature(available_parallelism)]
.This is a tracking issue for
std::thread::available_parallelism
; a portable API to determine how many threads to spawn in order to ensure a program can make use of all available parallelism available on a machine.Public API
Tasks
Notes on Platform-specific behavior
available_parallelism
is (for now) only considered a hint. The following platform limitations exist:std::thread::available_concurrency
#74480 (comment)std::thread::available_concurrency
#74480 (comment)std::thread::available_concurrency
#74480 (comment)Implementation history
std::thread::available_concurrency
#74480available_concurrency
implementation tosys
#85182std::thread::available_concurrency
support process-limited number of CPUs #89310std::thread::available_conccurrency
tostd::thread::available_parallelism
#89324available_parallelism
#92697The text was updated successfully, but these errors were encountered: