Tracking Issue for #![feature(available_parallelism)] #74479

Closed
6 of 7 tasks
yoshuawuyts opened this issue Jul 18, 2020 · 39 comments · Fixed by #92632
Labels
A-concurrency Area: Concurrency B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. finished-final-comment-period The final comment period is finished for this PR / Issue. Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@yoshuawuyts
Member

yoshuawuyts commented Jul 18, 2020

The feature gate for the issue is #![feature(available_parallelism)].

This is a tracking issue for std::thread::available_parallelism, a portable API to determine how many threads to spawn so that a program can make use of all the parallelism available on a machine.

Public API

```rust
#![feature(available_parallelism)]
use std::thread;

fn main() -> std::io::Result<()> {
    let count = thread::available_parallelism()?.get();
    assert!(count >= 1);
    Ok(())
}
```
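As a rough sketch of the intended use (not part of the proposed API), the count can size a fixed set of worker threads. This assumes a toolchain where `available_parallelism` is stable and `std::thread::scope` exists (it was stabilized after this issue closed), so no feature gate is used; `parallel_sum` is a hypothetical helper.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Sums `data` by splitting it into roughly one chunk per available thread.
/// Falls back to a single thread if the parallelism cannot be queried.
fn parallel_sum(data: &[u64]) -> u64 {
    let threads = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Round up so every element lands in some chunk; `chunks` requires a non-zero size.
    let chunk_size = ((data.len() + threads - 1) / threads).max(1);

    thread::scope(|s| {
        // Spawn one scoped thread per chunk; each borrows its slice of `data`.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        // Join all workers and combine their partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("{}", parallel_sum(&data)); // prints 500500
}
```

The count is treated purely as a hint for how many chunks to create; correctness does not depend on it.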

Tasks

  • Resolve discussion on naming.
  • Resolve discussion on function signature.
  • Resolve discussion on terminology.
  • Add support for externally-set limits on Linux (ref).
  • Add support for externally-set limits on Windows (ref).
  • Update documentation to remove mentions of "hardware" (ref).
  • Smooth out the docs example.

Notes on Platform-specific behavior

available_parallelism is (for now) only considered a hint. The following platform limitations exist:

Implementation history

@yoshuawuyts yoshuawuyts added the C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC label Jul 18, 2020
@jonas-schievink jonas-schievink added A-concurrency Area: Concurrency B-unstable Blocker: Implemented in the nightly compiler and unstable. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Jul 18, 2020
@KodrAus KodrAus added Libs-Small Libs issues that are considered "small" or self-contained Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@yoshuawuyts yoshuawuyts changed the title from Tracking Issue for #![feature(hardware_threads)] to Tracking Issue for #![feature(available_concurrency)] Sep 15, 2020
@yoshuawuyts
Member Author

yoshuawuyts commented Sep 16, 2020

A fix for the Linux undercounting exists in num_cpus, which is license-compatible with the compiler. I believe the fix boils down to including this conditional in the Linux detection code:

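```rust
// `set` must be initialized before the call; an all-zero `cpu_set_t` is valid.
let mut set: libc::cpu_set_t = unsafe { std::mem::zeroed() };
if unsafe { libc::sched_getaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &mut set) } == 0 {
    // Count the CPUs in this process's affinity mask.
    let mut count: u32 = 0;
    for i in 0..libc::CPU_SETSIZE as usize {
        if unsafe { libc::CPU_ISSET(i, &set) } {
            count += 1;
        }
    }
    count as usize
} else {
    // `libc::_SC_NPROCESSORS_ONLN` code we're currently using
}
```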

cc/ @luser
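The affinity mask counted above can also be observed without libc: on Linux, the `Cpus_allowed_list` line of `/proc/self/status` (and a cgroup's `cpuset.cpus` file) is a comma-separated list of CPU numbers and ranges such as `0-3,8,10-11`. The following is a minimal, std-only sketch of counting such a list. `count_cpu_list` is a hypothetical helper, not what num_cpus or std do, and it only handles plain numbers and `a-b` ranges.

```rust
/// Counts the CPUs in a Linux CPU list such as "0-3,8,10-11".
/// Returns `None` if the string is malformed.
fn count_cpu_list(list: &str) -> Option<usize> {
    let mut count = 0;
    for part in list.trim().split(',').filter(|p| !p.is_empty()) {
        match part.split_once('-') {
            // An inclusive range "start-end".
            Some((start, end)) => {
                let start: usize = start.parse().ok()?;
                let end: usize = end.parse().ok()?;
                if end < start {
                    return None;
                }
                count += end - start + 1;
            }
            // A single CPU number; parse it only to validate the input.
            None => {
                part.parse::<usize>().ok()?;
                count += 1;
            }
        }
    }
    Some(count)
}

fn main() {
    // On Linux, real input could come from the `Cpus_allowed_list:` line of /proc/self/status.
    println!("{:?}", count_cpu_list("0-3,8,10-11")); // prints Some(7)
}
```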

@the8472
Member

the8472 commented Sep 16, 2020

@yoshuawuyts that would be insufficient due to cgroup CPU quotas, which freeze your entire cgroup until the next time slice once the quota is used up. Docker and other containerization services make heavy use of this for resource allocation.
One can calculate the equivalent number of full-time CPU cores available per time slice from the quotas, and other runtimes (e.g. Java) do.

see https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt
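To make this concrete: the configured quota is a pair of numbers, the runtime allowed per period and the period length, both in microseconds. cgroup v1 exposes them as `cpu.cfs_quota_us` and `cpu.cfs_period_us`; cgroup v2 exposes them together in a `cpu.max` file, which reads `max <period>` when no limit is set. Below is a minimal sketch of turning `cpu.max` contents into a CPU count, assuming we round down and never report fewer than one CPU. `quota_cpus` is a hypothetical helper.

```rust
/// Parses cgroup v2 `cpu.max` contents ("<quota> <period>" or "max <period>")
/// and returns how many full CPUs the quota is worth per period.
/// Returns `None` when no quota is set or the content is malformed.
fn quota_cpus(cpu_max: &str) -> Option<usize> {
    let mut fields = cpu_max.split_whitespace();
    let quota = fields.next()?;
    let period: u64 = fields.next()?.parse().ok()?;
    if quota == "max" || period == 0 {
        return None;
    }
    let quota: u64 = quota.parse().ok()?;
    // Round down, but never report less than one CPU.
    Some(((quota / period) as usize).max(1))
}

fn main() {
    // A quota of 300ms of CPU time per 100ms period.
    println!("{:?}", quota_cpus("300000 100000\n")); // prints Some(3)
}
```

Rounding down avoids oversubscribing the quota (a 1.5-CPU quota yields one thread); rounding up is an equally defensible choice if a partially used extra thread is acceptable.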

JohnTitor added a commit to JohnTitor/rust that referenced this issue Oct 15, 2020
…lnay

Add std::thread::available_concurrency

This PR adds a counterpart to [C++'s `std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) to Rust, tracking issue rust-lang#74479.

cc/ @rust-lang/libs

## Motivation

Being able to know how many hardware threads a platform supports is a core part of building multi-threaded code. In C++ 11 this has become available through the [`std::thread::hardware_concurrency`](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency) API. Currently in Rust most of the ecosystem depends on the [`num_cpus` crate](https://docs.rs/num_cpus/1.13.0/num_cpus/) ([no.35 in top 500 crates](https://docs.google.com/spreadsheets/d/1wwahRMHG3buvnfHjmPQFU4Kyfq15oTwbfsuZpwHUKc4/edit#gid=1253069234)) to provide this functionality. This PR proposes an API to provide access to the number of hardware threads available on a given platform.

__edit (2020-07-24):__ The purpose of this PR is to provide a hint for how many threads to spawn to saturate the processor. There's value in introducing APIs for NUMA and Windows processor groups, but those are intentionally out of scope for this PR. See: rust-lang#74480 (comment).

## Naming

Discussing the naming of the API on Zulip surfaced two options:

- `std::thread::hardware_concurrency`
- `std::thread::hardware_threads`

Both options seemed acceptable, but overall people seem to gravitate the most towards `hardware_threads`. Additionally, @jonas-schievink pointed out that the "hardware threads" terminology is well-established and is used in, among others, the [RISC-V specification](https://riscv.org/specifications/isa-spec-pdf/) (page 20):

> A component is termed a core if it contains an independent instruction fetch unit. A RISC-V-compatible core might support multiple RISC-V-compatible __hardware threads__, or harts, through multithreading.

It's also worth noting that [the original paper introducing C++'s `std::thread` submodule](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2320.html) unfortunately doesn't feature any discussion on the naming of `hardware_concurrency`, so we can't use that to help inform our decision here.

## Return type

An important consideration @joshtriplett brought up is that we don't want to default to `1` on platforms where the number of available threads cannot be retrieved. Instead, we want to inform users that we don't know, and allow them to handle that case. This is why this PR uses `Option<NonZeroUsize>` as its return type, where `None` is returned on platforms where we don't know the number of hardware threads available.

The reasoning for `NonZeroUsize` vs `usize` is that if the number of threads for a platform is known, it will always be at least 1. As evidenced by the example, the `NonZero*` family of APIs may currently not be the most ergonomic to use, but improving their ergonomics is something that I think we can address separately.

## Implementation

@Mark-Simulacrum pointed out that most of the code we wanted to expose here was already available under `libtest`. So this PR mostly moves the internal code of libtest into a public API.
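The return type described in this PR was later changed: the stabilized `std::thread::available_parallelism` returns `io::Result<NonZeroUsize>`, as in the tracking issue's Public API example. A small sketch showing that the `Option<NonZeroUsize>` shape proposed here is still one `.ok()` away, and that the fallback remains the caller's decision. `parallelism_hint` and `worker_count` are hypothetical names.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// The `Option<NonZeroUsize>` shape proposed in the PR, recovered from
/// the stabilized `io::Result<NonZeroUsize>` by discarding the error.
fn parallelism_hint() -> Option<NonZeroUsize> {
    thread::available_parallelism().ok()
}

/// The caller, not the library, picks the value used in the unknown case.
fn worker_count(hint: Option<NonZeroUsize>, fallback: usize) -> usize {
    hint.map_or(fallback, NonZeroUsize::get)
}

fn main() {
    // `NonZeroUsize` makes "zero threads" unrepresentable.
    assert!(NonZeroUsize::new(0).is_none());
    println!("workers: {}", worker_count(parallelism_hint(), 1));
}
```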
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 18, 2020
@strohel

strohel commented Oct 27, 2020

I would suggest that taking into account Linux a) cpusets (#74479 (comment)) and b) cgroup CPU bandwidth limits (#74479 (comment)) be considered blockers for stabilizing available_concurrency.

Rationale: the change b) in num_cpus was a source of significant CPU efficiency gains and potential memory savings in cloud-based environments. If available_concurrency is stabilized, then it is expected people will migrate to it from num_cpus. That would currently cause a regression.

This is complicated by the fact that associating "Kubernetes CPU limits" with "cgroups" and https://www.kernel.org/doc/Documentation/scheduler/sched-bwc.txt is non-trivial, as there are abstraction layers between them (so people may not be aware of this even if available_concurrency documents it).

The fix can also be to include some num_cpus code, but the cgroups part is more substantial than the cpusets one.

@yoshuawuyts
Member Author

yoshuawuyts commented Oct 27, 2020

@strohel I definitely agree we should support cpusets before stabilizing: we may currently over-report concurrency in containers which is something that should be fixed.

But I'm less sure about cgroup CPU bandwidth limits. If I'm understanding #74479 (comment) correctly, this would cause the reported number of CPUs to fluctuate during the runtime of the program. I believe this may cause issues when used in examples like the following:

```rust
use std::io;
use std::thread;

fn main() -> io::Result<()> {
    // perform init work here (connect to db, warm caches, etc.)

    let max_threads = thread::available_concurrency().map(|n| n.get()).unwrap_or(1);
    // `ThreadPool` stands in for any thread-pool type; it is not part of std.
    let pool = ThreadPool::init(max_threads);

    // use pool for the duration of the program
    Ok(())
}
```

If the boot sequence of a program is CPU-intensive, by the time the thread pool is initialized the CPU quota may have been expended. Which means that the thread pool may set a limit lower than the max supported concurrency for the program, causing an overall under-utilization of resources.

It would seem that the right use of a bandwidth-aware API would be to periodically update the concurrency, rather than to statically initialize it once, since that risks under-utilization if the quota is at a low. Did I interpret this accurately?

edit: On closer reading I think I may have indeed misinterpreted. The quota value is statically set, and not constantly updated during a program's runtime. Which means it can indeed be used to calculate the max concurrency and does not need to be updated in a loop. If this is indeed the case, I agree we should account for this.

@strohel

strohel commented Oct 27, 2020

@yoshuawuyts I see you've just edited the comment; the clarification is correct.

It is true that one can change the CPU quota of a running program, but AFAIK the same holds for cpusets. Linux even supports runtime physical CPU hotplugging. That's a non-issue; the API can do nothing about it (beyond mentioning it in the docs). Caching the value is problematic for this reason.

@the8472
Member

the8472 commented Oct 27, 2020

> If the boot sequence of a program is CPU-intensive, by the time the thread pool is initialized the CPU quota may have been expended. Which means that the thread pool may set a limit lower than the max supported concurrency for the program, causing an overall under-utilization of resources.

Perhaps my post was not clear. There is a quota configuration, which you can read from sysfs (as the quota per interval and the interval length), and there is the runtime depletion of the quota. The depletion is dynamic, but the configuration does not change unless someone explicitly reconfigures it. available_concurrency() would be based on the configuration, not on the remaining quota available in the current time slice.
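A sketch of this distinction, under stated assumptions: only the configured quota and period feed into the result, never the runtime remaining in the current period, and the quota-derived count is combined with the affinity-based count by taking the minimum. Taking the minimum is an assumption for illustration, not a description of std's implementation; `effective_parallelism` is a hypothetical helper.

```rust
/// Combines the number of CPUs the process may run on (e.g. from its
/// affinity mask) with the number of CPUs its *configured* cgroup quota is
/// worth per period. Remaining runtime in the current period is ignored.
fn effective_parallelism(affinity_cpus: usize, quota_us: Option<u64>, period_us: u64) -> usize {
    let affinity_cpus = affinity_cpus.max(1);
    match quota_us {
        Some(quota) if period_us > 0 => {
            // Round down, but never below one CPU.
            let quota_cpus = ((quota / period_us) as usize).max(1);
            affinity_cpus.min(quota_cpus)
        }
        // No quota configured ("max"), or a nonsensical period.
        _ => affinity_cpus,
    }
}

fn main() {
    // 48 CPUs in the affinity mask, quota of 300ms per 100ms period.
    println!("{}", effective_parallelism(48, Some(300_000), 100_000)); // prints 3
}
```

Because the inputs are configuration values, the result only changes if someone reconfigures the cgroup or affinity, which matches the static initialization pattern discussed above.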

@raphaelcohn

I'm deeply against including this in Rust's core API. Whilst conceptually it seems helpful and even useful, the technique used for Linux isn't just broken with regard to containers; it's broken in a number of other situations as well. It turns out doing this well is extremely hard. I've gone through several iterations of my own to get it 'right'.

First, sysconf(_SC_NPROCESSORS_CONF) cannot be relied upon, e.g. in musl libc, as it uses the sched_getaffinity syscall. As my notes in my linux-support crate make clear, it becomes unreliable after the sched_setaffinity syscall has been made. As well as taking into account current cgroup2 settings (e.g. effective cgroup hyper threads), one needs to be aware that hyper threading may have been disabled and some threads isolated from the OS (e.g. https://github.com/lemonrock/linux-support/blob/master/workspace/linux-support/src/cpu/HyperThread.rs#L171). Whether hyper threading matters to the count is also important: how threads share CPU caches and resources (e.g. https://github.com/lemonrock/linux-support/blob/master/workspace/linux-support/src/cpu/HyperThread.rs#L302), which NUMA nodes they are associated with, whether they are online, etc. all have a significant impact.

Whilst wanting a count of CPUs is a common request, naively spinning up threads for all CPUs (or, say, all CPUs minus one) is never going to work unless one can ensure a dedicated OS instance to run on.

@raphaelcohn

@strohel - the API can do something about CPU hotplugging. It's quite discoverable.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Jun 21, 2021
…nTitor

Move `available_concurrency` implementation to `sys`

This splits out the platform-specific implementation of `available_concurrency` to the corresponding platforms under `sys`. No changes are made to the implementation.

Tidy didn't lint against this code being originally added outside of `sys` because of a bug (see rust-lang#84677); this PR also reverts the exclusion that was introduced in that bugfix.

Tracking issue of `available_concurrency`: rust-lang#74479
@ibraheemdev
Member

Is there any reason not to call this num_cpus?

@yoshuawuyts
Member Author

@ibraheemdev you can find the conversation that led to the current naming here: #74480 (comment).

The short answer to "why not name it num_cpus?" is that this API does not return the number of CPUs in a computer (neither logical nor physical). Instead it returns the amount of "concurrency" [1] available to a process at that time.


[1]: We borrowed the word "concurrency" from C++'s hardware_concurrency API.

@joshtriplett
Member

I submitted #89310 to implement available_concurrency using sched_getaffinity on Linux, which handles the case where the process doesn't have access to all CPUs, such as when limited via taskset or similar.

@raphaelcohn

@joshtriplett I still think stabilizing this is a really bad idea; see my comment above on sched_setaffinity. If you really want to optimize concurrency, then I think folks should use specialized crates; for example, my linux-support crate. It's far richer and lets one handle a lot of confusing situations.

@joshtriplett
Member

@raphaelcohn wrote:

> If you really want to optimize concurrency, then I think folks should use specialized crates

Rust projects need some portable mechanism to determine a default number of threads to run, even if that's an approximation. (Projects currently use the num_cpus crate for this.) Crates that run on many platforms and need to use the full capabilities of the system's CPUs are not necessarily interested in adding various non-portable code paths to detect further nuances of the system. Some crates may wish to do a more in-depth adaptation to the exact capabilities of the system, but many crates just want a single number for "how many threads should I spawn", and it makes sense to have a portable API to return that number, without making an application have to directly deal with the system-specific nuances that might go into such a number.

The purpose of the available_concurrency() call is not to determine the number of CPUs on the system; it's to determine a reasonable default for the amount of concurrency to use. I did see your comment above about sched_setaffinity; however, if someone uses sched_setaffinity to limit the number of CPUs the process can run on, it's correct for available_concurrency() to return a correspondingly smaller number of CPUs, so that the process defaults to running a thread for each CPU that its affinity allows it to run on.

If you feel there's some additional nuance this function could be detecting to provide a better default, I'd be happy to review further pull requests that detect (for instance) additional aspects of cgroup configuration. But I do believe we should encourage people to use an API abstraction like this, to provide a reasonable default.

@raphaelcohn

@joshtriplett I couldn't disagree more. Portability by compromise is a poor goal; most Rust developers trust the standard library to produce performant code (something I myself used to believe). However, I'm not going to hold this up. If this change goes in, then at the very least, could we please caution those who use it that it is a simplistic solution, and highlight that the underlying implementation on Linux is brittle.

Lastly: "if I feel". This is clumsily expressed and leads to a demonstration of power, not equality; clearly I do feel this way. It's clear that you want to make this change, that you have the power to make this change, and that you have the majority view. I have no interest in engaging further.

@yoshuawuyts
Member Author

Follow up to #74480 (comment)

I had a chat today with @sivadeilra on how to approach the 64-thread limit on Windows, and whether we could detect more than 64 threads on a system by default. They mentioned that in order to do that we would need to change the ambient state of a program, which a lookup function probably shouldn't do.

One option @sivadeilra mentioned we could explore is adding an API via std::os::windows::* to explicitly change the ambient state. We should probably prototype this on crates.io first though.


Follow up to #74480 (comment)

While looking through the discussion on the original PR, I also realized that we weren't tracking #74480 (comment). It mentions that on Windows it might be possible to limit the amount of concurrency available to a program via process affinity masks / job objects. @sivadeilra do you have a sense of how we could support that? Would getting those values also interact with the ambient state?

@yoshuawuyts yoshuawuyts changed the title from Tracking Issue for #![feature(available_concurrency)] to Tracking Issue for #![feature(available_parallelism)] Oct 6, 2021
@strohel

strohel commented Nov 20, 2021

Overestimating also costs performance, like cache contention and kernel scheduling overhead.

To support this with a data point: earlier I benchmarked a simple actix-web based microservice. Overestimating the number of worker threads (from 1 to 4) resulted in a ~50% increase in CPU time per request.

I can re-run the benchmark with up-to-date versions and different parameters if requested.

@rfcbot rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Nov 22, 2021
@rfcbot

rfcbot commented Nov 22, 2021

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Nov 22, 2021
@the8472
Member

the8472 commented Nov 26, 2021

#91057 updates the documentation in anticipation of cgroups support.

@rfcbot rfcbot added the finished-final-comment-period The final comment period is finished for this PR / Issue. label Dec 2, 2021
@rfcbot

rfcbot commented Dec 2, 2021

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

@rfcbot rfcbot added to-announce Announce this issue on triage meeting and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Dec 2, 2021
@apiraino apiraino removed the to-announce Announce this issue on triage meeting label Dec 9, 2021
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Jan 8, 2022
…allelism, r=joshtriplett

Implement stabilization of `#[feature(available_parallelism)]`

Stabilized in rust-lang#74479 (comment). Closes rust-lang#74479. Thanks!

cc/ `@rust-lang/libs-api`
@bors bors closed this as completed in 1001068 Jan 8, 2022
@jethrogb
Contributor

How was I, as a platform maintainer, supposed to know that this platform-specific functionality was added and an implementation would be needed for the platform I maintain, prior to reading about this in the release notes?

@yoshuawuyts
Member Author

yoshuawuyts commented Feb 24, 2022

@jethrogb I'm sorry to hear you feel you should've been notified about this, but weren't. Perhaps there's indeed room for improvement here. The best place to discuss this would likely be on the libs team stream on the rust-lang Zulip. Additionally, some existing ways of tracking project changes such as these are:

I hope you find this helpful!

@yaahc
Member

yaahc commented Feb 24, 2022

> How was I, as a platform maintainer, supposed to know that this platform-specific functionality was added and an implementation would be needed for the platform I maintain, prior to reading about this in the release notes?

We've been discussing this in the libs team meetings recently, specifically the relationship between the libs team and various platform maintainers and what kind of support we should be providing. The most recent discussion was around specific platforms and making sure that, among other things, every platform we support has a ping group so we can notify them of relevant PRs. We didn't discuss issues that affect all platforms, but it seems like this is a natural extension of that conversation, and that we'd likely also need a ping group to notify all maintainers of tier 3 and tier 2 platforms whenever new APIs are introduced that need to be implemented by platform maintainers.

I've gone ahead and nominated this for discussion in our meeting next week. If you have any other suggestions or feedback on these plans please note them in the zulip stream for our upcoming meeting: https://rust-lang.zulipchat.com/#narrow/stream/259402-t-libs.2Fmeetings/topic/Meeting-2022-03-02, and if you're able to please feel welcome to attend the meeting and participate in the discussion when it comes up.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Mar 3, 2022
Use cgroup quotas for calculating `available_parallelism`

Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process.

Limitations

* only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup`
* procfs must be available
* the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too

Manually tested via

```
# spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS
```

```
// quota.rs
#![feature(available_parallelism)]
fn main() {
    println!("{:?}", std::thread::available_parallelism()); // prints Ok(3)
}
```

strace:

```
sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32
openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36
read(3, "", 92)                         = 0
close(3)                                = 0
statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "300000 100000\n", 20)          = 14
read(3, "", 6)                          = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "max 100000\n", 20)             = 11
read(3, "", 9)                          = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40
```

r? `@joshtriplett`
cc `@yoshuawuyts`

Tracking issue and previous discussion: rust-lang#74479
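The strace above shows the lookup order: read the process's cgroup path from `/proc/self/cgroup`, then check `cpu.max` in that cgroup and in each ancestor up to the root (the root cgroup has no `cpu.max`, which is why the last `openat` fails with ENOENT). A std-only sketch of that path walk, assuming, as the PR's limitations state, cgroup v2 mounted at `/sys/fs/cgroup`. `cgroup_v2_path` and `cpu_max_candidates` are hypothetical helpers; a real implementation would then read each existing file and, presumably, keep the most restrictive quota.

```rust
/// Extracts the cgroup v2 path from the contents of `/proc/self/cgroup`,
/// where the unified hierarchy is listed as a line like "0::/some/path".
fn cgroup_v2_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Lists the `cpu.max` files to check, from the process's own cgroup up to
/// the root, assuming cgroup v2 is mounted at `/sys/fs/cgroup`.
fn cpu_max_candidates(cgroup_path: &str) -> Vec<String> {
    let mut candidates = Vec::new();
    let mut path = cgroup_path.trim_end_matches('/');
    loop {
        candidates.push(format!("/sys/fs/cgroup{}/cpu.max", path));
        // Step up to the parent cgroup; stop once the root has been listed.
        match path.rfind('/') {
            Some(idx) => path = &path[..idx],
            None => break,
        }
    }
    candidates
}

fn main() {
    // The 36 bytes read from /proc/self/cgroup in the trace above.
    let proc_self_cgroup = "0::/system.slice/run-u31477.service\n";
    if let Some(path) = cgroup_v2_path(proc_self_cgroup) {
        for file in cpu_max_candidates(path) {
            println!("{file}");
        }
    }
}
```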
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Mar 3, 2022
Use cgroup quotas for calculating `available_parallelism`

Automated tests for this are possible but would require a bunch of assumptions. It requires root + a recent kernel, systemd and maybe docker. And even then it would need a helper binary since the test has to run in a separate process.

Limitations

* only supports cgroup v2 and assumes it's mounted under `/sys/fs/cgroup`
* procfs must be available
* the quota gets mixed into `sched_getaffinity`, so if the latter doesn't work then quota information gets ignored too

Manually tested via

```
// spawn a new cgroup scope for the current user
$ sudo systemd-run -p CPUQuota="300%" --uid=$(id -u) -tdS

// quota.rs
#![feature(available_parallelism)]
fn main() {
    println!("{:?}", std::thread::available_parallelism()); // prints Ok(3)
}
```

strace:

```
sched_getaffinity(3041643, 32, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 32
openat(AT_FDCWD, "/proc/self/cgroup", O_RDONLY|O_CLOEXEC) = 3
statx(0, NULL, AT_STATX_SYNC_AS_STAT, STATX_ALL, NULL) = -1 EFAULT (Bad address)
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "0::/system.slice/run-u31477.serv"..., 128) = 36
read(3, "", 92)                         = 0
close(3)                                = 0
statx(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cgroup.controllers", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0444, stx_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/run-u31477.service/cpu.max", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "300000 100000\n", 20)          = 14
read(3, "", 6)                          = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/fs/cgroup/system.slice/cpu.max", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "max 100000\n", 20)             = 11
read(3, "", 9)                          = 0
close(3)                                = 0
openat(AT_FDCWD, "/sys/fs/cgroup/cpu.max", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
sched_getaffinity(0, 128, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]) = 40
```
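The strace above shows the lookup order. The process reads `/proc/self/cgroup` to find its cgroup v2 path, then reads `cpu.max` in that directory and in each ancestor up to `/sys/fs/cgroup`, and finally caps the `sched_getaffinity` CPU count by the quota. The sketch below reproduces that arithmetic. It is modeled on the file accesses in the log, not copied from the std implementation, and it skips the `cgroup.controllers` check. The helper names (`cgroup_v2_path`, `parse_cpu_max`, `min_quota`, `effective_parallelism`) are hypothetical, and the rounding rule (floor of quota/period, never below 1) is an assumption. File reads are injected as a closure so the logic can be exercised without a real cgroup hierarchy.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: extract the cgroup v2 path from the contents of
/// /proc/self/cgroup, e.g. "0::/system.slice/run-u31477.service".
fn cgroup_v2_path(proc_self_cgroup: &str) -> Option<&str> {
    proc_self_cgroup
        .lines()
        .find_map(|line| line.strip_prefix("0::"))
}

/// Hypothetical helper: turn a cpu.max file ("<quota> <period>" or
/// "max <period>") into a whole number of CPUs. Assumes floor rounding with a
/// minimum of 1; returns None for "max" (no limit) or malformed contents.
fn parse_cpu_max(contents: &str) -> Option<usize> {
    let mut fields = contents.split_whitespace();
    let quota = fields.next()?;
    let period: u64 = fields.next()?.parse().ok()?;
    if quota == "max" || period == 0 {
        return None;
    }
    let quota: u64 = quota.parse().ok()?;
    Some(((quota / period) as usize).max(1))
}

/// Hypothetical helper: walk from the process's cgroup directory up to the
/// mount point, reading cpu.max at each level via `read_cpu_max`, and keep
/// the smallest limit found.
fn min_quota<F>(mount: &Path, cgroup: &str, mut read_cpu_max: F) -> Option<usize>
where
    F: FnMut(&Path) -> Option<String>,
{
    let mut dir: PathBuf = mount.join(cgroup.trim_start_matches('/'));
    let mut best: Option<usize> = None;
    loop {
        let file = dir.join("cpu.max");
        if let Some(limit) = read_cpu_max(file.as_path())
            .as_deref()
            .and_then(parse_cpu_max)
        {
            best = Some(best.map_or(limit, |b| b.min(limit)));
        }
        // Stop after checking the mount point itself.
        if dir.as_path() == mount || !dir.pop() {
            break;
        }
    }
    best
}

/// Hypothetical helper: cap the affinity-mask CPU count by the quota, if any.
fn effective_parallelism(affinity_cpus: usize, quota: Option<usize>) -> usize {
    quota.map_or(affinity_cpus, |q| q.min(affinity_cpus)).max(1)
}
```

On a real system the closure would be something like `|p| std::fs::read_to_string(p).ok()`. With the values from the log (`300000 100000` in the service's `cpu.max`, `max 100000` in `system.slice`, no `cpu.max` at the root) and 48 CPUs in the affinity mask, this yields 3, matching the `Ok(3)` printed by `quota.rs`.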

r? `@joshtriplett`
cc `@yoshuawuyts`

Tracking issue and previous discussion: rust-lang#74479
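`available_parallelism` was stabilized in Rust 1.59, so current toolchains no longer need the `#![feature(available_parallelism)]` gate used in `quota.rs`. The sketch below shows the intended use: sizing a set of worker threads from the reported count. `worker_count` and `parallel_sum` are hypothetical helpers, and the example additionally assumes `std::thread::scope` (stable since Rust 1.63). Because the function returns an `io::Result`, the caller supplies a fallback for platforms where the count cannot be determined.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: how many worker threads to spawn. Uses the value from
/// `available_parallelism` (which already accounts for cgroup quotas on Linux
/// after this change) and falls back to `fallback` if it returns an error.
fn worker_count(fallback: NonZeroUsize) -> NonZeroUsize {
    thread::available_parallelism().unwrap_or(fallback)
}

/// Hypothetical helper: sum `data` using at most `workers` scoped threads.
fn parallel_sum(data: &[u64], workers: NonZeroUsize) -> u64 {
    // Ceiling division so every element lands in a chunk; `max(1)` keeps
    // `chunks` from panicking when `data` is empty.
    let chunk_len = ((data.len() + workers.get() - 1) / workers.get()).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

Falling back to 1 rather than panicking keeps the program correct (if slower) on targets where the parallelism query is unsupported.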