-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add next_array
and collect_array
#560
Conversation
A possible enhancement might be to return trait FromArray<T, const N: usize> {
fn from_array(array: [T; N]) -> Self;
}
impl<T, const N: usize> FromArray<T, N> for [T; N] { /* .. */ }
impl<T, const N: usize> FromArray<Option<T>, N> for Option<[T; N]> { /* .. */ }
impl<T, E, const N: usize> FromArray<Result<T, E>, N> for Result<[T; N], E> { /* .. */ } In fact, I think this is highly useful because it allows things like let ints = line.split_whitespace().map(|n| n.parse());
if let Ok([x, y, z]) = ints.collect_array() {
...
} This would be completely in line with |
So I have a working implementation of the above idea here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9dba690b0dfc362971635e21647a4c19. It makes this compile: fn main() {
let line = "32 -12 24";
let nums = line.split_whitespace().map(|n| n.parse::<i32>());
if let Some(Ok([x, y, z])) = nums.collect_array() {
println!("x: {} y: {} z: {}", x, y, z);
}
} It would change the interface to: trait ArrayCollectible<T>: Sized {
fn array_from_iter<I: IntoIterator<Item = T>>(iterable: I) -> Option<Self>;
}
trait Itertools: Iterator {
fn collect_array<A>(self) -> Option<A>
where
Self: Sized,
A: ArrayCollectible<Self::Item>;
} where
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there! Thanks for this. I particularly like that you thought about a way of enabling const
-generic stuff without raising the minimum required rust version (even if I would imagine something else due to having an aversion against depending on other crates too much).
There has been some discussion recently about basically supporting not only tuples, but also arrays. I just want to make sure that we do not loose input from these discussions when actually settling with your solution:
- implement arrays and next_array #549
- Array combinations #546
- Const generics iterator candidates #547
On top of that, I think there are some changes in there that are not directly related to this issue. If you'd like to have them merged, could you possibly factor them out into separate PRs/commits?
src/next_array.rs
Outdated
@@ -0,0 +1,80 @@ | |||
use core::mem::MaybeUninit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there was some discussion about building arrays:
@phimuemue Any update on this? |
I appreciate your effort, but unfortunately nothing substantial from my side: I changed my mind regarding |
@phimuemue Just for posterity's sake, |
@phimuemue Just checking in what the status is, I feel very strongly about the usefulness of |
Note that if you want I'll also mention |
This is a very useful feature. Today there was a thread on Reddit where the author basically asks if there's a crate that provides |
@Expurple |
I sometimes think about adding Another option I just saw: Crates can offer "nightly-only experimental API" (see https://docs.rs/arrayvec/latest/arrayvec/struct.ArrayVec.html#method.first_chunk for an example) - maybe this would help some users. I personally would lean towards |
I'm definitely not opposed to the idea but the EDIT: EDIT: Well I have some. With (My idea would be that @scottmcm Small discussion about temporarily adding |
For I can allocate some time to this next week. |
@jswrenn Please don't forget that we are discussing this on a PR that already has a working implementation without adding dependencies... |
@orlp, thanks, I had forgotten that this was a PR and not an issue when I made my reply. Still, we're talking about adding some extremely subtle unsafe code to Itertools. I'd like us to take extreme care to avoid accidentally introducing UB. A PR adding
If you can update this PR to do those things, I can see a path forward to merging it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR! I like the ArrayBuilder
abstraction quite a bit. As I mentioned, this will need additional documentation and testing before it can be merged. See the recent safety comments in my other project, zerocopy
for a sense of the paranoia rigor I'd like these safety comments to take.
@jswrenn I will be busy the upcoming week but I'm willing to bring this up to standards after that. If before then you could decide on whether or not to bump the MSRV to 1.51 I could include that in the rewrite. |
@orlp Think you might have time to revisit this soon? :-) |
@jswrenn Yes, I do. Have you reached a decision yet on the MSRV issue? |
I think it's time to set the MSRV to 1.51, I'm pretty sure @jswrenn and @phimuemue will agree. |
Absolutely. I'd even be fine going up to 1.55, which is nearly three years old. In my other crates, I've found that to be the lowest MSRV that users actually need. |
After quickly going though release notes, I may have missed something but I only noted two things 1.55 has over 1.51 that I considered to be of potential use for us: EDIT: 1.51 has those things over 1.43.1:
const-generics
|
Out of curiosity and slightly off-topic: What's a real reason to not update to stable Rust? Does it ever remove support for some platform or raise the system requirements dramatically? Or, put alternatively: Are there situations where someone could use cutting-edge |
Yes: Libraries that depend on itertools, but set a MSRV lower than stable. They are, of course, welcome to use an older, MSRV-compatible version of itertools, but we currently don't backport bugfixes to older versions.
Rust occasionally does remove support for platforms; e.g.: https://blog.rust-lang.org/2022/08/01/Increasing-glibc-kernel-requirements.html (The above post suggests that, conservatively, we could increase our MSRV to 1.63 without causing major problem for users. Maybe that's a good target MSRV for now?) |
/// # Panics | ||
/// | ||
/// This panics if `self.len >= N` or if `self.len == usize::MAX`. | ||
pub fn push(&mut self, value: T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if try_push
would be a useful API here
src/next_array.rs
Outdated
|
||
impl<T, const N: usize> Drop for ArrayBuilder<T, N> { | ||
fn drop(&mut self) { | ||
// Select the valid elements of `self.arr`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might add a comment noting that dropping Uninit
does nothing (hence the reason for this function existing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/next_array.rs
Outdated
// of `MaybeUninit<T>` in the initialized state. | ||
// | ||
// [1]: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#layout-1 | ||
let (_, valid, _): (_, &mut [T], _) = unsafe { valid.align_to_mut::<T>() }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just curious—why not take the approach above where you use arr.map { unsafe { v.assume_init() } }
then drop that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot do so here because we're operating on a dynamic — not static — number of elements. Here, the only elements initialized are the ones from index 0
to self.len
. By contrast, in ArrayBuilder::take
, elements 0
to N
(a constant) are initialized. array::map
only operates on fixed-sized arrays, not slices.
// SAFETY: Since `self.len` is 0, `self.arr` may safely contain | ||
// uninitialized elements. | ||
let arr = mem::replace(&mut self.arr, [(); N].map(|_| MaybeUninit::uninit())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could be simplified replace all of self
with ArrayBuilder::new()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe so. I agree, it would be ideal if we could simply write:
let Self { arr, len } = mem::replace(self, Self::new());
instead. However, because ArrayBuilder
has a non-trivial Drop
implementation, we cannot move-destructure it in that manner (doing so triggers a compilation error). Instead, we need to read and write each field individually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to suggest using std::ptr::read
instead of std::mem::replace
ing with [uninit(); N]
, but in release mode they seem to be optimized to the same thing (I assume the optimizer knows that writing uninit()
is a no-op), so whichever is more readable is probably better. (godbolt link)
4af66ad
to
9c33e3d
Compare
src/next_array.rs
Outdated
// PANICS: This will panic if `self.len == usize::MAX`. | ||
// SAFETY: By invariant on `self.arr`, all elements at indicies | ||
// `0..self.len` are valid. Due to the above write, the element at | ||
// `self.len` is now also valid. Consequently, all elements at indicies | ||
// `0..(self.len + 1)` are valid, and `self.len` can be safely | ||
// incremented without violating `self.arr`'s invariant. It is fine if | ||
// this increment panics, as we have not created any intermediate | ||
// invalid state. | ||
self.len = match self.len.checked_add(1) { | ||
Some(sum) => sum, | ||
None => panic!("`self.len == usize::MAX`; cannot increment `len`"), | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this panic branch unreachable? Since self.arr[self.len]
did not panic above, we therefore know that self.len < N
, meaning that self.len + 1 <= N
and thus self.len + 1
cannot overflow.
// PANICS: This will panic if `self.len == usize::MAX`. | |
// SAFETY: By invariant on `self.arr`, all elements at indicies | |
// `0..self.len` are valid. Due to the above write, the element at | |
// `self.len` is now also valid. Consequently, all elements at indicies | |
// `0..(self.len + 1)` are valid, and `self.len` can be safely | |
// incremented without violating `self.arr`'s invariant. It is fine if | |
// this increment panics, as we have not created any intermediate | |
// invalid state. | |
self.len = match self.len.checked_add(1) { | |
Some(sum) => sum, | |
None => panic!("`self.len == usize::MAX`; cannot increment `len`"), | |
}; | |
// PANICS: This cannot panic, since `self.len < N <= usize::MAX`. | |
// SAFETY: By invariant on `self.arr`, all elements at indicies | |
// `0..self.len` are valid. Due to the above write, the element at | |
// `self.len` is now also valid. Consequently, all elements at indicies | |
// `0..(self.len + 1)` are valid, and `self.len` can be safely | |
// incremented without violating `self.arr`'s invariant. | |
self.len += 1; |
@jswrenn In your refactor you've introduced two bugs, one of which is undefined behavior:
Please read my original version carefully, I intentionally used Your new code only makes a pointer to the first element, and forcibly only drops that element, regardless of whether it is initialized or not. I can't help but point out that adding 60 lines of comments and various extra steps to what used to be a correct 3 LOC + 4 lines of comments implementation did not help with preventing this bug for the writer, and for me personally as a reader it made the issue harder to spot. |
I appreciate the catch! Given that this passed CI, it looks like you've also uncovered a gap in our test coverage. I'll go ahead and add miri to our CI. Orson, you will not have me in agreement on the commenting issue. I wouldn't have to read your original version so carefully to guess at your intentions if had you had left comments making them explicit. The comments are for the benefit of maintainers, who need to make sure we don't regress on any of these subtleties in perpetuity. And the comments did result in a potential soundness issue being discovered by a reviewer — the original code leaves a live reference to an invalid (dropped) referent. In fact, the regression you just discovered was introduced as part of fixing this potential soundness issue that was present in your original PR. |
That was never part of my original PR, but also due to your refactor. My original code never left a live reference. I implore you to look at the original code again. Even if we take that at face value, your purported solution does nothing to resolve it. The following code compiles: let mut x = 42;
let mut r = &mut x;
{
// <pointer stuff>
let _ = r; // This does not drop r! This is a no-op!
}
*r = 0; // r is still live! But because you wrote this: // Move `valid` out of the surrounding scope and immediately drop
// it. `ptr` is now the only pointer to `valid`'s referent.
let _ = valid; I just nodded along, believing the comment, and didn't even catch that it doesn't drop |
@jswrenn And it gets worse. Even if you had correctly turned let mut arr = [1, 2, 3];
let r = &mut arr[..];
let p = r.as_mut_ptr();
drop(r);
unsafe { dbg!(&*p) };
I kindly ask you to reflect on why the refactor, despite what I assume is your best intent and care with respect to the safety comments, did not catch either of the above issues nor that their intended combination still would be invalid. |
Whatever the reason, I'm sure it's not because I wrote too many comments. I appreciate your knowledgeability, but, again, this abstraction has to survive idiots like me and whoever else contributes to itertools in the future. Its soundness cannot depend on the brilliance of a lone genius. I'm delighted to take your suggestions, but I am going to make the reasoning behind those suggestions explicit with new comments. |
Stripping most of the comments out for your benefit, could you review this implementation (essentially undoing this): fn drop(&mut self) {
// Select the valid elements of `self.arr`.
let (valid, _) = self.arr.split_at_mut(self.len);
// Cast `valid` from `&[MaybeUninit<T>]` to `&[T]`
let (_, valid, _): (_, &mut [T], _) = unsafe { valid.align_to_mut::<T>() };
// Drop `valid`.
unsafe {
ptr::drop_in_place(valid);
}
} |
@jswrenn While I believe that is sound, I find using As an alternative, I would propose that we copy unsafe fn maybeuninit_slice_assume_init_mut<T>(slice: &mut [MaybeUninit<T>]) -> &mut [T] {
// SAFETY: see the standard library implementation of MaybeUninit::slice_assume_init_mut,
// this implementation is copied from there as it is not available at our MSRV.
unsafe { &mut *(slice as *mut [MaybeUninit<T>] as *mut [T]) }
} Furthermore I also don't understand why you use impl<T, const N: usize> AsMut<[T]> for ArrayBuilder<T, N> {
fn as_mut(&mut self) -> &mut [T] {
// SAFETY: self.arr[..self.len] is valid.
unsafe { maybeuninit_slice_assume_init_mut(&mut self.arr[..self.len]) }
}
} Then our impl<T, const N: usize> Drop for ArrayBuilder<T, N> {
fn drop(&mut self) {
unsafe { core::ptr::drop_in_place(self.as_mut()) }
}
} EDIT: instead of |
I quite like that formulation! |
@jswrenn About CI, I just noticed 😮 that the |
9911308
to
cf0a160
Compare
// SAFETY: The safety invariant of `self.arr` applies to elements at | ||
// indices `0..self.len` — not to the element at `self.len`. Writing to | ||
// the element at index `self.len` therefore does not violate the safety | ||
// invariant of `self.arr`. Even if this line panics, we have not | ||
// created any intermediate invalid state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not technically necessary to prove soundness, but it may be worth pointing out in this comment that it's not unsound to overwrite an initialized value - it would just have the effect of mem::forget
'ing that value and replacing it with a new one. It's obviously good for correctness that that doesn't happen, but it's not technically a soundness requirement.
/// The elements of `arr[..len]` are valid `T`s. | ||
arr: [MaybeUninit<T>; N], | ||
|
||
/// The number of leading elements of `arr` that are valid `T`s, len <= N. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
len <= N
should probably be considered a safety invariant. It may technically not be used as a safety invariant as written (e.g., it might be the case that all accesses of arr[len]
are bounds checked, and so they would just panic if len
is out of bounds), but that's very subtle. It'd probably be simpler to just treat len <= N
as a safety invariant.
e6b72df
to
0cfa849
Compare
0cfa849
to
cd51427
Compare
Co-authored-by: Jack Wrenn <jack@wrenn.fyi> Co-authored-by: Joshua Liebow-Feeser <joshlf@users.noreply.github.com>
cd51427
to
acf9436
Compare
With this pull request I add two new functions to the
Itertools
trait:These behave exactly like
next_tuple
andcollect_tuple
, however they return arrays instead. Since these functions requiremin_const_generics
, I added a tiny build script that checks if Rust's version is 1.51 or higher, and if yes to set thehas_min_const_generics
config variable. This means thatItertools
does not suddenly require 1.51 or higher, only these two functions do.In order to facilitate this I did have to bump the minimum required Rust version to 1.34 from the (documented) 1.32, since Rust 1.32 and 1.33 have trouble parsing the file even if stuff is conditionally compiled. However, this should not result in any (new) breakage, because
Itertools
actually already requires Rust 1.34 for 9+ months, since 83c0f04 usessaturating_pow
which wasn't stabilized until 1.34.As for rationale, I think these functions are useful, especially for pattern matching and parsing. I don't think there's a high probability they get added to the standard library either, so that's why I directly make a pull request here. When/if
TryFromIterator
stabilizes we can simplify the implementation, but even then I believe these functions remain a good addition similarly howcollect_vec
is nice to have despite.collect::<Vec<_>>
existing.