Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::array_chunks (take N+1) #100026

Merged
merged 12 commits into from
Aug 14, 2022
Merged
180 changes: 180 additions & 0 deletions library/core/src/iter/adapters/array_chunks.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,180 @@
use crate::array;
use crate::iter::{FusedIterator, Iterator};
use crate::ops::{ControlFlow, NeverShortCircuit, Try};

/// An iterator over `N` elements of the iterator at a time.
///
/// The chunks do not overlap. If `N` does not divide the length of the
/// iterator, then the last up to `N-1` elements will be omitted.
///
/// This `struct` is created by the [`array_chunks`][Iterator::array_chunks]
/// method on [`Iterator`]. See its documentation for more.
#[derive(Debug, Clone)]
#[must_use = "iterators are lazy and do nothing unless consumed"]
#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
pub struct ArrayChunks<I: Iterator, const N: usize> {
iter: I,
remainder: Option<array::IntoIter<I::Item, N>>,
}

impl<I, const N: usize> ArrayChunks<I, N>
where
I: Iterator,
{
#[track_caller]
pub(in crate::iter) fn new(iter: I) -> Self {
assert!(N != 0, "chunk size must be non-zero");
Self { iter, remainder: None }
}

/// Returns an iterator over the remaining elements of the original iterator
/// that are not going to be returned by this iterator. The returned
/// iterator will yield at most `N-1` elements.
#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
#[inline]
pub fn into_remainder(self) -> Option<array::IntoIter<I::Item, N>> {
self.remainder
}
}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> Iterator for ArrayChunks<I, N>
where
I: Iterator,
{
type Item = [I::Item; N];

#[inline]
fn next(&mut self) -> Option<Self::Item> {
self.try_for_each(ControlFlow::Break).break_value()
}

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
let (lower, upper) = self.iter.size_hint();

(lower / N, upper.map(|n| n / N))
}

#[inline]
fn count(self) -> usize {
self.iter.count() / N
}

fn try_fold<B, F, R>(&mut self, init: B, mut f: F) -> R
where
Self: Sized,
F: FnMut(B, Self::Item) -> R,
R: Try<Output = B>,
{
let mut acc = init;
loop {
match self.iter.next_chunk() {
Ok(chunk) => acc = f(acc, chunk)?,
Err(remainder) => {
// Make sure to not override `self.remainder` with an empty array
// when `next` is called after `ArrayChunks` exhaustion.
self.remainder.get_or_insert(remainder);

break try { acc };
}
}
}
}

fn fold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
self.try_fold(init, |acc, x| NeverShortCircuit(f(acc, x))).0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid making a closure type generic over I too:

Suggested change
self.try_fold(init, |acc, x| NeverShortCircuit(f(acc, x))).0
self.try_fold(init, NeverShortCircuit::wrap_mut_2(f)).0

(and the same in rfold)

}
}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> DoubleEndedIterator for ArrayChunks<I, N>
where
I: DoubleEndedIterator + ExactSizeIterator,
{
#[inline]
fn next_back(&mut self) -> Option<Self::Item> {
self.try_rfold((), |(), x| ControlFlow::Break(x)).break_value()
}

fn try_rfold<B, F, R>(&mut self, init: B, mut f: F) -> R
where
Self: Sized,
F: FnMut(B, Self::Item) -> R,
R: Try<Output = B>,
{
// We are iterating from the back we need to first handle the remainder.
self.next_back_remainder();

let mut acc = init;
let mut iter = self.iter.by_ref().rev();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: You have a Sized iterator here (since it's a field), so you can avoid some of the by_ref optimization penalty by using /~https://github.com/rust-lang/rust/blob/master/library/core/src/iter/adapters/by_ref_sized.rs instead of by_ref().


// NB remainder is handled by `next_back_remainder`, so
// `next_chunk` can't return `Err` with non-empty remainder
// (assuming correct `I as ExactSizeIterator` impl).
while let Ok(mut chunk) = iter.next_chunk() {
chunk.reverse();
acc = f(acc, chunk)?
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by_ref AND a double-reverse? 😬 that must be horrendously slow.

Imo we either need a next_chunk_back or only limit it to forward iteration. The former could be done in a separate PR, but we need to decide on an approach.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry you had to see this impl 😅


try { acc }
}

fn rfold<B, F>(mut self, init: B, mut f: F) -> B
where
Self: Sized,
F: FnMut(B, Self::Item) -> B,
{
self.try_rfold(init, |acc, x| NeverShortCircuit(f(acc, x))).0
}
}

impl<I, const N: usize> ArrayChunks<I, N>
where
I: DoubleEndedIterator + ExactSizeIterator,
{
/// Updates `self.remainder` such that `self.iter.len` is divisible by `N`.
fn next_back_remainder(&mut self) {
// Make sure to not override `self.remainder` with an empty array
// when `next_back` is called after `ArrayChunks` exhaustion.
if self.remainder.is_some() {
return;
}

// We use the `ExactSizeIterator` implementation of the underlying
// iterator to know how many remaining elements there are.
let rem = self.iter.len() % N;

// Take the last `rem` elements out of `self.iter`.
let mut remainder =
// SAFETY: `unwrap_err` always succeeds because x % N < N for all x.
unsafe { self.iter.by_ref().rev().take(rem).next_chunk().unwrap_err_unchecked() };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I'll block the PR on this, but unwrap_err_unchecked here scares me -- I added a note to the tracking issue.

My instinct for now is that %N is definitely trustworthy, so that combined with take is sufficient to say this is ok. But I'm still afraid, since it's not TrustedLen, and wonder if someone can come up with an evil example here where it somehow gets a full chunk and thus is UB in safe code.

(At least the obvious things, like a len() that always says usize, is protected against because of the modulo.)

Copy link
Member

@the8472 the8472 Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As already noted in another comment I don't like this entire method. Either we should add next_chunk_back or remove the DEI impl.

Copy link
Member Author

@WaffleLapkin WaffleLapkin Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, an evil impl can return less than rem elements, but actual <= rem < N still holds (I don't think there is any way to trick Take into returning more than rem elements) so the chunk will be an error no matter what.

An evil impl can just return a wrong length, so after this iterator returns a number of elements not divisible by N. But this is fine too, an Err will just be ignored:

// NB remainder is handled by `next_back_remainder`, so
// `next_chunk` can't return `Err` with non-empty remainder
// (assuming correct `I as ExactSizeIterator` impl).
while let Ok(mut chunk) = iter.next_chunk() {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, as @the8472 already said, we should probably rewrite the DEI impl anyway (I'll try to do this, after this lands).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the % is certainly fine. What I'm not yet convinced of is that there's no way to make take misbehave somehow -- for example, Take::try_fold uses the inner try_fold, so maybe there'd be a way for that to be implemented wrongly-but-not-UB that could result in too many things getting put in the array somehow.

But I agree that just making a nicer implementation without the unsafe{} is the right plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you are right 😨

Overriding try_fold with some nasty unsafe to skip over Break(_) allows you to trick Take into returning more elements that it should (playground).

n can underflow, if a bad iterator impl skips over Break(_):

*n -= 1;
let r = fold(acc, x);
if *n == 0 { ControlFlow::Break(r) } else { ControlFlow::from_try(r) }

The fact that unsafe code can't trust Take is scary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem there is still in the unsafe in Misbehave -- it protects against double-drop, but duplicating arbitrary values can violate safety invariants in general. (For example, if the type is !Clone + !Default, I can use you moving it into me to track resource consumption, like a ZST tracker for a global semaphore.) So that specific example is just "well unsound unsafe-using code breaks everything".

But yeah, even though I've not been able to come up with a safe trick that would do it, things along those lines are what make me worried about it. (And certainly if specialization existed then it would be possible to do particularly weird things in safe code because it can violate parametricity even worse than in normal code.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example could check, that type_id::<B>() == type_id::<usize>() or something similar, which would IMO make the code sound.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I thought about that one, but TypeId::of::<B>() needs B: 'static, which we don't have here. Regardless, it's yet another chink in the armour that makes be scared we'll break through eventually.


// We used `.rev()` above, so we need to re-reverse the reminder
remainder.as_mut_slice().reverse();
self.remainder = Some(remainder);
}
}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> FusedIterator for ArrayChunks<I, N> where I: FusedIterator {}

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
impl<I, const N: usize> ExactSizeIterator for ArrayChunks<I, N>
where
I: ExactSizeIterator,
{
#[inline]
fn len(&self) -> usize {
self.iter.len() / N
}

#[inline]
fn is_empty(&self) -> bool {
self.iter.len() < N
}
}
4 changes: 4 additions & 0 deletions library/core/src/iter/adapters/mod.rs
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
use crate::iter::{InPlaceIterable, Iterator};
use crate::ops::{ChangeOutputType, ControlFlow, FromResidual, NeverShortCircuit, Residual, Try};

mod array_chunks;
mod by_ref_sized;
mod chain;
mod cloned;
Expand Down Expand Up @@ -32,6 +33,9 @@ pub use self::{
scan::Scan, skip::Skip, skip_while::SkipWhile, take::Take, take_while::TakeWhile, zip::Zip,
};

#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
pub use self::array_chunks::ArrayChunks;

#[unstable(feature = "std_internals", issue = "none")]
pub use self::by_ref_sized::ByRefSized;

Expand Down
2 changes: 2 additions & 0 deletions library/core/src/iter/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -398,6 +398,8 @@ pub use self::traits::{

#[stable(feature = "iter_zip", since = "1.59.0")]
pub use self::adapters::zip;
#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
pub use self::adapters::ArrayChunks;
#[unstable(feature = "std_internals", issue = "none")]
pub use self::adapters::ByRefSized;
#[stable(feature = "iter_cloned", since = "1.1.0")]
Expand Down
45 changes: 44 additions & 1 deletion library/core/src/iter/traits/iterator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ use crate::ops::{ChangeOutputType, ControlFlow, FromResidual, Residual, Try};
use super::super::try_process;
use super::super::ByRefSized;
use super::super::TrustedRandomAccessNoCoerce;
use super::super::{Chain, Cloned, Copied, Cycle, Enumerate, Filter, FilterMap, Fuse};
use super::super::{ArrayChunks, Chain, Cloned, Copied, Cycle, Enumerate, Filter, FilterMap, Fuse};
use super::super::{FlatMap, Flatten};
use super::super::{FromIterator, Intersperse, IntersperseWith, Product, Sum, Zip};
use super::super::{
Expand Down Expand Up @@ -3316,6 +3316,49 @@ pub trait Iterator {
Cycle::new(self)
}

/// Returns an iterator over `N` elements of the iterator at a time.
///
/// The chunks do not overlap. If `N` does not divide the length of the
/// iterator, then the last up to `N-1` elements will be omitted and can be
/// retrieved from the [`.into_remainder()`][ArrayChunks::into_remainder]
/// function of the iterator.
///
/// # Panics
///
/// Panics if `N` is 0.
scottmcm marked this conversation as resolved.
Show resolved Hide resolved
///
/// # Examples
///
/// Basic usage:
///
/// ```
/// #![feature(iter_array_chunks)]
///
/// let mut iter = "lorem".chars().array_chunks();
/// assert_eq!(iter.next(), Some(['l', 'o']));
/// assert_eq!(iter.next(), Some(['r', 'e']));
/// assert_eq!(iter.next(), None);
/// assert_eq!(iter.into_remainder().unwrap().as_slice(), &['m']);
/// ```
///
/// ```
/// #![feature(iter_array_chunks)]
///
/// let data = [1, 1, 2, -2, 6, 0, 3, 1];
/// // ^-----^ ^------^
/// for [x, y, z] in data.iter().array_chunks() {
/// assert_eq!(x + y + z, 4);
/// }
/// ```
#[track_caller]
#[unstable(feature = "iter_array_chunks", reason = "recently added", issue = "none")]
fn array_chunks<const N: usize>(self) -> ArrayChunks<Self, N>
where
Self: Sized,
{
ArrayChunks::new(self)
}

/// Sums the elements of an iterator.
///
/// Takes each element, adds them together, and returns the result.
Expand Down
Loading