-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve LLVM optimization across function calls #21465
Comments
On the other hand, #[inline(never)]
fn extend2<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, mut iter: I) {
let len = iter.len();
vec.reserve(len);
unsafe {
let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
let end = ptr.offset(len as isize);
while ptr != end {
let el = iter.next();
assume(el.is_some());
ptr::write(ptr, el.unwrap());
ptr = ptr.offset(1);
}
}
} So one might assume that if one can get |
CC @Aatch |
It is possible to always inline methods with I might investigate what's going on here though. I assume this is all in the same crate? |
@Aatch: Adding this attribute to the
It's all one file. |
Maybe related: #18363 |
Ok, figured out the problem. It's one I already know about, which is nice I guess. The long and short of it though is that LLVM doesn't know about ownership. To demonstrate, this change to #[inline(never)]
fn extend4<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, iter: I) {
let mut iter = iter; // Add this line
let len = iter.len();
vec.reserve(len);
unsafe {
let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
let end = ptr.offset(len as isize);
while ptr != end {
ptr::write(ptr, iter.next());
ptr = ptr.offset(1);
}
}
} That is literally it. The problem is that LLVM doesn't realise that, in the caller, the changes won't be visible. The iterator is passed as a pointer, due to it being too big to be an immediate. Thus we're writing to memory that can be seen from outside the function. Making a new variable copies the iterator into a local stack slot and LLVM suddenly gets much happier about optimising to much faster code. The ideal fix would be to teach LLVM a bit about our ownership semantics. |
Ah, so the problem wasn't the function call but the place where the pointer was stored? Either way, with this change the #[inline(never)]
fn extend2<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
let mut iter = iter;
let len = iter.len();
vec.reserve(len);
unsafe {
let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
let end = ptr.offset(len as isize);
while ptr != end {
let el = iter.next();
assume(el.is_some());
ptr::write(ptr, el.unwrap());
ptr = ptr.offset(1);
}
}
} |
@mahkoh Is the while |
Unfortunately #[inline(never)]
fn extend1<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
let mut iter = iter;
let len = iter.len();
vec.reserve(len);
for _ in (0..len) {
let el = iter.next();
unsafe {
assume(el.is_some());
assume(vec.len() < vec.capacity());
vec.push(el.unwrap())
}
}
} doesn't work. |
Maybe |
This is how I would "ideally" write it (ignoring panic concerns, which is I guess why you went with push?):
|
That does indeed work: #[inline(never)]
fn extend7<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, mut iter: I) {
let mut iter = iter;
let len = iter.len();
vec.reserve(len);
unsafe {
let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
for el in iter {
ptr::write(ptr, el);
ptr = ptr.offset(1);
}
vec.set_len(len);
}
} |
Okay that's great. Just need to add an O(1) unwind guard to set the len and we're golddeeen... maybeeee..? Ah zero-sized-types complicate this all, but it's easy enough to just have another branch to cover that. |
#[inline(never)]
fn extend7<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
struct PanicGuard<'a, T: 'a> {
vec: &'a mut Vec<T>,
ptr: *const *mut T,
}
#[unsafe_destructor]
impl<'a, T> Drop for PanicGuard<'a, T> {
fn drop(&mut self) {
unsafe {
let diff = *self.ptr as usize - self.vec.as_ptr() as usize;
let size = mem::size_of::<T>();
let len = diff / if size == 0 { 1 } else { size };
self.vec.set_len(len);
}
}
}
let mut iter = iter;
let len = iter.len();
vec.reserve(len);
unsafe {
{
let mut ptr = if mem::size_of::<T>() == 0 {
(vec.as_mut_ptr() as usize + vec.len()) as *mut T
} else {
vec.as_mut_ptr().offset(vec.len() as isize)
};
let guard = PanicGuard { vec: vec, ptr: &ptr };
for el in iter {
ptr::write(ptr, el);
ptr = if mem::size_of::<T>() == 0 {
(ptr as usize + 1) as *mut T
} else {
ptr.offset(1)
};
}
mem::forget(guard);
}
vec.set_len(len);
}
} This works but only for primitive types. |
LLVM will not optimize this for structs even if the struct contains a single primitive field. |
Vec<T>
has anextend
function which is equivalent topush_all
except that it accepts an iterator.push_all
optimizes tomemcpy
butextend
does not. Consider the following unsafe Iterator:This has all safety features removed. The user is responsible for not calling
next
when there are no more elements. One can assume that the normal Iterator will never optimize better than this Iterator.Consider the following variants of
extend
:You can see that
extend5
is nothing butextend4
withnext
inlined manually. However,extend5
optimizes to a memcpy whileextend4
does not.From this we can see that adding an unsafe
ExactSizeIterator
trait will not improve theextend
performance on its own.The text was updated successfully, but these errors were encountered: