Implement `Vec::from_elem` specialization for all `Copy` types #41335

Conversation
r? @BurntSushi (rust_highfive has picked a reviewer for you; use r? to override)
src/libcollections/vec.rs (Outdated)

```
@@ -78,7 +78,6 @@ use core::intrinsics::{arith_offset, assume};
use core::iter::{FromIterator, FusedIterator, TrustedLen};
use core::mem;
#[cfg(not(test))]
use core::num::Float;
```
This is why I think attributes should be on the same line with `use`/`mod`/`extern crate` items.
Argh, sorry! It should now be fixed.
If the input element is zero, `Vec::from_elem` can just invoke `calloc` for any `Copy` type. If the input is non-zero, but its size is 1, it can allocate and then `memset` the buffer.
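As a hedged illustration of the two fast paths described here, the dispatch can be sketched for `u8` alone (the PR covers all `Copy` types generically; the function name below is invented for this example and is not the PR's code):

```rust
// Hypothetical sketch of the two fast paths, written for `u8` only.
// In std, `vec![0u8; n]` already reaches a zeroed-allocation path;
// this just models the dispatch the PR description talks about.
fn from_elem_u8(elem: u8, n: usize) -> Vec<u8> {
    if elem == 0 {
        // Zero element: a zeroed allocation (what `calloc` provides).
        vec![0u8; n]
    } else {
        // Non-zero element of size 1: allocate, then fill.
        // For a one-byte element this fill compiles down to a `memset`.
        let mut v = Vec::with_capacity(n);
        v.resize(n, elem);
        v
    }
}

fn main() {
    assert_eq!(from_elem_u8(0, 4), vec![0u8; 4]);
    assert_eq!(from_elem_u8(7, 3), vec![7u8, 7, 7]);
}
```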
Thanks for the PR @ranma42! @BurntSushi or some other reviewer will be looking at your PR shortly.
```
        _ => 0u8 == chunked_or(x),
    }
}
```
Can you write some comments explaining how `chunked_or` and `is_zero` work? Are there any pitfalls? What happens when the alignment of a type is bigger than 16?
I added some comments that should answer your questions, with the exception of "are there any pitfalls?".
There should be no correctness issues, but there is certainly a performance tradeoff: we are now checking the contents of the input and we pay for that.
We are making `vec![None; 1024*1024]` much faster, but `vec![Some(3); 1]` is going to be a little slower (if the compiler can see the literal value, the checks are optimized away, but in general that might not be possible).
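To make the discussion concrete, here is a hedged reconstruction of what such a byte-level zero check can look like (the names match those discussed in the thread, but this is not the PR's actual code, and it deliberately sidesteps types with padding):

```rust
use std::mem::size_of;
use std::slice;

/// OR together every byte of `x`; the result is zero iff all bytes are zero.
/// Caveat: inspecting the raw bytes of a value is only meaningful for types
/// without padding, which is exactly the pitfall raised in this thread.
fn chunked_or<T: Copy>(x: &T) -> u8 {
    let bytes = unsafe {
        slice::from_raw_parts(x as *const T as *const u8, size_of::<T>())
    };
    bytes.iter().fold(0u8, |acc, &b| acc | b)
}

/// Safe wrapper: true iff every byte of `x` is zero.
fn is_zero<T: Copy>(x: &T) -> bool {
    chunked_or(x) == 0
}

fn main() {
    assert!(is_zero(&0u64));
    assert!(!is_zero(&1u64));
    // `Option<&u8>` is pointer-sized with no padding, and `None` is
    // guaranteed to be represented as a null (all-zero) pointer.
    assert!(is_zero(&None::<&u8>));
}
```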
cc @rust-lang/libs
@aturon I haven't used specialization yet, so some extra eyes on that piece would be great.
Add an explanation of the safety requirements for `chunked_or` and how `is_zero` wraps it, exposing a safe interface.
I personally feel like this is too much magic and may be an example of specialization going too far. This may also not work for composite types due to padding. Is there a use case motivating this today? Ideally I'd prefer to just wait for specialization to stabilize, in which case we can just mark the implementation in libstd as overridable, so downstream crates can customize the implementation.
@alexcrichton Could you explain a little more what the issue with padding is? AFAICT, in the worst case the input object might have some padding initialized to a non-zero value, in which case it would simply not benefit from the fast path.

From what I understood about coherence rules and specialization, it would be impossible to customize the implementation for several types for which this would be useful.
Do we have evidence of this making a performance difference?
@ranma42 ah yes sorry, I mean what you're thinking: a "zero" type that contains padding may end up with non-zero padding bytes, and so may not reliably take the fast path.
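The padding concern can be made concrete with sizes (a small sketch; the layout of `(u8, u16)` is technically unspecified in Rust, but on common platforms it is as shown):

```rust
use std::mem::size_of;

fn main() {
    // A (u8, u16) tuple needs 2-byte alignment for the u16 field, so the
    // whole tuple typically occupies 4 bytes even though only 3 bytes carry
    // field data; the remaining byte is padding with an unspecified value.
    assert_eq!(size_of::<u8>() + size_of::<u16>(), 3);
    assert_eq!(size_of::<(u8, u16)>(), 4);
    println!("padding bytes: {}", size_of::<(u8, u16)>() - 3);
}
```

A byte-wise zero check on such a type would read that padding byte, which is the crux of this thread.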
@alexcrichton yes, that can be a problem, but on the bright side it would not affect the correctness of the program: the "issue" would be that under some conditions the program runs faster than you expect, instead of always being slow. Moreover, the same holds true for most optimisations (especially when combined). In some specific contexts (avoiding timing attacks in cryptography, meeting guaranteed deadlines in realtime) predictable timings are more desirable than fast programs, but I believe this is not the general case, nor something that LLVM has strong support for.

@aturon I re-ran the benchmark from #40409 (comment), adapted as follows:

```rust
#![feature(test)]
extern crate test;

use test::Bencher;

#[derive(Clone)]
struct Clonable<T>(T);

#[bench]
fn bench_data(b: &mut Bencher) {
    b.iter(|| vec![0u8; 1024 * 1024]);
}

#[bench]
fn bench_clone(b: &mut Bencher) {
    b.iter(|| vec![Clonable(0u8); 1024 * 1024]);
}

#[bench]
fn bench_copy(b: &mut Bencher) {
    b.iter(|| vec![(0u8,); 1024 * 1024]);
}

#[bench]
fn bench_nullptr(b: &mut Bencher) {
    b.iter(|| vec![std::ptr::null::<u8>(); 1024 * 1024]);
}

#[bench]
fn bench_none_ref(b: &mut Bencher) {
    b.iter(|| vec![None::<&u8>; 1024 * 1024]);
}

#[bench]
fn bench_padded(b: &mut Bencher) {
    b.iter(|| vec![(0u8, 0u16); 1024 * 1024]);
}
```

It shows these results:

Before:

After:
Doesn't reading padding bytes produce an undefined value?
@sfackler yes, that is what @alexcrichton mentioned: reading padding bytes can result in an undefined value.
Is that the extent of it? Are you sure it's not going to poison other future computation?
If padding bytes are merely undefined values, the consequences are relatively contained. However, there are proposals to change LLVM's semantics here, under which such values would propagate much more aggressively through computations. The same proposal adds an instruction intended to make reads of uninitialized memory well-defined at the point of use.

And of course, all this is only as far as LLVM is concerned; the unsafe code guidelines probably have, or will have, an opinion on reading padding bytes, too.

tl;dr: Reading padding bytes is a minefield.
@rkruppe so to clarify, you think that this PR leads to undefined behavior? If so that seems quite bad!
@alexcrichton I don't think it currently leads to UB as far as LLVM is concerned. However, that could change if LLVM's treatment of undefined values evolves.
Hm ok. I could definitely see how LLVM could think it's reading UB, b/c you're going through and doing u8 loads where there may be padding in the middle, meaning that you're definitely at some point capable of reading undefined memory and could start propagating undefined values.

In light of that I would not be in favor of merging this, but @ranma42 is there perhaps an alternative, safer method of achieving the same results?
@rkruppe I think the paper proposes a nice direction to improve the consistency of handling undefined values.

@alexcrichton yes, as per #41335 (comment), that is expected. My understanding of LLVM semantics is that branching on an undefined value is where real trouble starts. The safest route is certainly to close without merging. It is a pity that the handling of padding bytes rules this out. Part of the results of this PR could be achieved by explicitly implementing the trait for specific padding-free types.
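The per-type alternative mentioned here could be sketched roughly as follows (the trait name and macro are hypothetical illustrations, not libstd's actual specialization machinery):

```rust
/// Hypothetical trait: a fast `from_elem` provided only for types where
/// "all bytes zero" is a valid value and there is no padding (integers),
/// so no padding bytes are ever inspected.
trait FastFromElem: Copy + PartialEq {
    const ZERO: Self;

    fn fast_from_elem(self, n: usize) -> Vec<Self> {
        if self == Self::ZERO {
            // For these types a zeroed allocation is a valid fast path.
            vec![Self::ZERO; n]
        } else {
            vec![self; n]
        }
    }
}

// Opt in explicitly, type by type, instead of a generic byte check.
macro_rules! impl_fast_from_elem {
    ($($t:ty),*) => {$(
        impl FastFromElem for $t {
            const ZERO: Self = 0;
        }
    )*};
}
impl_fast_from_elem!(u8, u16, u32, u64, usize);

fn main() {
    assert_eq!(0u32.fast_from_elem(3), vec![0u32; 3]);
    assert_eq!(5u8.fast_from_elem(2), vec![5u8, 5]);
}
```

This trades generality for safety: no bytes of any value are ever read, so the undefined-value question never arises.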
Yes, but we most likely wouldn't want to do it for padding, at least not everywhere. It bloats IR, possibly inhibits optimizations, and is entirely pointless for all code except code that tries to be too smart for its own good about padding bytes.