-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Read::size_hint and pre-allocate in read_to_end #45928
Conversation
If this adds a new system call when calling read_to_end on a File, it has been discussed before. Putting a system call in size_hint (metadata) is not ideal. (I can't find it though.) |
Yes, |
src/libstd/fs.rs
Outdated
@@ -449,6 +449,13 @@ impl Read for File { | |||
self.inner.read(buf) | |||
} | |||
|
|||
fn size_hint(&self) -> usize { | |||
match self.metadata() { | |||
Ok(meta) => meta.len() as usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a minor point but this should probably use a saturating cast not as
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an API for doing saturating integer casts? Writing #[cfg(target_pointer_size = …)]
code here seems tedious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly not yet (rust-lang/rfcs#1218). Something like cmp::min(meta.len(), usize::max_value() as u64) as usize
should work though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand this is a hint, the default impl returns zero without necessarily being at EOF so underestimating should be OK. And if you’re reading a file larger than your address space, you’re gonna have a hard time in read_to_end
anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to check its current position and subtract that off, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. However the obvious fix doesn’t work because <File as Seek>::seek
takes &mut self
, even with SeekFrom::Current(0)
. I wonder if there should be some API like File::position(&self) -> io::Result<u64>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, <std::fs::File as Seek>::seek
calls std::sys::fs::File::seek
which takes &self
, so size_hint
can call the latter directly.
Update: This was fixed by #46050. |
random-assigning to... r? @sfackler |
Does #46050 give the same improvement? |
A non-obvious detail of If you're calling metadata(), when it provides a size, can you trust the size (and not take it as a hint)? If so, in that case you could add alternate logic in |
You can't trust the size from metadata because the file may change between the metadata call and the read call. |
You always have to be prepared to get fewer bytes than expected, or errors, but if there are more bytes than expected, it's no different from someone appending bytes after your last read returned 0. You'll miss the newly appended bytes, but it was a race anyway. That assumes that |
☔ The latest upstream changes (presumably #46166) made this pull request unmergeable. Please resolve the merge conflicts. |
What's the status of this @SimonSapin ? I'm having a hard time telling :) |
c2f42cc
to
dd1fc4a
Compare
dd1fc4a
to
6adc9e6
Compare
I’ve run the benchmark again in rustc 1.24.0-nightly (560a5da 2017-11-27), which includes #46050. Previous results were “3% more time to 43% less time” Now the time with So this PR’s improvement seems independent of #46050’s improvement. More important IMO is whether
|
Reserving This can be fixed either by reserving |
This one byte of extra capacity is necessary for the final zero-size read that indicates EOF.
Good point. I’ve added the |
To be sure, |
Cool - something like this seems worthwhile then. Should it return an |
@sfackler Good point. Changed to |
I don’t have a strong opinion on the strategy of this PR v.s. that of #46340. A key difference is the semantics of |
☔ The latest upstream changes (presumably #46485) made this pull request unmergeable. Please resolve the merge conflicts. |
Sorry for the delay here - should we maybe make an issue to figure out the approach to take? I'd like to avoid splitting the discussion across two PRs. |
I'm going to make an executive decision and mark this and #46340 as waiting on a team decision as it seems that they cannot both exist together. Let me know if you disagree. |
The @rust-lang/libs team talked about this and #46340 during our triage meeting and decided that we're not looking to expand the surface area of |
Pre-allocate in fs::read and fs::read_string This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
Pre-allocate in fs::read and fs::read_string This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
Pre-allocate in fs::read and fs::read_string This is a simpler alternative to rust-lang#46340 and rust-lang#45928, as requested by the libs team.
Many
FromIterator
implementations rely onIterator::size_hint
to pre-allocate memory. This is a prototype of what something equivalent forstd::io::Read
could look like. An alternative would be to not add any public API but use an private specialization trait, likeZipImpl
in libcore.This came up in #45837 in the case of reading from a file. I’ve measured
File::read_to_end
with a vector created withVec::with_capacity(file.metadata().len() as usize)
, compared toVec::new()
. On my linux desktop with an SSD (though everything is probably in filesystem cache here), pre-allocating + reading take 3% more time to 43% less time depending on file size.