-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let io::copy reuse BufWriter buffers #78641
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,4 @@ | ||
use crate::io::{self, ErrorKind, Read, Write}; | ||
use super::{BufWriter, ErrorKind, Read, Result, Write, DEFAULT_BUF_SIZE}; | ||
use crate::mem::MaybeUninit; | ||
|
||
/// Copies the entire contents of a reader into a writer. | ||
|
@@ -40,7 +40,7 @@ use crate::mem::MaybeUninit; | |
/// } | ||
/// ``` | ||
#[stable(feature = "rust1", since = "1.0.0")] | ||
pub fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> io::Result<u64> | ||
pub fn copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> Result<u64> | ||
where | ||
R: Read, | ||
W: Write, | ||
|
@@ -54,14 +54,82 @@ where | |
} | ||
} | ||
|
||
/// The general read-write-loop implementation of | ||
/// `io::copy` that is used when specializations are not available or not applicable. | ||
pub(crate) fn generic_copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> io::Result<u64> | ||
/// The userspace read-write-loop implementation of `io::copy` that is used when | ||
/// OS-specific specializations for copy offloading are not available or not applicable. | ||
pub(crate) fn generic_copy<R: ?Sized, W: ?Sized>(reader: &mut R, writer: &mut W) -> Result<u64> | ||
where | ||
R: Read, | ||
W: Write, | ||
{ | ||
let mut buf = MaybeUninit::<[u8; super::DEFAULT_BUF_SIZE]>::uninit(); | ||
BufferedCopySpec::copy_to(reader, writer) | ||
} | ||
|
||
/// Specialization of the read-write loop that either uses a stack buffer | ||
/// or reuses the internal buffer of a BufWriter | ||
trait BufferedCopySpec: Write { | ||
fn copy_to<R: Read + ?Sized>(reader: &mut R, writer: &mut Self) -> Result<u64>; | ||
} | ||
|
||
impl<W: Write + ?Sized> BufferedCopySpec for W { | ||
default fn copy_to<R: Read + ?Sized>(reader: &mut R, writer: &mut Self) -> Result<u64> { | ||
stack_buffer_copy(reader, writer) | ||
} | ||
} | ||
|
||
impl<I: Write> BufferedCopySpec for BufWriter<I> { | ||
fn copy_to<R: Read + ?Sized>(reader: &mut R, writer: &mut Self) -> Result<u64> { | ||
if writer.capacity() < DEFAULT_BUF_SIZE { | ||
return stack_buffer_copy(reader, writer); | ||
} | ||
|
||
// FIXME: #42788 | ||
// | ||
// - This creates a (mut) reference to a slice of | ||
// _uninitialized_ integers, which is **undefined behavior** | ||
// | ||
// - Only the standard library gets to soundly "ignore" this, | ||
// based on its privileged knowledge of unstable rustc | ||
// internals; | ||
unsafe { | ||
let spare_cap = writer.buffer_mut().spare_capacity_mut(); | ||
reader.initializer().initialize(MaybeUninit::slice_assume_init_mut(spare_cap)); | ||
} | ||
|
||
let mut len = 0; | ||
|
||
loop { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems like this loop doesn't update the BufWriter's internal tracking of how much data is buffered, so use after copy returns will double write bits. Rather than getting access to all of the raw bits of the writer's state, it seems like it would be cleaner to have APIs to get access to the unfilled part of the buffer, and to be able to write the buffer through (but not fully flush the inner writer) on demand. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the reader returns an error there will be dangling data in the output buffer, and this approach requires flushing the buffer out to the inner writer regardless of how much data is in it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be more specific, I'm imagining roughly the inverse of BufRead, where there is a function to return the unfilled portion of the buffer, flushing if it happens to already be full to make space, and then a function to say that more data has been written into the buffer. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added a function that borrows the vec instead, which can be used for both. |
||
let buf = writer.buffer_mut(); | ||
let spare_cap = buf.spare_capacity_mut(); | ||
|
||
if spare_cap.len() >= DEFAULT_BUF_SIZE { | ||
match reader.read(unsafe { MaybeUninit::slice_assume_init_mut(spare_cap) }) { | ||
Ok(0) => return Ok(len), // EOF reached | ||
Ok(bytes_read) => { | ||
assert!(bytes_read <= spare_cap.len()); | ||
// Safety: The initializer contract guarantees that either it or `read` | ||
// will have initialized these bytes. And we just checked that the number | ||
// of bytes is within the buffer capacity. | ||
unsafe { buf.set_len(buf.len() + bytes_read) }; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Huh, is std being built with polonious? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure what is causing surprise here. The |
||
len += bytes_read as u64; | ||
// Read again if the buffer still has enough capacity, as BufWriter itself would do | ||
// This will occur if the reader returns short reads | ||
continue; | ||
} | ||
Err(ref e) if e.kind() == ErrorKind::Interrupted => continue, | ||
Err(e) => return Err(e), | ||
} | ||
} | ||
|
||
writer.flush_buf()?; | ||
Comment on lines
+122
to
+123
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think putting this in an else branch will be clearer, if the previous conditions are hit all will either continue or return, it does not reach this part. |
||
} | ||
} | ||
} | ||
|
||
fn stack_buffer_copy<R: Read + ?Sized, W: Write + ?Sized>( | ||
reader: &mut R, | ||
writer: &mut W, | ||
) -> Result<u64> { | ||
let mut buf = MaybeUninit::<[u8; DEFAULT_BUF_SIZE]>::uninit(); | ||
// FIXME: #42788 | ||
// | ||
// - This creates a (mut) reference to a slice of | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems strange for there to be specialization in the function used "when specializations are not available or not applicable".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That refers to another set of specializations. I'll try to reword it.