-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tracking issue for Write::write_all_vectored
#70436
Comments
The main API question I have (which is more about IoSlice::advance than this I guess) is if we're okay saying that the content of the |
@sfackler I agree, but I don't know how an alternative that takes |
It could for example perform vectored writes when the current cursor position is at a slice boundry and switch to a normal write_all if it's in the middle of one of the slices. |
But then you're missing out on the efficiency of vectored writes, if that were the case I wouldn't consider using it. |
You are only missing out on the efficienty of vectored writes some of the time. |
True, but with the implementation I provided (which I still admit isn't great) you don't miss out. I've made the point already in rust-lang/futures-rs#1741 (comment), but personally I don't its really a big deal that the provided slice is modified. In most cases I imagine the |
One possibility would be to keep a fixed-size buffer of, say, I'm not necessarily advocating for this idea, as I think it might be fine to mutate the caller's slice (they can always make a clone if they don't want to deal with that), but it's an option. |
…albertodt Add io::Write::write_all_vectored Similar to io::Write::write_all but uses io::Write::write_vectored instead. Updates rust-lang#70436 /cc @cramertj @sfackler
…albertodt Add io::Write::write_all_vectored Similar to io::Write::write_all but uses io::Write::write_vectored instead. Updates rust-lang#70436 /cc @cramertj @sfackler
…albertodt Add io::Write::write_all_vectored Similar to io::Write::write_all but uses io::Write::write_vectored instead. Updates rust-lang#70436 /cc @cramertj @sfackler
…albertodt Add io::Write::write_all_vectored Similar to io::Write::write_all but uses io::Write::write_vectored instead. Updates rust-lang#70436 /cc @cramertj @sfackler
This comment has been minimized.
This comment has been minimized.
Write::write_all_vectored
I'm very happy to see this method added, thank you! I regularly find myself wanting this method, when writing various different pieces of data into a buffer and not wanting to copy them around excessively. |
Regarding the API: Normally, you can't split a With that in mind, why don't we implement |
@joshtriplett - yep, that approach was discussed up above. The thought for the current implementation is that the caller isn't going to care about the contents of the buffers slice after the call completes, and mutating it minimizes the number of write calls. |
If the calls are returning early, there may not be as much value in minimizing the number of calls. Also, could we restore the slice afterwards, even if we mutate it as we go? For each call to |
I don't quite understand this statement, the whole point of using vectored I/O is minimizing the number of system calls.
I guess we could but are there many cases in which you need to use the buffers after having written them all? I've personally never encountered that. We could also create an |
On Fri, Apr 17, 2020 at 01:12:55AM -0700, Thomas de Zeeuw wrote:
> If the calls are returning early, there may not be as much value in minimizing the number of calls.
I don't quite understand this statement, the whole point of using vectored I/O is minimizing the number of system calls.
Under normal circumstances, I'd expect the kernel to process as much of
a `writev` call as possible. What I'm suggesting is that if `writev`
returns early, such that you have to re-call it in order to finish
writing, then making an additional `write` call before doing so does not
seem likely to cause a problem.
If someone is trying to optimize the number of syscalls there, they
would want to start by figuring out why the kernel returned early from
`writev` in the first place.
> Also, could we restore the slice afterwards, even if we mutate it as we go? For each call to `write_vectored`, at most one change to the slice needs to occur, so we could easily change it back after that call. That way, we're requiring exclusive access to the slice but giving it back.
I guess we could but are there many cases in which you need to use the buffers after having written them all? I've personally never encountered that.
If you want to reuse the same buffers, it may also make sense to reuse
the [IoSlice].
We could also create an `IoSlices` type that wraps `&mut [IoSlice<'_>]` and take ownership of that to make the ownership part of the API clearer.
That would improve on the current API. Or we could accept a `Cow`, and
only make a copy if we need to mutate it.
|
That can have many causes that change from OS to OS and even from OS version to version, I don't think its worth it to determine the cause for each. But we should still keep the goal of minimising the number of system calls, using
The underlying buffers can be reused, so all you'll need to do is
Maybe, but what would the |
On Tue, Apr 21, 2020 at 02:22:57AM -0700, Thomas de Zeeuw wrote:
> If you want to reuse the same buffers, it may also make sense to reuse the [IoSlice].
The underlying buffers can be reused, so all you'll need to do is `IoSlice::new` which is effectively a copy of the pointer and length. I don't think that is too much overhead.
Consider the case where you have a long [IoSlice] because you're
stitching together a buffer from many disparate pieces.
> That would improve on the current API. Or we could accept a `Cow`, and only make a copy if we need to mutate it.
Maybe, but what would the `ToOwned` impl for `IoSlice` be? For a slice its converted into a `Vec`, which allocates, something we don't want for this API.
Good point. Adding IoSlices to represent an owned [IoSlice] seems like a
good step, then, if we decide to consume the IoSlice.
|
@Thomasdezeeuw is it possible to also implement a similar feature for reading? Similar to this:pub trait Read {
// ...
fn read_exact_vectored(&mut self, mut bufs: &mut [IoSliceMut]) -> std::io::Result<()> {
// Guarantee that bufs is empty if it contains no data,
// to avoid calling write_vectored if there is no data to be written.
bufs = IoSliceMut::advance(bufs, 0);
while !bufs.is_empty() {
match self.read_vectored(bufs) {
Ok(0) => {
return Err(std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "failed to fill whole buffer"));
}
Ok(n) => bufs = IoSliceMut::advance(bufs, n),
Err(ref e) if e.kind() == std::io::ErrorKind::Interrupted => {}
Err(e) => return Err(e),
}
}
Ok(())
}
} |
@q2p I guess so, but let's not track that in this issue. You can create a pr for it. |
It seems like the biggest blocker for this is the open question about the API. The current API requires the caller to pass a I would like to propose changing this, for a couple of reasons:
Meanwhile, requiring a writable slice of iovecs forces the caller to reconstruct their iovecs for every call, rather than reusing them for multiple calls (possibly with some modifications, but not rewriting them from scratch). This pessimizes the fast path in favor of optimizing the slower path. I would propose changing this to an immutable reference, instead. If the initial write comes back partial, This would both optimize the common case, and make all cases easier to invoke. We discussed this in the @rust-lang/libs-api meeting today. Some others felt that this was a reasonable approach to simplify the API; others didn't feel strongly either way; nobody felt that we should keep the current |
I am not saying that it is a common pattern, but in a ecological simulation I am using a single call to
Agreeing that the above situation is not common, I still wonder if with the platform-specific knowledge embedded in libstd, that temporary buffer could be made sufficiently large to "saturate" the system call interface, e.g. contain 1024 |
Note that I'm the one who added that question and after working with for a while it's not that big a deal. It's a little different then what you might expect, e.g. compare to the
I don't think that is true. If I remember correctly the limit is 1024, and I've been near it before so I don't think we should optimise of for the 2-3 iovecs use case too much. Of course all this is just anecdotal.
I have to say I disagree with this approach. Do you want to add a stack of 1024 ( I know the current API is a little odd at first, but when you assume that the buffer (slices) you passed in are not usable after the call it's an OK API to use. Futhermore as it's currently implemented it's zero overhead, while your suggestion of copying is not. If you're doing vectored I/O in the first place it's likely you care about this.
I didn't attend this meeting, nor am I part of the libs team, but I do feel strongly against this preposed case. May I ask how many of the people actually use the API? Because that might influence their vote. (hope I don't come off too strong with the last sentence, I'm just curious if people deciding on this are actually using the API) |
@Thomasdezeeuw I'm not suggesting that we optimize for the 2-3 case; I'm suggesting that we don't need to optimize heavily for an already-less-optimized subcase (partial writes) of the 1024 case, at the expense of both usability (not having to recreate buffers) and other optimization (not having to recreate buffers).
No, I personally would add a local array of 8, and if larger than that, call Also, we have helpers that'd make it easy to write a mutate-in-place
It's not zero-overhead if you count the cost of recreating a new set of iovecs for each call.
In the optimal case, your |
I agree that the usability improvement is nice, but I've found I always have to recreate my buffers anyway (see below). If it's such a big problem maybe this needs a new type that wraps
That would make this function unusable for my case, which would be a real shame.
But you'll almost always have to recreating the Could you show an example where it is possible to keep your
I agree, but it's not possible for a program to ensure the operation completes in one |
The issue isn't just that callers will forget this (
let mut iovecs = [
IoSlice::new(header),
IoSlice::new(x),
];
w.write_all_vectored(&iovecs)?;
iovecs[1] = IoSlice::new(y);
w.write_all_vectored(&iovecs)?; Also, there are several functions we could add to make this better. For instance, we could add a function from FWIW, I could also work with a |
To be fair that's why the documentation says
If you program with that in mind, essentially passing ownership of
What I meant what do you have an example where you actually modify the underlying buffers. If you have the headers prepared (especially if it's just a single buffer) I don't think the overhead of recreating it is too much.
That could, especially if the restoring can be optimised away if the compiler can see the buffers aren't used any more. 👍 |
For what it's worth, I was planning to use it for the case of assembling a text document from potentially thousands of pieces. For my use case, making a big Accepting a You know, now that I think about it, I'm totally free to just implement that function myself in terms of |
How about an owned type that wraps |
@Thomasdezeeuw wrote:
That doesn't address this concern I wrote above:
What if we used a Also, we could avoid an extra syscall even in the borrowed case, if we're willing to allocate in that case, and Cow will take care of that for us. |
@joshtriplett I'm sorry, but I don't really have the energy for this issue any more... You're comment completely misses the point of vectored I/O. If people are looking at using vectored I/O it's likely they're looking at it for performance reasons, suggesting to just allocate a new buffer so it can be used once in a system call is really quite far off this target. It's been more than two years, the API works and has zero overhead as-is, but yes it has a gotcha. At this point either stabilise it or remove it, I'm no longer interested, this has just taken too long. |
So, I do care very much about vectored I/O as well, and I care about the use case of being able to reuse the iovecs repeatedly, which is also something people do in C for reasons of both performance and simplicity. I'd like to be able to do that somehow in Rust, and not have to throw away and recreate the array of iovecs every time. I'm trying to brainstorm some data structure that will allow for that reuse. I was trying to favor using existing constructs and types (at the expense of incurring overhead in what I saw as the less common case of partial write), but I acknowledge that that's not ideal. I agree that Cow wasn't a great idea. As an alternative closer to what you already proposed, perhaps it makes more sense to make a new Rough sketch of a solution that shouldn't need to make any copies at all:
If you have an That should avoid any copying, avoid any more syscalls than absolutely necessary, and overall make this exactly as efficient and capable as Does something like that sound reasonable? |
I wrote up a quick implementation of something similar to the proposed design. It's a pretty straightforward elaboration of struct BufferManager<'a> {
buffers: &'a mut [io::IoSlice<'a>],
saved: Option<io::IoSlice<'a>>,
}
impl BufferManager<'a> {
pub fn new(buffers: &'a mut [io::IoSlice<'a>]) -> Self {
// strip empty buffers
let first_non_empty = buffers
.iter()
.position(|buf| buf.len() > 0)
.unwrap_or(buffers.len());
let buffers = &mut buffers[first_non_empty..];
Self {
buffers,
saved: None,
}
}
pub fn get(&self) -> &[io::IoSlice<'a>] {
self.buffers
}
pub fn count(&self) -> usize {
self.buffers.len()
}
pub fn advance(&mut self, amount: usize) {
while let Some((head, tail)) = self.buffers.split_first_mut() {
// The head is smaller than the overall write, so pop it off the
// the front of the buffers. Be sure to restore the original state,
// if necessary.
if let Some(remaining) = amount.checked_sub(head.len()) {
amount = remaining;
if let Some(saved) = self.saved.take() {
*head = saved;
}
*buffers = tail;
}
// The head is larger than the overall write, so it needs to be
// modified in place. Be sure to save the current state of the
// buffer, if necessary.
else {
if amount > 0 {
self.saved = Some(self.saved.unwrap_or(*head));
*head = io::IoSlice::new(&head[amount..]);
}
return;
}
}
assert!(amount == 0, "advanced too far")
}
}
impl Drop for BufferManager<'_> {
fn drop(&mut self) {
// When the buffer manager is dropped, restore the state of the
// current buffers head, if necessary. It shouldn't be possible for
// there to be a saved value while the buffer list is empty.
if let Some(head) = self.buffers.first_mut() {
if let Some(saved) = self.saved {
*head = saved
}
}
}
} Then, fn write_all_vectored(
&mut self,
buffers: &mut [io::IoSlice<'_>],
) -> io::Result<()> {
let mut buffers = BufferManager::new(buffers);
while buffers.count() > 0 {
match dest.write_vectored(buffers.get()) {
Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
Ok(n) => buffers.advance(n),
Err(err) if err.kind() == io::ErrorKind::Interrupted => {}
Err(err) => return Err(err),
}
}
Ok(())
} The only downside here is that it does require the user to provide an |
Can an extra method, or even option or something, be added to allow retrying the operation with data that wasn't successfully written? I'd like to tolerate some inner write errors like |
We don't do this for the |
Would you consider an alternate signature that returns the total number of bytes written? fn write_all_vectored(&mut self, bufs: &mut [IoSlice<'_>]) -> Result<usize> I am aware that it is not consistent with the signature of dest.write_all(&buf)?;
Ok(buf.len) Unless I am missing something crucial, callers of In contrast, In my experience (mostly embedded Linux) the returned count of bytes written is occasionally useful at runtime and extremely useful in unit tests. Since users cannot implement this efficiently without re-implementing Example: fn send_entercnd1(password: &str, dest: &mut impl std::io::Write) -> std::io::Result<usize> {
Ok(dest.write_all_vectored(&[
IoSlice::new(b"AT!ENTERCND=\""),
IoSlice::new(password.as_bytes()),
IoSlice::new(b"\"\r"),
])?)
} vs fn send_entercnd2(password: &str, dest: &mut impl std::io::Write) -> std::io::Result<usize> {
// I know two of these have constant length and it could just return Ok(15+password.len())
// In general this won't be the case. I didn't see the need to complicate the example.
let mut bufs = &[
IoSlice::new(b"AT!ENTERCND=\""),
IoSlice::new(password.as_bytes()),
IoSlice::new(b"\"\r"),
];
// get length before `write_all_vectored`
let length = {
let mut acc = 0;
for buf in bufs {
acc += buf.len();
}
acc
};
dest.write_all_vectored(&mut bufs)?;
Ok(length);
} |
This can trivially be done via |
We got bitten in our codebase by a bug caused by us handling Still it seems that this scenario could either be 1) more explicitly disallowed through type-checking by e.g. making |
This is a tracking issue for
io::Write::write_all_vectored
.Feature gate:
#![feature(write_all_vectored)]
.Steps:
Unresolved questions:
IoSlice
s as mutable slice and modifies them. This is a pretty unusual and potentially error-prone API. We could either find a way to not mutate the argument or to enforce (via the type system) the argument can't be used by the user afterwards. Or we can decide that such an unusual API is fine for this rather low level method.Original issue:
In the
io::Write
trait we've got the helpfulwrite_all
method, which callswrite
in a loop to write all bytes. However there is no such function forwrite_vectored
. I would suggest adding a function calledwrite_all_vectored
to performance such a task.A possible implementation. Note that
bufs
is a mutable slice, also see the discussion in /~https://github.com/rust-lang/futures-rs/pull/1741/files. On the playground: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=872e9d973bd8101e7724292f87a82869.Related: rust-lang/futures-rs#1741
/cc @cramertj
The text was updated successfully, but these errors were encountered: