-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use readv for the RX path of the network device to avoid one memory copy per frame v2 #4844
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4844 +/- ##
==========================================
+ Coverage 83.96% 84.01% +0.05%
==========================================
Files 250 251 +1
Lines 27756 28019 +263
==========================================
+ Hits 23304 23541 +237
- Misses 4452 4478 +26
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
c6d50db
to
35ae6a5
Compare
bchalios
requested changes
Oct 11, 2024
Add a ring buffer type that is tailored for holding `struct iovec` objects that point to guest memory for IO. The `struct iovec` objects represent the memory that the guest passed to us as `Descriptors` in a VirtIO queue for performing some I/O operation. We plan to use this type to describe the guest memory we have available for doing network RX. This should facilitate us in optimizing the reception of data from the TAP device using `readv`, thus avoiding a memory copy. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
052b069
to
e7f1eca
Compare
bchalios
reviewed
Oct 11, 2024
Allow IoVecBufferMut objects to store multiple DescriptorChain objects, so that we can describe guest memory meant to be used for receiving data (for example memory used for network RX) as a single (sparse) memory region. This will allow us to always keep track all the available memory we have for performing RX and use `readv` for copying memory from the TAP device inside guest memory avoiding the extra copy. In the future, it will also facilitate the implementation of mergeable buffers for the RX path of the network device. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Right now, we are performing two copies for writing a frame from the TAP device into guest memory. We first read the frame in an array held by the Net device and then copy that array in a DescriptorChain. In order to avoid the double copy use the readv system call to read directly from the TAP device into the buffers described by DescriptorChain. The main challenge with this is that DescriptorChain objects describe memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO are enabled or 1526 otherwise and parsing the chain includes overhead which we pay even if the frame we are receiving is much smaller than these sizes. PR firecracker-microvm#4748 reduced the overheads involved with parsing DescriptorChain objects. To further avoid this overhead, move the parsing of DescriptorChain objects out of the hot path of process_rx() where we are actually receiving a frame into process_rx_queue_event() where we get the notification that the guest added new buffers for network RX. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Now, that we pre-process the buffers that guest provides for performing RX, we need to save them in the VM state snapshot file, for networking to work correctly post snapshot resume. Implement Persist for RxBuffers and and plug them in the (de)serialization logic of the network device. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
IoVecBufferMut type now uses IovDeque as its backing memory. IovDeque is performing a custom memory allocation, using memfd_create() and a combination of mmap() calls in order to provide a memory layout where the iovec objects stored in the IovDeque will always be in consecutive memory. kani doesn't really get along with these system calls, which breaks our proof for IoVecBufferMut::write_volatile_at. Substitute memory allocation and deallocation with plain calls to std::alloc::(de)alloc when we run kani proofs. Also provide a stub for IovDeque::push_back to provide the same memory layout invariants. Co-authored-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
e7f1eca
to
574301f
Compare
bchalios
approved these changes
Oct 11, 2024
roypat
approved these changes
Oct 14, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Status: Awaiting review
Indicates that a pull request is ready to be reviewed
Type: Enhancement
Indicates new feature requests
Type: Performance
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes
This PR is a slightly modified version of #4799 with couple fixes:
pop_front
when the descriptor chain is invalid. Instead callpop_back
.Main changes and reasons are same.
Reason
Same as previous version of this PR.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
PR.
CHANGELOG.md
.TODO
s link to an issue.contribution quality standards.
rust-vmm
.