Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use readv for the RX path of the network device to avoid one memory copy per frame v2 #4844

Merged
merged 6 commits into from
Oct 14, 2024

Conversation

ShadowCurse
Copy link
Contributor

@ShadowCurse ShadowCurse commented Oct 9, 2024

Changes

This PR is a slightly modified version of #4799 with couple fixes:

  • fix invalid call to pop_front when the descriptor chain is invalid. Instead call pop_back.
  • detect when we are about to parse more descriptors than the queue should hold and halt the processing.
    Main changes and reasons are same.

Reason

Same as previous version of this PR.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

Copy link

codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 92.03980% with 32 lines in your changes missing coverage. Please review.

Project coverage is 84.01%. Comparing base (d7fbf9b) to head (e42cc54).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/devices/virtio/net/device.rs 92.75% 10 Missing ⚠️
src/vmm/src/devices/virtio/net/persist.rs 50.00% 8 Missing ⚠️
src/vmm/src/devices/virtio/iovec.rs 89.23% 7 Missing ⚠️
src/vmm/src/devices/virtio/iov_deque.rs 96.87% 5 Missing ⚠️
src/vmm/src/devices/virtio/vsock/mod.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4844      +/-   ##
==========================================
+ Coverage   83.96%   84.01%   +0.05%     
==========================================
  Files         250      251       +1     
  Lines       27756    28019     +263     
==========================================
+ Hits        23304    23541     +237     
- Misses       4452     4478      +26     
Flag Coverage Δ
5.10-c5n.metal 84.64% <92.03%> (+0.05%) ⬆️
5.10-m5n.metal 84.62% <92.03%> (+0.05%) ⬆️
5.10-m6a.metal 83.92% <92.03%> (+0.06%) ⬆️
5.10-m6g.metal 80.62% <92.03%> (+0.10%) ⬆️
5.10-m6i.metal 84.61% <92.03%> (+0.04%) ⬆️
5.10-m7g.metal 80.62% <92.03%> (+0.10%) ⬆️
6.1-c5n.metal 84.64% <92.03%> (+0.05%) ⬆️
6.1-m5n.metal 84.62% <92.03%> (+0.04%) ⬆️
6.1-m6a.metal 83.92% <92.03%> (+0.06%) ⬆️
6.1-m6g.metal 80.61% <92.03%> (+0.09%) ⬆️
6.1-m6i.metal 84.61% <92.03%> (+0.05%) ⬆️
6.1-m7g.metal 80.61% <92.03%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ShadowCurse ShadowCurse force-pushed the bchalios_readv branch 8 times, most recently from c6d50db to 35ae6a5 Compare October 10, 2024 17:22
@ShadowCurse ShadowCurse marked this pull request as ready for review October 10, 2024 17:56
@ShadowCurse ShadowCurse self-assigned this Oct 11, 2024
@ShadowCurse ShadowCurse added Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests Type: Performance labels Oct 11, 2024
@ShadowCurse ShadowCurse changed the title Use readv for the RX path of the network device to avoid one memory copy per frame (attempt 2) Use readv for the RX path of the network device to avoid one memory copy per frame v2 Oct 11, 2024
Add a ring buffer type that is tailored for holding `struct iovec`
objects that point to guest memory for IO. The `struct iovec` objects
represent the memory that the guest passed to us as `Descriptors` in a
VirtIO queue for performing some I/O operation.

We plan to use this type to describe the guest memory we have available
for doing network RX. This should facilitate us in optimizing the
reception of data from the TAP device using `readv`, thus avoiding a
memory copy.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse force-pushed the bchalios_readv branch 2 times, most recently from 052b069 to e7f1eca Compare October 11, 2024 12:44
bchalios and others added 4 commits October 11, 2024 14:00
Allow IoVecBufferMut objects to store multiple DescriptorChain objects,
so that we can describe guest memory meant to be used for receiving data
(for example memory used for network RX) as a single (sparse) memory
region.

This will allow us to always keep track all the available memory we have
for performing RX and use `readv` for copying memory from the TAP device
inside guest memory avoiding the extra copy. In the future, it will also
facilitate the implementation of mergeable buffers for the RX path of
the network device.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Right now, we are performing two copies for writing a frame from the TAP
device into guest memory. We first read the frame in an array held by
the Net device and then copy that array in a DescriptorChain.

In order to avoid the double copy use the readv system call to read
directly from the TAP device into the buffers described by
DescriptorChain.

The main challenge with this is that DescriptorChain objects describe
memory that is at least 65562 bytes long when guest TSO4, TSO6 or UFO
are enabled or 1526 otherwise and parsing the chain includes overhead
which we pay even if the frame we are receiving is much smaller than
these sizes.

PR firecracker-microvm#4748 reduced
the overheads involved with parsing DescriptorChain objects. To further
avoid this overhead, move the parsing of DescriptorChain objects out of
the hot path of process_rx() where we are actually receiving a frame
into process_rx_queue_event() where we get the notification that the
guest added new buffers for network RX.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
Now, that we pre-process the buffers that guest provides for performing
RX, we need to save them in the VM state snapshot file, for networking
to work correctly post snapshot resume.

Implement Persist for RxBuffers and and plug them in the
(de)serialization logic of the network device.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
IoVecBufferMut type now uses IovDeque as its backing memory. IovDeque is
performing a custom memory allocation, using memfd_create() and a
combination of mmap() calls in order to provide a memory layout where
the iovec objects stored in the IovDeque will always be in consecutive
memory.

kani doesn't really get along with these system calls, which breaks our
proof for IoVecBufferMut::write_volatile_at. Substitute memory
allocation and deallocation with plain calls to std::alloc::(de)alloc
when we run kani proofs. Also provide a stub for IovDeque::push_back to
provide the same memory layout invariants.

Co-authored-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Egor Lazarchuk <yegorlz@amazon.co.uk>
@ShadowCurse ShadowCurse requested a review from roypat October 14, 2024 08:38
@roypat roypat merged commit aa0dca9 into firecracker-microvm:main Oct 14, 2024
6 of 7 checks passed
@ShadowCurse ShadowCurse deleted the bchalios_readv branch October 15, 2024 15:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Awaiting review Indicates that a pull request is ready to be reviewed Type: Enhancement Indicates new feature requests Type: Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants