Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bonding and vlan tagging support to Bottlerocket #2596

Merged
merged 4 commits into from
Dec 13, 2022

Conversation

yeazelm
Copy link
Contributor

@yeazelm yeazelm commented Nov 19, 2022

Issue number:
Closes #2369

Description of changes:
This will add bonding and vlan tagging support to bottlerocket. There is a lot of code here mostly because we needed to bump the configuration version to 3 and refactor a bunch of logic since there are now different types (kinds) of devices which have different validation logic, testing concerns, etc. This should allow us to add more virtual network devices in the future without so much refactoring. It also moves some of the device-specific code into a new devices module for easier understanding of what is a net_config concern and what is a "network device" concern.

Testing done:
I've validated this on hardware and qemu. (More testing will come as the PR progresses) The tests all pass as well right now. I also ran clippy to cover those bases as well.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@yeazelm yeazelm requested review from bcressey, zmrow and webern November 19, 2022 02:31
Copy link
Contributor

@stmcginnis stmcginnis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great overall. A few minor comments. Should also clean up some of the commented out code before we merge. Nice work!

PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Show resolved Hide resolved
Copy link
Member

@stockholmux stockholmux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few bits and bobs on the markdown.

PROVISIONING-METAL.md Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@yeazelm yeazelm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR from the comments so far plus tidied up some dead code.

sources/api/netdog/src/net_config/devices/mod.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/wicked/bonding.rs Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Show resolved Hide resolved
PROVISIONING-METAL.md Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
@yeazelm yeazelm force-pushed the bonding_vlan branch 4 times, most recently from 72e4d43 to 2a8e6ed Compare November 22, 2022 16:30
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
sources/api/netdog/src/wicked/mod.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@zmrow zmrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/mod.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/wicked/mod.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/wicked/vlan.rs Outdated Show resolved Hide resolved
@yeazelm yeazelm force-pushed the bonding_vlan branch 2 times, most recently from 3b9b143 to ae2e422 Compare November 28, 2022 21:22
@yeazelm yeazelm force-pushed the bonding_vlan branch 3 times, most recently from 7e1ce18 to dc2ac0f Compare November 29, 2022 16:54
Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to save a few thoughts and move on to other things. I'll continue with this PR when I can. Nice work!

sources/api/netdog/src/wicked/vlan.rs Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
sources/api/netdog/src/wicked/mod.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/v3.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/v3.rs Show resolved Hide resolved
Comment on lines 43 to 48
impl HasStatic for &NetVlanV1 {
fn has_static(&self) -> bool {
self.static4.is_some() || self.static6.is_some()
}

fn validate_static4(&self) -> Result<()> {
if let Some(config) = &self.static4 {
config.validate()?
}
Ok(())
}

fn validate_static6(&self) -> Result<()> {
if let Some(config) = &self.static6 {
config.validate()?
}
Ok(())
}
}

impl HasDhcp for &NetVlanV1 {
fn has_dhcp(&self) -> bool {
self.dhcp4.is_some() || self.dhcp6.is_some()
}
}

impl HasRoutes for &NetVlanV1 {
fn has_routes(&self) -> bool {
self.routes.is_some()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, all these helper traits are hurting the clarity of the code.

I'd recommend looking at a derive macro that can be reused across the types, so you can just write:

#[derive(Debug, Deserialize, Validate)]
pub(crate) struct NetVlanV1 {
}

Then reviewers can focus on two things:

  • do all the types that need to derive Validate?
  • does the single implementation of Validate cover all the combinations?

Right now, it's more like this:

  • is HasRoutes implemented the same way for NetInterfaceV2, NetVlanV1, NetBondV1?
  • is HasStatic ... ?
  • is HasDhcp ... ?
  • do all the structs that implement all three also call validate_addressing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I'll look at implementing a derive macro.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go as far as a derive, but I did clean up the code and move most of this boiler plate into a single trait that is generated for those devices that need it. So now someone should only need to reason that this has been added to the right places and the macro correctly generates the logic required to validate IP Addressing.

sources/api/netdog/src/wicked/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/vlan.rs Outdated Show resolved Hide resolved
@yeazelm yeazelm force-pushed the bonding_vlan branch 2 times, most recently from f41c607 to 740cd11 Compare December 7, 2022 04:57
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
[bond1.monitoring]
arpmon-interval = 200
arpmon-validate = "all"
arpmon-targets = ["192.168.1.1", "10.0.0.2"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these always ipv4? Should you have one ipv6 value in the test data here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I believe they could be IPv6 as well, but I just never saw any examples. I can probably confirm that is valid but the kernel docs only use IPv4 (which might just date when the doc was written).

PROVISIONING-METAL.md Outdated Show resolved Hide resolved
* `miimon-downdelay-ms` (int in ms): MII Monitoring delay before the link is disabled after link is no longer detected in milliseconds
* `arpmon-interval-ms` (int in ms): Number of milliseconds between intervals to determine link status, must be greater than 0
* `arpmon-validate-ms` (enum of `all`, `none`, `active`, `backup`): What packets should be used to validate link
* `arpmon-targets-ms` (list of quoted IPv4 address including prefix): List of targets to use for validating ARP. Min = 1, Max = 16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, am I not understanding what the ms suffix means? I thought it was milliseconds.

Suggested change
* `arpmon-targets-ms` (list of quoted IPv4 address including prefix): List of targets to use for validating ARP. Min = 1, Max = 16
* `arpmon-targets` (list of quoted IPv4 address including prefix): List of targets to use for validating ARP. Min = 1, Max = 16

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, was too aggressive.

}
}

#[derive(Clone, Debug, Deserialize)]
Copy link
Contributor

@webern webern Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are probably a few enums like this in the PR. It can be handy when writing code if "simple" enums implement Copy since they are nothing but an integer under the hood.

Suggested change
#[derive(Clone, Debug, Deserialize)]
#[derive(Clone, Copy, Debug, Deserialize)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll take a look at adding them to the Enums that are "simple" like you call out.

self.addresses.iter().all(|a| matches!(a, IpNet::V4(_)))
|| self.addresses.iter().all(|a| matches!(a, IpNet::V6(_))),
InvalidNetConfigSnafu {
reason: "static configuration must only contain all IPv4 or all IPv6 addresses"
Copy link
Contributor

@webern webern Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this answers a question I asked elsewhere.

In general I'm sad to see all of this validation code. And I'm sure I was asked for help with this and probably didn't come up with anything, but there's always a way to avoid "validate after construction", especially with Rust. Just something to keep in mind in future designs.

(Also, I realize that making it compatible with the serialization shape is challenging)

Instead of Vec<V4orV6Enum>...

enum IpV4OrIpV6List {
    IpV4(Vec<IpV4Struct>),
    IpV6(Vec<IpV6Struct>),
}

Which is one example of eliminating a validate() call.

Comment on lines +9 to +10
#[test]
fn ok_config() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating test code like this confounds IDE tooling. Just something to keep in mind in the future. One cannot step through code or click on the green "run this test" button with generated code.

That being said, I love the fact that so much testing is taking place and the test configuration files serve as excellent documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, CLion does do some interesting things where you can view the generated file on some failures, I'm not sure how I got it doing it, but it did help. It's not the most ergonomic experience when dealing with these tests but I also don't think I would rather have the alternative. And its great to have tests that can map across versions pretty easily as we move forward.

sources/api/netdog/src/net_config/v3.rs Show resolved Hide resolved
}
}

#[allow(clippy::to_string_in_format_args)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you delete the .to_string() calls do you get some other clippy message? If so that must be a false positive (assuming the types implement Display).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it complains differently about this:

error[E0277]: `interface_name::InterfaceName` doesn't implement `std::fmt::Display`
  --> api/netdog/src/net_config/v3.rs:81:33
   |
81 | ...                   interface,
   |                       ^^^^^^^^^ `interface_name::InterfaceName` cannot be formatted with the default formatter
   |
   = help: the trait `std::fmt::Display` is not implemented for `interface_name::InterfaceName`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
   = note: this error originates in the macro `$crate::__export::format_args` which comes from the expansion of the macro `format` (in Nightly builds, run with -Z macro-backtrace for more info)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the "right" thing to do would be to implement Display for interface_name::InterfaceName instead of implementing a to_string function (if that's what you did)

PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Show resolved Hide resolved
sources/api/netdog/src/net_config/devices/bonding.rs Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
PROVISIONING-METAL.md Outdated Show resolved Hide resolved
@yeazelm yeazelm force-pushed the bonding_vlan branch 3 times, most recently from 068372d to 68ad650 Compare December 13, 2022 18:26
The net.toml version is bumped to 3 to accommodate the new functionality
of bonding and vlans. Version 3 does require different naming than
version 2 so version net.toml files are not drop in ready for version 3.
This also specifies which options are support for bonding and vlans.

Bonding is a complex topic and right now, we are making a deliberate
choice to support only a subset of bonding before full support is
enabled. This mostly comes in the form of only mode 1 (active-primary).

VLAN tagging is supported by creating a virtual network device on top of
another device. This allows mixing and matching of vlan tags with
whatever configuration is required. A user can choose to only configure
VLAN Tagged devices for addressing if that is required, or add in vlan
tagging on a device for special handling.
This adds the structures and tests that are used to support bonding in
net.toml. This will be enabled via a version 3 of the net_config module
in a future commit. The current implementation only supports mode 1
(active-backup) for bonding and a handful of settings beyond the minimum
set required for bonding. The code is ready for more modes and settings
to be added when the need arises.
This adds the stuctures to specify a vlan in net.toml. This allows
deserialization and serialization of vlan structures along with some
tests.
This adds version 3 of network configuration along with bonding and vlan
support end to end from net.toml to wicked xml. The support now works as
intended for both vlan virtual devices and bonding devices on metal.
This also moves around some of the device files into a new module so
that future expansion of devices is seperated from expansion of network
configuration versions. This also changes the validation logic to add in
traits that allow a generic IP Addressing validation function to run on
any network device that accepts addressing.
Copy link
Contributor

@zmrow zmrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

Let's open an issue re: better error messaging when deserializing untagged enums. Pulling that in will allow us to move all validation into Deserialize implementations and remove the Validate trait since our error messages won't be masked by the top-level enum deserialize.

@zmrow zmrow changed the title Add bonding and vlan tagging support to bottlerocket Add bonding and vlan tagging support to Bottlerocket Dec 13, 2022
@yeazelm
Copy link
Contributor Author

yeazelm commented Dec 13, 2022

I added #2657 to track the work to fix the error messaging for enums so we can move all the rest of the validation into the Deserialization.

@yeazelm yeazelm merged commit f3dfa53 into bottlerocket-os:develop Dec 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metal: support network bonding
7 participants