-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for CNI 1.0.0 config #96
Conversation
Oops. I just noticed #90 which did update the internal structure to a later CNI version (and it looks like it was a pretty trivial change), but did not enable newer CNI versions to be parsed/output. So I guess the best of both worlds would be to combine them both? |
Weird. The two test suites failed on different tests, I also see the tests dumping byte arrays in "End Unit Test for case" which https://www.browserling.com/tools/ascii-to-text demonstrates are actually ASCII or UTF-8 JSON... |
Thanks @TBBle for the PR, recent commit has restored CI to match state on master. Appears we oscillate between failure on 2019 and failure on 2022. I can't repro the test failure on a local Server 2019, we should perhaps consider hosting our own runner and see whether that improves CI stability. cc: @debj1t @aznashwan please help review and see whether we can merge this then port your fixes over. |
LGTM FWIW, thank you!
There is indeed currently no logic in this repo which compares the network config provided to the plugins with the actual properties of the HCN network if it is pre-created. (it simply looks the network up by name and trusts it matches) IMO this should be fixed and the plugins should error on config mismatches with pre-existing networks, but that will be a separate patch for another day. I'll be following up with a patch which'll add a workflow to run containerd+cri integration tests on top of both the NAT and Bridge/Overlay plugins, so I'd suggest we merge this now @debj1t |
Makes sense. I was also planning to poke at nerdctl and see if I can work out why it created a faulty CNI config; I got rabbitholed trying to make sense of the duality of HCN/HNS and forgot about that. (I _suspect without actually checking that nerdctl just output a set of hard-coded details assuming the network doesn't already exist, rather than asking HCN for details of an existing network.) An alternative to this (which I thought I mentioned elsewhere, I think there's a discussion about CNI network name/type behaviour in another ticket) would be to actually complete the TODO in this repo and remove network-creation support, relying on the host to already have the desired network created. That would actually remove (I think...) the need to have multiple CNI network types in this codebase at all, since they all trust HCN over the user config anyway. |
You are correct,
Also correct, nerdctl does zero checking/creation of OS-side resources when the user creates a network (on Windows and Linux too afaik), all a
IMHO, it should be expected, and even downright warranted of the CNI plugins to create underlying OS resources as needed (the trivial reference Linux CNI plugins do this, at least), so from my PoV the real question is what the WinCNI plugins should do in case of the HNS networks pre-existing with different properties, for which I have opened the following issue for further discussion: #98
I hope to contribute a patch which should alleviate the An honest thank you for all your help in all of this, @TBBle! @debj1t looks like the updated test suite has passed (at least for 0.2.0 configs, which are the only ones we currently test) just fine on this PR so please consider merging this! |
|
To move to using the 1.0.0 structures internally? Replace the use of "github.com/containernetworking/cni/pkg/types/040" with "github.com/containernetworking/cni/pkg/types/100" and adapt to any differences in the structures in code in this repo that interacts with them, as far as I know. And yeah, nerdctl's code on Windows starts with a blank IPAM structure, and never populates the type field. The comparable code on Unix has different methods for different IPAM types which prepopulate the type field. I suspect the Windows-side nerdctl CNI generation implementation has been bitrotting a little, Edit: I also note that the current example nat.conf also does not specify an ipam type, and in fact excludes the whole field. Technically this example is invalid because IpamConfig's Type field is not marked |
Had already half-baked the reply to question 1 so posting it for posterity anyway:
If you meant moving the internal structures used by the plugins to 1.0.0, it's a matter of switching the
Indeed, the |
Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
The type is internally a byte array, which is not directly readable in the logs as its rendered as individual byte values. Per the CNI spec it's JSON, so almost certainly UTF-8. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
This closes a path to a nil-dereference panic when test setup fails. Signed-off-by: Paul "TBBle" Hampson <Paul.Hampson@Pobox.com>
d42d304
to
db9cf71
Compare
// Supported CNI versions. | ||
var VersionsSupported = []string{"0.2.0", "0.3.0"} | ||
// Oldest Supported CNI versions | ||
var OldestVersionSupported = "0.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TBBle could we revert this to explicitly indicating supported versions? The current release versioning seems to correspond with greatest supported version. We can pull this change into a separate PR and discuss it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, you're requesting I open-code this list to the current return value of cniVers.VersionsStartingFrom(OldestVersionSupported)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I was poking around the CNI spec a little bit, and it's possible this change is faulty, as if I'm skimming it correctly, by declaring we support config files 0.4.0 or newer, we are also opening ourselves up to having CHECK
called for such config files, but we currently pass nil
as the check callback. A quick look at the CNI skeleton code doesn't show any checks for nil
in the relevant code, so this would panic if we hit it. (CHECK
was introduced in CNI spec 0.4.0 and it also changed the DEL
API; 0.4.0 to 1.0.0 looks like it only changed config and structures, according to the "major changes" list)
A quick look at BuildKit (my test case) doesn't show any calls to the go-cni Check
API which wraps this, so I haven't exercised this change in my limited testing. (And even briefer look in containerd's CRI implementation also shows no calls to Check
and no documented use-case)
I am not totally on top of how CNI works (and particularly the interactions of library version, spec version, skeleton version, and config version), so when I was building this PR I was mostly looking at the config-file parser, but perhaps we need to (for purposes of this PR) keep the existing VersionsSupported
array, and push the actual supported versions change out to require an implementation of CHECK
as well. (And probably a more-careful check of changes in the spec)
If I'm right and the version bump was incorrect, my understanding is we can still use the latest library (we were already on library 0.8.1 which appears to have supported spec 0.4.0), so we only need to revert our version declaration to avoid the behaviour change. We can also separately go ahead and upgrade the internal structures to types/100 or later in the meantime, as again, we are already using version types/040 with spec 0.3.0 support. (Again, might be worth validating in the spec that the meaning of those fields hasn't changed, in case something changed that isn't caught by simple Go compiler type-checking.)
Edit: I consider this a bug in the CNI Skeleton code, it probably should have either rejected the nil
function, or rejected attempts to declare spec 0.4.0 or higher as supported. I haven't checked, it might be that its code layout doesn't support this, i.e. potentially no part of the skeleton code except the CHECK
call itself has all of these details. That code should probably fail rather than panicking, at least.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CHECK is not implemented in containerd or cri-o. I added the check code in go-cni however discontinued it as it involves a significant amount of work to the runtime/cri-api/kubelet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe just a function that returns a "not implemented" error would be sufficient to avoid the panic, in case that rework is done in the future, or another CNI user appears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the cni plugins callback function returns nil
which emulates success? not implemented
would allow us to catch unknown or future scenarios earlier though. @MikeZappa87 are we ok with adding a no-op check command callback + an issue to track the work to implement it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can proceed with returning "not implemented" for CHECK, and adding both 0.4.0 and 1.0.0 as the supported versions in the array. We can track the remaining work with the below issues based on requirement to support them (doesn't seem to be widely used at the moment):
- Implement CHECK; update config examples to include disableCheck field (0.4.0)
- Remove non-list configurations; remove version field of interfaces array (1.0.0)
Hi @TBBle |
@Lin-Huang-Wei note that the Example containerd setup script creating ID-less CNI network: |
Trivially upgraded github.com/containernetworking/cni to v1.1.2 and expanded the list of known versions to all the ones it supports for auto-conversion.
Specifically, the internal CNI representation is still using CNI spec 0.4.0, changing that would have required deeper changes. As long as any future CNI library updates can still convert to/from 0.4.0, we should be able to trivially upgrade CNI support using the CNI library auto-conversion. It might make sense to move to 1.0.0 internally, but that isn't necessary now.
Tested with the NAT plugin and the following CNI config, mostly generated by nerdctl 1.6.2. (I had to add the DNS config myself, and the gateway and subnet generated by nerdctl were incorrect for the existing NAT network created by (I think) Docker; both are things I'm chasing up elsewhere)