improve how CRs and k8s work with CNI plugins and cgroup drivers #15463
Skipping CI for Draft Pull Request.
/ok-to-test
kvm2 driver with docker runtime
Times for minikube start: 54.5s 53.7s 54.7s 54.0s 53.5s
Times for minikube ingress: 28.1s 27.2s 24.2s 28.2s 28.6s
docker driver with docker runtime
These are the flake rates of all failed tests.
Too many tests failed - See test logs for more details. To see the flake rates of all tests by environment, click here.
I still see the failures... on Docker_Linux
kvm2 driver with docker runtime
Times for minikube ingress: 25.6s 24.1s 27.1s 25.7s 28.1s
Times for minikube start: 56.0s 55.5s 54.1s 53.4s 53.7s
docker driver with docker runtime
Times for minikube start: 27.5s 25.8s 25.5s 25.2s 25.7s
Times for minikube ingress: 21.0s 19.9s 21.5s 19.9s 20.9s
docker driver with containerd runtime
Times for minikube start: 21.7s 21.6s 24.2s 22.7s 21.1s
Times for minikube ingress: 26.4s 43.4s 26.4s 24.4s 26.4s
yep, i've created a draft pr to test the initial fixes; some other issues surfaced, which should be further improved by the commit i just made - we'll see
kvm2 driver with docker runtime
Times for minikube ingress: 28.2s 27.7s 26.6s 23.7s 31.7s
Times for minikube start: 54.0s 57.3s 54.9s 57.0s 54.7s
docker driver with docker runtime
Times for minikube start: 25.7s 26.6s 26.6s 24.7s 26.1s
Times for minikube ingress: 19.4s 20.4s 18.4s 21.0s 19.0s
docker driver with containerd runtime
Times for minikube start: 22.3s 22.0s 21.5s 21.2s 21.2s
Times for minikube ingress: 26.4s 26.4s 26.4s 26.4s 26.4s
kvm2 driver with docker runtime
Times for minikube start: 54.9s 56.2s 53.7s 55.1s 57.4s
Times for minikube ingress: 28.2s 27.1s 28.7s 28.6s 27.7s
docker driver with docker runtime
Times for minikube ingress: 80.9s 20.0s 21.0s 19.5s 20.9s
Times for minikube start: 28.3s 27.0s 25.7s 24.9s 28.8s
docker driver with containerd runtime
Times for minikube ingress: 26.5s 26.4s 25.9s 26.4s 26.4s
Times for minikube start: 21.6s 20.9s 22.3s 32.2s 22.3s
force-pushed from ecd0bed to 394302c
kvm2 driver with docker runtime
Times for minikube start: 53.4s 53.2s 54.4s 54.7s 54.8s
Times for minikube ingress: 28.2s 28.2s 27.1s 27.2s 27.7s
docker driver with docker runtime
Times for minikube (PR 15463) start: 27.6s 25.0s 28.3s 26.4s 25.7s
Times for minikube ingress: 21.9s 20.4s 22.0s 81.0s 19.5s
docker driver with containerd runtime
Times for minikube (PR 15463) ingress: 18.9s 32.5s 79.4s 31.4s 32.4s
Times for minikube start: 32.9s 23.8s 24.7s 22.4s 24.3s
kvm2 driver with docker runtime
Times for minikube start: 54.6s 54.5s 55.4s 52.9s 54.0s
Times for minikube ingress: 28.1s 27.7s 27.6s 24.7s 25.1s
docker driver with docker runtime
Times for minikube ingress: 48.9s 21.5s 21.4s 20.9s 21.9s
Times for minikube start: 24.6s 25.2s 25.6s 24.1s 24.7s
docker driver with containerd runtime
Times for minikube start: 25.2s 32.7s 21.6s 21.4s 21.1s
Times for minikube ingress: 26.4s 26.4s 26.4s 26.4s 25.9s
Co-authored-by: Steven Powell <44844360+spowelljr@users.noreply.github.com>
kvm2 driver with docker runtime
Times for minikube start: 53.4s 55.5s 53.9s 56.7s 54.7s
Times for minikube ingress: 27.6s 26.6s 27.6s 25.7s 29.6s
docker driver with docker runtime
Times for minikube start: 25.6s 24.2s 29.2s 28.9s 30.1s
Times for minikube ingress: 19.9s 20.4s 20.9s 21.4s 20.0s
docker driver with containerd runtime
Times for minikube start: 21.6s 25.3s 21.4s 22.1s 20.8s
Times for minikube (PR 15463) ingress: 31.9s 79.9s 30.4s 33.4s 48.0s
hmmm, we again had the same issue as seen earlier today - network issues?
kvm2 driver with docker runtime
Times for minikube start: 54.3s 54.4s 55.3s 55.3s 56.2s
Times for minikube ingress: 29.2s 25.7s 29.6s 29.2s 22.7s
docker driver with docker runtime
Times for minikube start: 24.6s 27.5s 27.9s 25.5s 25.0s
Times for minikube ingress: 50.0s 19.0s 19.9s 20.9s 23.0s
docker driver with containerd runtime
Times for minikube start: 33.0s 22.3s 22.3s 22.2s 21.7s
Times for minikube ingress: 25.9s 27.4s 26.4s 26.4s 26.4s
These are the flake rates of all failed tests.
To see the flake rates of all tests by environment, click here.
/retest-this-please
kvm2 driver with docker runtime
Times for minikube ingress: 28.7s 28.1s 27.7s 27.7s 28.1s
Times for minikube start: 55.3s 55.5s 54.2s 54.9s 54.8s
docker driver with docker runtime
Times for minikube (PR 15463) ingress: 22.9s 23.0s 20.4s 81.5s 20.5s
Times for minikube start: 25.7s 25.3s 26.0s 26.1s 26.0s
docker driver with containerd runtime
Times for minikube start: 21.2s 21.4s 21.4s 21.9s 22.4s
Times for minikube ingress: 25.9s 26.4s 25.9s 26.5s 26.4s
kvm2 driver with docker runtime
Times for minikube start: 56.1s 55.9s 54.3s 56.5s 55.1s
Times for minikube ingress: 27.7s 28.7s 27.7s 25.7s 27.7s
docker driver with docker runtime
Times for minikube start: 27.4s 26.4s 25.4s 25.6s 27.9s
Times for minikube ingress: 20.4s 21.4s 50.0s 82.4s 21.5s
docker driver with containerd runtime
Times for minikube start: 32.3s 22.1s 22.4s 21.8s 21.3s
Times for minikube ingress: 26.5s 26.0s 26.0s 26.4s 26.5s
kvm2 driver with docker runtime
Times for minikube start: 55.9s 53.9s 56.8s 59.3s 52.2s
Times for minikube ingress: 27.2s 24.7s 24.6s 26.2s 28.6s
docker driver with docker runtime
Times for minikube start: 24.6s 26.6s 25.6s 25.9s 28.8s
Times for minikube ingress: 81.5s 20.0s 22.5s 19.0s 20.5s
docker driver with containerd runtime
Times for minikube ingress: 25.9s 26.0s 26.4s 26.5s 26.5s
Times for minikube start: 21.6s 22.0s 24.3s 21.2s 25.2s
Thank you very much for your large undertaking to improve minikube as a whole; it is very much appreciated!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: prezha, spowelljr The full list of commands accepted by this bot can be found here. The pull request process is described here
<update>
initial goal
reduce flakiness of Docker_Linux tests
initial analysis & conclusion
tests are actually mostly ok, we have some real issues
redefined goal
find & fix the issues
outcome
a bit more details & context
i've spent some time trying to figure out why all the tests are consistently passing on my machine but then just failing when run on ci/jenkins
spoiler: as it turns out, it all mostly boils down to inconsistent cgroups across the "stack" and also how CNIs and CRs play (or not!) together
part of the conclusion from that "investigation" was the completely unexpected and surprising (to me, at least) discovery that we have ci/jenkins agents with different configurations! here, the difference is not just in ServerVersion but also in the CgroupDriver used inside the os (ie, how the agent machine was booted) - examples:
Docker_Linux-26999:
Docker_Linux-27065:
while i couldn't change/affect the ServerVersion, i went on and made minikube auto-adaptable to the underlying cgroup(s) driver, and that helped eliminate some of the "flakiness" but also hopefully made minikube more "robust" in terms of the "flavours" of os/settings our users have
sounds like good timing as well, since k8s now recommends using systemd as the default cgroup driver, and with this, we shouldn't have much to change additionally going forward
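as a rough illustration of the "auto-adapt to the underlying cgroup driver" idea, here is a minimal sketch (not the code from this pr) that reads the two fields mentioned above, ServerVersion and CgroupDriver, from the json that `docker info --format '{{json .}}'` prints. the function name detectCgroupDriver, the fallbackDriver constant and the sample payloads are assumptions made up for this example:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dockerInfo holds the two fields of `docker info --format '{{json .}}'`
// that differed between the CI agents discussed above.
type dockerInfo struct {
	ServerVersion string `json:"ServerVersion"`
	CgroupDriver  string `json:"CgroupDriver"`
}

// fallbackDriver is an assumption of this sketch: it is used only when the
// runtime reports no driver at all.
const fallbackDriver = "systemd"

// detectCgroupDriver (hypothetical helper) parses docker-info JSON and returns
// the cgroup driver the rest of the stack should be configured to match.
func detectCgroupDriver(raw []byte) (string, error) {
	var info dockerInfo
	if err := json.Unmarshal(raw, &info); err != nil {
		return "", fmt.Errorf("parsing docker info: %w", err)
	}
	switch info.CgroupDriver {
	case "systemd", "cgroupfs":
		return info.CgroupDriver, nil
	case "":
		return fallbackDriver, nil
	default:
		// Anything else is rejected here rather than guessed at.
		return "", fmt.Errorf("unsupported cgroup driver %q", info.CgroupDriver)
	}
}

func main() {
	// Made-up sample payloads; real input would come from the docker CLI.
	samples := []string{
		`{"ServerVersion":"example-a","CgroupDriver":"cgroupfs"}`,
		`{"ServerVersion":"example-b","CgroupDriver":"systemd"}`,
	}
	for _, raw := range samples {
		driver, err := detectCgroupDriver([]byte(raw))
		fmt.Println(driver, err)
	}
}
```

the point of the sketch is only that the driver is read from the environment instead of being hard-coded, so agents booted with different cgroup setups end up with a consistent configuration across the stack.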
code fixes and improvements (candidates for individual PRs - next steps)
/etc/cni/net.mk dir for CRs and CNIs (note: that dir should be removed from our "distro", as it confuses containerd into thinking it's the default one to look into; until then, there's a "hack" that removes it)
hacks (that should be replaced with proper "distribution" updates!) used to eliminate some of the "known issues" with the upstreams:
a number of other minor tweaks, additions and fixes
eg, timeouts, juju packages updates, comments, docs, spellings, etc.
tests fixes and improvements
=> here, the issue actually might be in os.Setenv("MOCK_GOOGLE_TOKEN", "true") and we just need to address that - https://github.com/GoogleContainerTools/gcp-auth-webhook#gcp-auth-webhook:
general notes
</update>
the goal of this pr is to see if we can reduce the rate of errors and flakes in the TestNetworkPlugins test group with the linux+docker combo
locally, these tests all pass (TestNetworkPlugins-linux+docker.log), and if the ci_cd tests show similar results, we might want to break these down into separate PRs
key points:
race condition overwriting minikube's certs (should fix the relevant flake in other tests as well)
unauthorised (as a consequence of the previous point) is a non-retryable error - fail fast w/o retrying
example:
just to fail afterwards:
cri-o bridge for all CNIs as it interferes with others
loopback a name (as reported by @STRRL in "could not bootstrap kubernetes cluster with CNI flannel" #14965, with the pr still pending in cri-o) - example of errors resolved:
and
kubenet also needs the cni (eg, bridge) to support hairpin mode
update calico to the latest version
update flannel to the latest version and extract the k8s manifests from code to a separate file that's then embedded
timeout for k8s binaries
memory and wait timeout to avoid weird issues with constrained resources
update juju packages to the latest version - ours are ~3 years old
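to make the "non-retryable error - fail fast w/o retrying" key point above a bit more concrete, here is a minimal sketch of a retry loop that stops as soon as an error is marked as permanent. it is not the code from this pr: retry, permanentError, errUnauthorized and errTransient are all hypothetical names invented for this example:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical errors standing in for what an API client might return.
var (
	errUnauthorized = errors.New("unauthorized")
	errTransient    = errors.New("temporary failure")
)

// permanentError wraps an error that retrying cannot fix.
type permanentError struct{ err error }

func (p *permanentError) Error() string { return p.err.Error() }
func (p *permanentError) Unwrap() error { return p.err }

// retry calls fn up to attempts times, sleeping delay between tries.
// It returns the number of calls made and the last error; a permanentError
// ends the loop immediately instead of burning the remaining attempts.
func retry(attempts int, delay time.Duration, fn func() error) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return i, nil
		}
		var perm *permanentError
		if errors.As(err, &perm) {
			return i, err // fail fast: no point in retrying
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return attempts, err
}

func main() {
	// A permanent error stops the loop right away.
	n, err := retry(5, 10*time.Millisecond, func() error {
		return &permanentError{err: errUnauthorized}
	})
	fmt.Printf("permanent: %d call(s), err=%v\n", n, err)

	// A transient error is retried until the attempts run out.
	n, err = retry(3, 10*time.Millisecond, func() error {
		return errTransient
	})
	fmt.Printf("transient: %d call(s), err=%v\n", n, err)
}
```

the design choice here is that the caller, who knows what the error means, marks it as permanent, while the retry loop stays generic.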