Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 Add MP back to dualstack E2E test #10135

Merged
merged 1 commit into from
Apr 18, 2024

Conversation

willie-yao
Copy link
Contributor

What this PR does / why we need it:
This PR adds MachinePools back to the dualstack E2E test and fixes the issue found in #9477

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes part of #10028

/area clusterclass
/area e2e-testing

@k8s-ci-robot k8s-ci-robot added area/clusterclass Issues or PRs related to clusterclass area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 9, 2024
@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-dualstack-and-ipv6-main

@k8s-ci-robot
Copy link
Contributor

@willie-yao: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-dualstack-and-ipv6-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-29-1-30-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test pull-cluster-api-e2e-full-dualstack-and-ipv6-main

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@willie-yao
Copy link
Contributor Author

I'm a bit confused why /pull-cluster-api-e2e-dualstack-and-ipv6-main is running the entire E2E test suite, but the most recent failure seems to come from a non-dualstack test. @killianmuldoon is this the same error you observed in #9477? The previous two runs passed, but I will run this test a few more times.

@willie-yao
Copy link
Contributor Author

Okay I actually got an error from the ipv6 test on the fourth run with this error: Unable to run conformance tests: error container run failed with exit code 1. Not sure if this is the failure that was observed before.

@willie-yao
Copy link
Contributor Author

@killianmuldoon I've retested the dualstack test around 8 times and it only failed once on the dualstack spec (other failures were flakes on other specs). What was the original error that caused the dualstack tests to fail? This is the test run that failed: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/10135/pull-cluster-api-e2e-dualstack-and-ipv6-main/1757196147123818496

@chrischdi
Copy link
Member

/test pull-cluster-api-e2e-conformance-ci-latest-main

@chrischdi
Copy link
Member

/test pull-cluster-api-e2e-conformance-main

@killianmuldoon
Copy link
Contributor

Sorry I've been missing the updated here!

For context the dualstack tests runs the full CAPI tests suite - the same as e2e full - and adds a couple of dualstack conformance tests on top. I can't remember the source of the failures in #9477 unfortunately, and I'm not sure why it's not written down anywhere 😅.

I will say looking at the history for this PR - https://prow.k8s.io/pr-history/?org=kubernetes-sigs&repo=cluster-api&pr=10135 - this looks way too flaky to merge.

If I'm right then there's something in how we're running these tests that means they only fail when we touch the non-dualstack network of the MachinePools in some way.

@willie-yao
Copy link
Contributor Author

@killianmuldoon No problem, thanks for the update! I saw that the dualstack test was a bit flaky on this PR, but the same error did not show up twice and it looks like the conformance tests also passed. I think @fabriziopandini mentioned in the office hours that they did a few fixes to the MPs that may have fixed the issue. I'll continue to run these tests and see if there's anything reoccuring.

@killianmuldoon
Copy link
Contributor

AFAIR it's not just the conformance tests that are the problem in this instance - it's also down to which CAPI components end up on which nodes.

One idea is to set up a repeating test - maybe five runs - and see if this change can pass that.

@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

@killianmuldoon
Copy link
Contributor

@willie-yao - I think there are code changes needed on this one. If you'd like to organise something we can try to get to the bottom of it a bit by going through some of the logs etc.

@willie-yao
Copy link
Contributor Author

@killianmuldoon Sure thing, I'd be happy to pair on this. Since the same errors existed without MachinePools, I'm a bit unsure where to look to fix this as I'm not too familiar with the dualstack implementation. Is there somewhere I should look to get a better idea for this?

@killianmuldoon
Copy link
Contributor

I think the best place would be to debug through the logs of the failed jobs to understand exactly what's going on when they fial

@jackfrancis
Copy link
Contributor

@willie-yao could you post some equivalent dualstack test failures that occur on clusters without MachinePools?

@sbueringer
Copy link
Member

sbueringer commented Mar 25, 2024

Just a general comment. I think we probably should get the tests green on main first before introducing another potential source of errors.

Except if we feel confident the new MP part is stable and won't add more flakiness
(Otherwise the test becomes harder to fix)

@willie-yao
Copy link
Contributor Author

Just a general comment. I think we probably should get the tests green on main first before introducing another potential source of errors.

I definitely agree. I'm seeing the same flakes on this PR as the ones that show up on main. It does seem like adding MPs makes it more flaky, but the reason is not super clear to me.

@willie-yao
Copy link
Contributor Author

Update: We decided in office hours that we should fix the flakes in the dualstack test for MachineDeployments before implementing them for MachinePools. Therefore, the dualstack test for MachinePools will be non-blocking for MachinePool graduation.

@k8s-ci-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 15, 2024
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 15, 2024
@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

1 similar comment
@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

@chrischdi
Copy link
Member

chrischdi commented Apr 16, 2024

Different/unrelated flake: #10356

No Control Plane machines came into existence.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/10135/pull-cluster-api-e2e-dualstack-and-ipv6-main/1780103397202989056

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

@sbueringer
Copy link
Member

I would suggest let's merge this PR. And once this was also stable for a bit (maybe a week?). Let's continue with merging dualstack & e2e-main?

(cc @killianmuldoon)

I think we should be mostly good, just want to do it incrementally and another week shouldn't hurt.

@killianmuldoon
Copy link
Contributor

sounds good to me

@sbueringer
Copy link
Member

(Or depending on how confident we are in this PR, maybe we'll do it the other way around)

@sbueringer
Copy link
Member

sbueringer commented Apr 16, 2024

This might be a MachinePool-specific issue: When testing Cluster API working on self-hosted clusters using ClusterClass [ClusterClass] Should pivot the bootstrap cluster to a self-hosted cluster

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/10135/pull-cluster-api-e2e-dualstack-and-ipv6-main/1780141833020510208

@jackfrancis
Copy link
Contributor

@sbueringer is your comment about pivoting to a self-hosted cluster feedback for a potential chante to this PR?

@sbueringer
Copy link
Member

@sbueringer is your comment about pivoting to a self-hosted cluster feedback for a potential chante to this PR?

I was just quoting the name of the failed test 😃

@sbueringer
Copy link
Member

sbueringer commented Apr 17, 2024

Ah wait, this PR doesn't influence that test. Or at least not directly

@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

2 similar comments
@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-conformance-ci-latest-main
/test pull-cluster-api-e2e-conformance-main

@sbueringer
Copy link
Member

I think I would go ahead and merge it

@killianmuldoon @chrischdi WDYT?

@chrischdi
Copy link
Member

ok for me :-)

The last failure seems to be unrelated to this PR (although I cannot find it happening in the past).

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 17, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 5de96bb21ff12366a864b41095eb8d36f4b66646

@sbueringer sbueringer changed the title 🌱 WIP: Add MP back to dualstack E2E test 🌱 Add MP back to dualstack E2E test Apr 17, 2024
@chrischdi
Copy link
Member

/test pull-cluster-api-e2e-dualstack-and-ipv6-main

For a green run

@sbueringer
Copy link
Member

/approve

Thx @willie-yao !!

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2024
@sbueringer
Copy link
Member

Let's go ahead with merging the two jobs

@k8s-ci-robot k8s-ci-robot merged commit d098cb1 into kubernetes-sigs:main Apr 18, 2024
29 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.8 milestone Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/clusterclass Issues or PRs related to clusterclass area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants