Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cilium-cli/connectivity: additionally check for container restarts #36299

Merged
merged 1 commit into from
Feb 17, 2025

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Dec 2, 2024

Extend the no-errors-in-logs check to additionally fail in case any of the Cilium-related pods have a container restart count greater than zero. This allows catching possible issues (e.g., panics) that subsequently resolve automatically, hence not preventing the successful completion of all the other tests.

@giorio94 giorio94 added release-note/ci This PR makes changes to the CI. cilium-cli This PR contains changes related with cilium-cli labels Dec 2, 2024
@github-actions github-actions bot added the cilium-cli-exclusive This PR only impacts cilium-cli binary label Dec 2, 2024
@giorio94 giorio94 force-pushed the pr/giorio94/main/cli-fail-on-restarts branch from 5e37784 to b905019 Compare December 2, 2024 17:15
@giorio94 giorio94 force-pushed the pr/giorio94/main/cli-fail-on-restarts branch 2 times, most recently from 6617765 to 92f072d Compare December 9, 2024 08:06
@giorio94
Copy link
Member Author

giorio94 commented Dec 9, 2024

/test

This comment was marked as outdated.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 9, 2025

This comment was marked as outdated.

@github-actions github-actions bot closed this Jan 24, 2025
@giorio94 giorio94 reopened this Jan 27, 2025
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 28, 2025
Extend the `no-errors-in-logs` check to additionally fail in case any of
the Cilium-related pods have a container restart count greater than zero.
This allows catching possible issues (e.g., panics) that subsequently
resolve automatically, hence not preventing the successful completion of
all the other tests.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the pr/giorio94/main/cli-fail-on-restarts branch from 92f072d to 932912a Compare February 11, 2025 17:15
@giorio94
Copy link
Member Author

/test

2 similar comments
@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

Run the tests three times, and didn't observe any failure related to the new test. Marking ready for review.

The current implementation enables the check for stable versions as well. That looks reasonable to me considering that container restarts are typically fairly serious, but happy to put it behind a version check if reviewers feel more inclined so.

@giorio94 giorio94 marked this pull request as ready for review February 12, 2025 16:25
@giorio94 giorio94 requested review from a team as code owners February 12, 2025 16:25
@giorio94
Copy link
Member Author

@thorn3r Gentle ping 🙏

Copy link
Contributor

@thorn3r thorn3r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work 👍

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Feb 17, 2025
@julianwiedmann julianwiedmann added this pull request to the merge queue Feb 17, 2025
@julianwiedmann
Copy link
Member

julianwiedmann commented Feb 17, 2025

(from past experience - I think it would have been safer to only roll this out for main, and then opt-in stable branches over time. Unless we've already evaluated that this works great for CI on all stable branches?)

Merged via the queue into main with commit f900d93 Feb 17, 2025
283 of 291 checks passed
@julianwiedmann julianwiedmann deleted the pr/giorio94/main/cli-fail-on-restarts branch February 17, 2025 16:35
@giorio94
Copy link
Member Author

(from past experience - I think it would have been safer to only roll this out for main, and then opt-in stable branches over time. Unless we've already evaluated that this works great for CI on all stable branches?)

I was honestly a bit on the fence as well on whether to enable it on all versions or main only:

The current implementation enables the check for stable versions as well. That looks reasonable to me considering that container restarts are typically fairly serious, but happy to put it behind a version check if reviewers feel more inclined so.

@julianwiedmann
Copy link
Member

(from past experience - I think it would have been safer to only roll this out for main, and then opt-in stable branches over time. Unless we've already evaluated that this works great for CI on all stable branches?)

I was honestly a bit on the fence as well on whether to enable it on all versions or main only:

The current implementation enables the check for stable versions as well. That looks reasonable to me considering that container restarts are typically fairly serious, but happy to put it behind a version check if reviewers feel more inclined so.

Looks like we caught some on v1.15 and v1.16 in ci-e2e-upgrade:
#37795
#37794

@giorio94
Copy link
Member Author

(from past experience - I think it would have been safer to only roll this out for main, and then opt-in stable branches over time. Unless we've already evaluated that this works great for CI on all stable branches?)

I was honestly a bit on the fence as well on whether to enable it on all versions or main only:

The current implementation enables the check for stable versions as well. That looks reasonable to me considering that container restarts are typically fairly serious, but happy to put it behind a version check if reviewers feel more inclined so.

Looks like we caught some on v1.15 and v1.16 in ci-e2e-upgrade: #37795 #37794

🤦, sorry, I should have tested this against all branches in advance 😭

Looking at a couple of failures, the agent seems to panic due to the following, which, well, seems fairly suspicious to me.

level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=warning msg="Unable to update CiliumNode resource, will retry" error="Operation cannot be fulfilled on ciliumnodes.cilium.io \"kind-control-plane\": the object has been modified; please apply your changes to the latest version and try again" subsys=nodediscovery
level=fatal msg="Could not create or update CiliumNode resource, despite retries" subsys=nodediscovery

@giorio94
Copy link
Member Author

I've opened #37823 to skip the check on v1.16 and earlier.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cilium-cli This PR contains changes related with cilium-cli cilium-cli-exclusive This PR only impacts cilium-cli binary ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/ci This PR makes changes to the CI.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants