feat: add NodeRegistrationHealthy status condition to nodepool #1969

Open
wants to merge 1 commit into
base: main
from

Conversation

jigisha620
Contributor

Fixes #N/A

Description
This PR adds a NodeRegistrationHealthy status condition to NodePool. The condition indicates whether a misconfiguration is preventing nodes from launching or registering successfully and therefore requires manual investigation.

How was this change tested?
Added tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 6, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jigisha620
Once this PR has been reviewed and has the lgtm label, please assign tzneal for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Hi @jigisha620. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 6, 2025
@coveralls

coveralls commented Feb 6, 2025

Pull Request Test Coverage Report for Build 13553912339

Details

  • 107 of 137 (78.1%) changed or added relevant lines in 8 files are covered.
  • 3 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.02%) to 81.62%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/controllers/controllers.go | 0 | 1 | 0.0% |
| pkg/controllers/nodeclaim/lifecycle/liveness.go | 17 | 25 | 68.0% |
| pkg/controllers/nodeclaim/lifecycle/registration.go | 13 | 21 | 61.9% |
| pkg/controllers/nodepool/registrationhealth/controller.go | 31 | 44 | 70.45% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/nodepool/readiness/controller.go | 1 | 71.15% |
| pkg/controllers/node/termination/controller.go | 2 | 72.08% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 13548001391 | -0.02% |
| Covered Lines | 9574 |
| Relevant Lines | 11730 |

💛 - Coveralls

@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch 4 times, most recently from d060e31 to 659e4dd Compare February 6, 2025 22:28
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch 2 times, most recently from 03a7d60 to 3fd43cf Compare February 12, 2025 02:13
// If the nodeClaim failed to launch or register within the TTL, set the NodeRegistrationHealthy status condition
// on the NodePool to False. If the launch failed, take the failure reason and message from the nodeClaim.
if nodeClaim.StatusConditions().IsTrue(v1.ConditionTypeLaunched) {
	nodePool.StatusConditions().SetFalse(v1.ConditionTypeNodeRegistrationHealthy, "Unhealthy", "Failed to register node")
Contributor

I think the reason should be RegistrationFailed. I'm also wondering whether, instead of that message, we should recommend specific things to double-check.
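
A minimal sketch of this suggestion, using hypothetical stand-in types rather than the real Karpenter API. It assumes the reason becomes RegistrationFailed, that the message carries a few things to double-check, and that a launch failure copies its reason and message from the nodeClaim (as the code comment in the diff describes). The `Condition` struct, the `registrationHealth` helper, and the hint text are all illustrative assumptions, not code from this PR.

```go
package main

import "fmt"

// Condition is a hypothetical stand-in for a status condition.
// It is not the condition type used in the PR.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// registrationHealth derives the NodePool condition for a nodeClaim that did
// not register within the TTL. launched is the nodeClaim's Launched condition.
func registrationHealth(launched Condition) Condition {
	out := Condition{Type: "NodeRegistrationHealthy", Status: "False"}
	if launched.Status == "True" {
		// The instance launched but the node never joined the cluster.
		out.Reason = "RegistrationFailed"
		// Illustrative hints only; the real recommendations are up to the maintainers.
		out.Message = "Failed to register node; verify networking, node credentials, and bootstrap configuration"
		return out
	}
	// The launch itself failed: surface the nodeClaim's own reason and message.
	out.Reason = launched.Reason
	out.Message = launched.Message
	return out
}

func main() {
	registered := registrationHealth(Condition{Type: "Launched", Status: "True"})
	fmt.Printf("%+v\n", registered)

	launchFailed := registrationHealth(Condition{
		Type:    "Launched",
		Status:  "False",
		Reason:  "LaunchFailed",
		Message: "insufficient capacity",
	})
	fmt.Printf("%+v\n", launchFailed)
}
```

Keeping the reason a short, stable identifier and putting the human-oriented hints in the message lets tooling match on the reason while operators read the message.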

@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from 3fd43cf to d202c66 Compare February 12, 2025 19:05
@jigisha620 jigisha620 changed the title chore: add NodeRegistrationHealthy status condition to nodepool feat: add NodeRegistrationHealthy status condition to nodepool Feb 12, 2025
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch 3 times, most recently from 5df485a to 67ea6d0 Compare February 12, 2025 22:09
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from bc920ec to 7f356d4 Compare February 13, 2025 21:52
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from 7f356d4 to a9d685a Compare February 17, 2025 20:48
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 17, 2025
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from a9d685a to 810ad69 Compare February 20, 2025 00:28
}

func (c *Controller) Reconcile(ctx context.Context, nodePool *v1.NodePool) (reconcile.Result, error) {
ctx = injection.WithControllerName(ctx, "nodepool.registrationhealth")
Member

Should we check whether the NodePool is managed in this Reconcile to match our Predicate, or do you want to handle that in the GetNodeClass() call?

Contributor Author

I wanted to handle that in the GetNodeClass() call. That should do this for us.
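
As a rough sketch of the approach described in this reply, the reconcile loop can treat a failed node class lookup as "nothing to do" instead of repeating the managed check. Everything here is hypothetical: `errUnmanaged`, `getNodeClass`, `reconcile`, and the `NodePool` struct are stand-ins, and the thread does not show how the real GetNodeClass() signals an unmanaged NodePool.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnmanaged is a hypothetical sentinel error for a NodePool whose node
// class kind is not managed by this controller.
var errUnmanaged = errors.New("nodepool references an unmanaged node class")

// NodePool is a hypothetical stand-in, not the Karpenter type.
type NodePool struct {
	Name          string
	NodeClassKind string
}

// getNodeClass is a hypothetical lookup that fails for unmanaged kinds.
func getNodeClass(np NodePool, managed map[string]bool) (string, error) {
	if !managed[np.NodeClassKind] {
		return "", errUnmanaged
	}
	return np.NodeClassKind + "/default", nil
}

// reconcile reports whether it acted on the NodePool. An unmanaged NodePool
// is skipped without an error, which mirrors what a watch predicate would do.
func reconcile(np NodePool, managed map[string]bool) (bool, error) {
	if _, err := getNodeClass(np, managed); err != nil {
		if errors.Is(err, errUnmanaged) {
			return false, nil
		}
		return false, err
	}
	// The NodeRegistrationHealthy condition would be updated here.
	return true, nil
}

func main() {
	managed := map[string]bool{"ManagedNodeClass": true}
	for _, np := range []NodePool{
		{Name: "a", NodeClassKind: "ManagedNodeClass"},
		{Name: "b", NodeClassKind: "OtherNodeClass"},
	} {
		acted, err := reconcile(np, managed)
		fmt.Println(np.Name, acted, err)
	}
}
```

The trade-off is that the managed check becomes implicit in the lookup, so it only holds if the lookup really does reject unmanaged NodePools.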

Expect(nodePool.StatusConditions().Get(v1.ConditionTypeNodeRegistrationHealthy).IsUnknown()).To(BeTrue())
Expect(nodePool.Status.NodeClassObservedGeneration).To(Equal(int64(1)))
})
It("should not set NodeRegistrationHealthy status condition on nodePool as Unknown if it is already set to true", func() {
Member

Not sure that I get this test -- why would we set the status condition to Unknown here? All of the generation details match, so I don't see our controller doing anything.

Contributor Author

Wasn't that what you meant here?

Can we have a check to make sure that we don't override with Unknown when we have already set the value to True or False and everything already matches?
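
The behavior under discussion could be sketched roughly as follows, under the assumption that the condition is reset to Unknown only when the observed node class generation is stale. `NodePoolStatus` and `resetIfStale` are hypothetical names, and the sketch tracks only the node class generation; the real controller may compare other generation details as well.

```go
package main

import "fmt"

// NodePoolStatus is a hypothetical, simplified view of the NodePool status.
type NodePoolStatus struct {
	RegistrationHealthy         string // "True", "False", or "Unknown"
	NodeClassObservedGeneration int64
}

// resetIfStale sets the condition to Unknown only when the node class
// generation has changed since it was last observed. When the generations
// match, an existing True or False value is left untouched.
func resetIfStale(s NodePoolStatus, nodeClassGeneration int64) NodePoolStatus {
	if s.NodeClassObservedGeneration == nodeClassGeneration {
		return s
	}
	s.RegistrationHealthy = "Unknown"
	s.NodeClassObservedGeneration = nodeClassGeneration
	return s
}

func main() {
	// Generations match: the existing True value is preserved.
	fmt.Printf("%+v\n", resetIfStale(NodePoolStatus{RegistrationHealthy: "True", NodeClassObservedGeneration: 1}, 1))
	// The node class changed: the condition is reset and the generation recorded.
	fmt.Printf("%+v\n", resetIfStale(NodePoolStatus{RegistrationHealthy: "True", NodeClassObservedGeneration: 1}, 2))
}
```

Under this sketch, a test for the matching-generation case would assert that the condition stays True rather than asserting anything about Unknown.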

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 21, 2025
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from 810ad69 to 30a813f Compare February 26, 2025 20:31
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 26, 2025
@jigisha620 jigisha620 force-pushed the degraded-nodepool-implementation branch from 30a813f to deab154 Compare February 26, 2025 21:40
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet

6 participants