[SURE-9285] eks-config-operator crashing (go panic) on eks.CreateNodeGroup #986

Closed
4 tasks
kkaempf opened this issue Dec 2, 2024 · 0 comments

kkaempf commented Dec 2, 2024

SURE-9285

Issue description:

A Hosted Rancher customer noticed a problem when trying to update the node groups on one of their downstream EKS clusters. In the upstream Rancher cluster, the eks-config-operator pod is crashing with a Go panic:

time="2024-10-29T23:16:08Z" level=info msg="Starting eks.cattle.io/v1, Kind=EKSClusterConfig controller"
time="2024-10-29T23:16:08Z" level=info msg="Starting /v1, Kind=Secret controller"
E1029 23:16:09.738753       9 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 101 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x2e8c7e0, 0x4daaed0})
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c1d6e0?})
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x2e8c7e0?, 0x4daaed0?})
	/home/runner/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.7.linux-amd64/src/runtime/panic.go:770 +0x132
github.com/rancher/eks-operator/pkg/eks.CreateNodeGroup({0x3a95118, 0xc0004fe000}, 0xc000d8efe0)
	/home/runner/work/eks-operator/eks-operator/pkg/eks/create.go:326 +0xa37
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0008d01e0, 0xc0007ceb08, 0xc001e6ec00, {0xc00137b500, 0x3d}, 0xc000d8f840)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:832 +0x13fa
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0007ceb08, 0xc001e6ec00)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:319 +0xc12
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000143f40, {0x0?, 0x0?}, 0xc0007ceb08)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:100 +0x2bc
github.com/rancher/eks-operator/controller.Register.(*Handler).recordError.func1({0xc000810a00?, 0x1a?}, 0xc0007ceb08?)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:112 +0x37
github.com/rancher/wrangler/v3/pkg/generic.(*Controller[...].func1({0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/wrangler/v3@v3.0.0/pkg/generic/controller.go:169 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x0?, {0xc000810a00?, 0x0?}, {0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/sharedcontroller.go:29 +0x32
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000726960, {0xc000810a00, 0x1a}, {0x3a7e7e8, 0xc0007ceb08})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/sharedhandler.go:75 +0x202
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0000d8a50, {0xc000810a00, 0x1a})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:236 +0x12e
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0000d8a50, {0x2d82c20, 0xc001c1d6e0})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:217 +0xeb
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0000d8a50)
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:194 +0x45
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001c1d7c0, {0x3a5a7c0, 0xc002000000}, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001c1d7c0, 0x3b9aca00, 0x0, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:161
created by github.com/rancher/lasso/pkg/controller.(*controller).run in goroutine 85
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:151 +0x2ba
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2b33917]

goroutine 101 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001c1d6e0?})
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x2e8c7e0?, 0x4daaed0?})
	/home/runner/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.7.linux-amd64/src/runtime/panic.go:770 +0x132
github.com/rancher/eks-operator/pkg/eks.CreateNodeGroup({0x3a95118, 0xc0004fe000}, 0xc000d8efe0)
	/home/runner/work/eks-operator/eks-operator/pkg/eks/create.go:326 +0xa37
github.com/rancher/eks-operator/controller.(*Handler).updateUpstreamClusterState(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0008d01e0, 0xc0007ceb08, 0xc001e6ec00, {0xc00137b500, 0x3d}, 0xc000d8f840)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:832 +0x13fa
github.com/rancher/eks-operator/controller.(*Handler).checkAndUpdate(0xc000143f40, {0x3a95118, 0xc0004fe000}, 0xc0007ceb08, 0xc001e6ec00)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:319 +0xc12
github.com/rancher/eks-operator/controller.(*Handler).OnEksConfigChanged(0xc000143f40, {0x0?, 0x0?}, 0xc0007ceb08)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:100 +0x2bc
github.com/rancher/eks-operator/controller.Register.(*Handler).recordError.func1({0xc000810a00?, 0x1a?}, 0xc0007ceb08?)
	/home/runner/work/eks-operator/eks-operator/controller/eks-cluster-config-handler.go:112 +0x37
github.com/rancher/wrangler/v3/pkg/generic.(*Controller[...].func1({0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/wrangler/v3@v3.0.0/pkg/generic/controller.go:169 +0x44
github.com/rancher/lasso/pkg/controller.SharedControllerHandlerFunc.OnChange(0x0?, {0xc000810a00?, 0x0?}, {0x3a7e7e8?, 0xc0007ceb08?})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/sharedcontroller.go:29 +0x32
github.com/rancher/lasso/pkg/controller.(*SharedHandler).OnChange(0xc000726960, {0xc000810a00, 0x1a}, {0x3a7e7e8, 0xc0007ceb08})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/sharedhandler.go:75 +0x202
github.com/rancher/lasso/pkg/controller.(*controller).syncHandler(0xc0000d8a50, {0xc000810a00, 0x1a})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:236 +0x12e
github.com/rancher/lasso/pkg/controller.(*controller).processSingleItem(0xc0000d8a50, {0x2d82c20, 0xc001c1d6e0})
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:217 +0xeb
github.com/rancher/lasso/pkg/controller.(*controller).processNextWorkItem(0xc0000d8a50)
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:194 +0x45
github.com/rancher/lasso/pkg/controller.(*controller).runWorker(...)
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc001c1d7c0, {0x3a5a7c0, 0xc002000000}, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc001c1d7c0, 0x3b9aca00, 0x0, 0x1, 0xc000115a40)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.30.1/pkg/util/wait/backoff.go:161
created by github.com/rancher/lasso/pkg/controller.(*controller).run in goroutine 85
	/home/runner/go/pkg/mod/github.com/rancher/lasso@v0.0.0-20240705194423-b2a060d103c1/pkg/controller/controller.go:151 +0x2ba

Business impact:

The customer can't access their EKS cluster in Rancher.

Troubleshooting steps:

It seems like the problem is coming from here: https://github.com/rancher/eks-operator/blob/release-v2.9/pkg/eks/create.go#L326

Is it failing because it cannot delete a Launch Template?

We checked the Launch Template referenced in the cluster's ekscc (EKSClusterConfig) object. The Launch Template version it lists exists in AWS.

A couple of old ekscc objects had been deleted but were still stuck on finalizers. We cleared those out, but it didn't help.

The customer's EKS cluster also had a node group, created as a test, whose name contained a space. We removed the space from the name, and that didn't help either.

Please let us know what else we can look into.
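
For reference, the panic message ("invalid memory address or nil pointer dereference") means some pointer was dereferenced while nil. We have not confirmed which value is nil at create.go:326, so the following is only a minimal sketch of the general failure class and of a guard that returns an error instead of crashing the pod. The NodeGroup and LaunchTemplate types and the launchTemplateVersion helper are hypothetical stand-ins, not the operator's actual types or API.

```go
package main

import (
	"errors"
	"fmt"
)

// LaunchTemplate and NodeGroup are hypothetical stand-ins used only for
// illustration; they are NOT the eks-operator's real types.
type LaunchTemplate struct {
	Version *int64
}

type NodeGroup struct {
	LaunchTemplate *LaunchTemplate
}

// launchTemplateVersion walks the pointer chain and returns an error as soon
// as it meets a nil pointer, instead of dereferencing it and panicking.
func launchTemplateVersion(ng *NodeGroup) (int64, error) {
	if ng == nil {
		return 0, errors.New("node group is nil")
	}
	if ng.LaunchTemplate == nil {
		return 0, errors.New("node group has no launch template")
	}
	if ng.LaunchTemplate.Version == nil {
		return 0, errors.New("launch template has no version")
	}
	return *ng.LaunchTemplate.Version, nil
}

func main() {
	v := int64(3)
	complete := &NodeGroup{LaunchTemplate: &LaunchTemplate{Version: &v}}
	missingTemplate := &NodeGroup{}

	for _, ng := range []*NodeGroup{complete, missingTemplate, nil} {
		version, err := launchTemplateVersion(ng)
		if err != nil {
			// A controller could record this error and requeue
			// rather than letting the process crash.
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Println("version:", version)
	}
}
```

An unguarded `*ng.LaunchTemplate.Version` on the second or third input would raise the same runtime error seen in the log above.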

PRs

kkaempf added kind/bug Something isn't working JIRA Must shout labels Dec 2, 2024
kkaempf changed the title from "[SURE-9285]" to "[SURE-9285] eks-config-operator crashing (go panic) on eks.CreateNodeGroup" Dec 3, 2024
mjura moved this from Backlog to PR to be reviewed in CAPI & Hosted Kubernetes providers (EKS/AKS/GKE) Dec 9, 2024
mjura closed this as completed Dec 9, 2024
mjura self-assigned this Dec 9, 2024