Kubernetes graceful shutdown not working as expected #3305
Labels
area/kubernetes
K8s including EKS, EKS-A, and including VMW
status/research
This issue is being researched
type/bug
Something isn't working
tldr: Kubernetes graceful shutdown relies on inhibitor locks. These are part of systemd-logind, which is not currently included as part of Bottlerocket.
Discussed in #3291
Originally posted by carlosjgp July 25, 2023
I've been working on a way to roll out our EKS nodes gracefully. My first attempt was to use the K8s native support for graceful node shutdown, to avoid any extra infrastructure or deployments.
I can see support for its properties here.
I tried setting these to shutdown_grace_period=3m and shutdown_grace_period_critical_pod=2m and swapping the nodes between Bottlerocket versions 1.14.1 and 1.14.2 while running a simple Nginx deployment (helm create test) with 8 replicas and an ingress set up to accept traffic.
Then I ran vegeta against this Nginx deployment to check that the pods were moved across gracefully and respected the PodDisruptionBudget, but there were a lot of failed requests.
The nodes are supposed to be tainted with node.kubernetes.io/not-ready:NoSchedule, but I didn't see this happening either.
I'm using:
/aws/service/bottlerocket/aws-k8s-1.27/x86_64/${var.bottlerocket_version}/image_id
Has someone seen this before?
I can provide more details from CloudWatch, container or OS logs, and K8s events.
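For context on the two values quoted above, here is a minimal sketch of how the kubelet is documented to divide the shutdown window: the critical-pod period is carved out of the total, and the remainder goes to regular pods. The function name split_grace_period is hypothetical and exists only for this illustration; it does not model the inhibitor lock itself, only the arithmetic.

```python
from datetime import timedelta


def split_grace_period(total: timedelta, critical: timedelta) -> tuple[timedelta, timedelta]:
    """Return (time for regular pods, time for critical pods).

    Hypothetical helper: mirrors the documented kubelet behaviour where the
    critical-pod period is the tail end of the total shutdown grace period.
    """
    if total <= timedelta(0) or critical <= timedelta(0):
        # Both values must be non-zero for graceful node shutdown to be active.
        raise ValueError("both grace periods must be greater than zero")
    if critical > total:
        raise ValueError("critical-pod grace period must not exceed the total")
    return total - critical, critical


# Values from the post: 3m total, 2m reserved for critical pods.
regular, critical = split_grace_period(timedelta(minutes=3), timedelta(minutes=2))
print(f"regular pods: {regular.total_seconds():.0f}s, critical pods: {critical.total_seconds():.0f}s")
```

With 3m and 2m, regular pods would get only the first minute to terminate, which is worth keeping in mind when comparing against the pods' own terminationGracePeriodSeconds.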