Safely recreate google_container_node_pool #10895

Closed
kwri-avongluck opened this issue Jan 12, 2022 · 3 comments
Labels
enhancement · forward/review · service/container

Comments

@kwri-avongluck

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment. If the issue is assigned to the "modular-magician" user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If the issue is assigned to a user, that user is claiming responsibility for the issue. If the issue is assigned to "hashibot", a community member has claimed the issue already.

Description

Making adjustments to the node pool (e.g. changing preemptible to non-preemptible) will result in the node pool being destroyed and recreated.

The way Terraform performs this replacement, workloads are always interrupted.
The change could be made without interruption by:

  • Creating a new node pool with a different name
  • Waiting for workloads to shift
  • Destroying the old node pool

The above process would work if the end user derived the node pool name from a preemptible variable (see the sketch below).
The only remaining blocker is that terraform-provider-google doesn't prioritize pool creation over deletion.
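
A minimal sketch of that naming approach, assuming a boolean `preemptible` variable and a cluster resource named `google_container_cluster.primary` (both illustrative):

```hcl
variable "preemptible" {
  type    = bool
  default = false
}

resource "google_container_node_pool" "primary" {
  # The name changes whenever preemptibility does, forcing a replacement,
  # and create_before_destroy makes Terraform build the new pool first.
  name     = var.preemptible ? "pool-preemptible" : "pool-standard"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  node_config {
    preemptible  = var.preemptible
    machine_type = "e2-medium"
  }

  lifecycle {
    create_before_destroy = true
  }
}
```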

In addition to the outage above, I've seen Terraform destroy the node pool and not recreate it when the preemptible setting is changed, leaving the GKE cluster with zero node pools.

New or Affected Resource(s)

  • google_container_node_pool

Potential Terraform Configuration

Any basic GKE cluster with "remove_default_node_pool" set to true.
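
As a sketch, a minimal cluster along those lines (illustrative names, not a tested configuration):

```hcl
resource "google_container_cluster" "primary" {
  name     = "example-cluster"
  location = "us-central1"

  # Remove the default pool so only Terraform-managed node pools remain.
  remove_default_node_pool = true
  initial_node_count       = 1
}
```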

References

https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_node_pool

@upodroid
Contributor

Hello

Google made the google_container_node_pool resource immutable, so any changes need to be applied to a new node pool. You can't update an existing node pool.

You can leverage resource lifecycle blocks and name_prefix to ensure that a new node pool is created before the old one is destroyed. Changing node pools is always disruptive, so design your Kubernetes workloads accordingly.
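
For example, a minimal sketch (resource references are illustrative):

```hcl
resource "google_container_node_pool" "primary" {
  # name_prefix makes Terraform generate a fresh name for the replacement
  # pool, so it can exist alongside the old pool during the swap.
  name_prefix = "primary-"
  cluster     = google_container_cluster.primary.name
  location    = google_container_cluster.primary.location
  node_count  = 3

  node_config {
    preemptible  = false
    machine_type = "e2-medium"
  }

  lifecycle {
    create_before_destroy = true
  }
}
```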

Thank you

@rileykarson
Collaborator

I'd recommend doing this across two applies: create the new pool in one apply, cordon the nodes in Kubernetes, and then delete the now-unused pool in a second apply. There isn't a great way to expose a better UX in Terraform alone, unfortunately!
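
One way to express that two-apply flow in config, sketched with a hypothetical `keep_old_pool` toggle (the cordon/drain between the two applies still happens outside Terraform, e.g. with kubectl):

```hcl
variable "keep_old_pool" {
  type    = bool
  default = true
}

# Apply 1 (keep_old_pool = true): both pools exist and the new pool comes up.
# Between applies: cordon and drain the old pool's nodes with kubectl.
# Apply 2 (keep_old_pool = false): the old pool is destroyed.
resource "google_container_node_pool" "old" {
  count    = var.keep_old_pool ? 1 : 0
  name     = "old-pool"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  node_config {
    preemptible = true
  }
}

resource "google_container_node_pool" "new" {
  name     = "new-pool"
  cluster  = google_container_cluster.primary.name
  location = google_container_cluster.primary.location

  node_config {
    preemptible = false
  }
}
```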

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 18, 2022
@github-actions github-actions bot added service/container forward/review In review; remove label to forward labels Jan 14, 2025