
kops rolling-update should do a real rolling-update #37

Closed
justinsb opened this issue Jul 5, 2016 · 13 comments

@justinsb
Member

justinsb commented Jul 5, 2016

Right now it is a hacky timing-based loop.

Ideally we would wait for the new node to be registered.
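The difference between a timing-based loop and waiting for registration can be illustrated with a small poll-with-deadline helper. This is a minimal Python sketch, not kops code. `wait_until` and `node_registered` are hypothetical names. A real check would ask the Kubernetes API server whether the new node exists and reports Ready, instead of sleeping for a fixed time and hoping.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float, interval: float) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


calls = {"n": 0}


def node_registered() -> bool:
    # Hypothetical check: pretend the node registers on the third poll.
    # A real implementation would query the API server for the node's Ready condition.
    calls["n"] += 1
    return calls["n"] >= 3


ok = wait_until(node_registered, timeout=5.0, interval=0.01)
```

Returning False on timeout lets the caller decide whether to abort the rolling update or carry on. A fixed sleep gives it no signal either way.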

@justinsb justinsb added the P1 label Jul 5, 2016
@justinsb
Member Author

We should also evict nodes before terminating them.

@gigaroby

gigaroby commented Jul 18, 2016

What about using a strategy similar to the one followed by pod rolling updates in Kubernetes?
There could be a parameter that controls how many nodes are affected at any given time (let's call it N), and the strategy could be something like:

  1. increase the size of the instance group by N
  2. wait for the new N nodes to register with the master
  3. cordon and drain N old nodes, then delete them from the instance group
  4. wait for the new instances to be available on the master
  5. repeat from step 3 until there are no more old instances
  6. decrease the size of the instance group by N again

I am not an expert on how ASGs work on EC2, but doing this naively may have some problems. Waiting for the ASG to notice that N nodes have been deleted and spawn new ones could be slow, and the behavior of the last step may be a bit problematic.
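The surge strategy above can be sketched as an in-memory simulation. This is a minimal Python sketch under stated assumptions. `FakeInstanceGroup`, `Node` and `rolling_update` are hypothetical stand-ins, not kops or AWS APIs. Registration is instantaneous, and "drain" is a placeholder for cordoning the node and evicting its pods. The function returns the lowest number of Ready nodes seen after any drain, so you can check that capacity never dips below the original group size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    version: str
    ready: bool = False


@dataclass
class FakeInstanceGroup:
    """In-memory stand-in for an ASG-backed instance group (hypothetical)."""
    target_version: str
    nodes: List[Node] = field(default_factory=list)
    launched: int = 0

    def launch(self, count: int) -> None:
        # The ASG starts `count` instances from the new launch configuration.
        for _ in range(count):
            self.launched += 1
            self.nodes.append(Node(f"new-{self.launched}", self.target_version))

    def wait_for_registration(self) -> None:
        # A real implementation would poll the API server until every new
        # node is Ready (with a timeout); here registration is instantaneous.
        for node in self.nodes:
            node.ready = True

    def drain_and_delete(self, node: Node) -> None:
        # Placeholder for cordon + evict pods, then terminate the instance.
        self.nodes.remove(node)

    def ready_count(self) -> int:
        return sum(1 for node in self.nodes if node.ready)


def rolling_update(group: FakeInstanceGroup, surge: int) -> int:
    """Replace every out-of-date node, `surge` at a time.

    Returns the lowest number of Ready nodes observed after any drain.
    """
    original_size = len(group.nodes)
    group.launch(surge)                       # 1. grow the group by N
    group.wait_for_registration()             # 2. wait for the new nodes
    min_ready = group.ready_count()
    while True:
        old = [n for n in group.nodes if n.version != group.target_version]
        if not old:                           # 5. stop when nothing is old
            break
        for node in old[:surge]:              # 3. drain and delete N old nodes
            group.drain_and_delete(node)
        min_ready = min(min_ready, group.ready_count())
        # Desired size is still original_size + N, so the ASG replaces them.
        group.launch(original_size + surge - len(group.nodes))
        group.wait_for_registration()         # 4. wait for the replacements
    for node in group.nodes[:surge]:          # 6. shrink back by N
        group.drain_and_delete(node)
    return min_ready


group = FakeInstanceGroup("v2", [Node(f"old-{i}", "v1", ready=True) for i in range(5)])
min_ready = rolling_update(group, surge=2)
```

In this run, step 6 drains two freshly launched nodes, so the group launches 7 instances to replace 5. On a real ASG, lowering the desired capacity lets the ASG's termination policy choose which instances to remove. That is one concrete way the last step can be problematic.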

@justinsb
Member Author

justinsb commented Aug 2, 2016

Note: we should also rolling-update the masters before the nodes in the case of a version upgrade.
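One reason for this ordering is that the Kubernetes version skew policy does not allow a kubelet to be newer than the kube-apiserver. Upgrading the control plane first keeps the cluster within the supported skew. This is a minimal Python sketch of the ordering only. `InstanceGroup`, its `role` values and `update_order` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstanceGroup:
    name: str
    role: str  # assumed values: "Master" or "Node"


def update_order(groups: List[InstanceGroup]) -> List[str]:
    """Return group names with every master group ahead of every node group.

    sorted() is stable, so groups keep their original relative order within a role.
    """
    ordered = sorted(groups, key=lambda g: 0 if g.role == "Master" else 1)
    return [g.name for g in ordered]


groups = [
    InstanceGroup("nodes-a", "Node"),
    InstanceGroup("master-us-east-1a", "Master"),
    InstanceGroup("nodes-b", "Node"),
    InstanceGroup("master-us-east-1b", "Master"),
]
order = update_order(groups)
```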

@chrislovecnm
Contributor

This is handled much better in my open PR.

@chrislovecnm chrislovecnm self-assigned this Dec 19, 2016
@krisnova
Contributor

@chrislovecnm - pointer please? Which PR?

@chrislovecnm
Contributor

#1134 Drain and validate in rolling update

@justinsb justinsb modified the milestones: 1.5.0, 1.5 Dec 28, 2016
@justinsb justinsb modified the milestones: 1.5.1, 1.5.0 Jan 29, 2017
@melv-n

melv-n commented Feb 16, 2017

Is this expected in 1.5.1?

@chrislovecnm
Contributor

@mirague I am working on the PR again today. Hopefully it will be in our next release, behind a feature flag that turns it on.

@melv-n

melv-n commented Feb 17, 2017

That's fantastic to hear, thanks @chrislovecnm !

@toidiu

toidiu commented Apr 10, 2017

Any update on this?

@Miyurz

Miyurz commented Apr 20, 2017

@chrislovecnm, any updates here?

@chrislovecnm
Contributor

@Miyurz we have had a feature flag in place that allows for drain and validate. It is stable, especially for stateless applications. We need some more TLC for stateful applications.

Use KOPS_FEATURE_FLAGS="+DrainAndValidateRollingUpdate" to enable the beta code that drains the nodes and validates the cluster. New flags for the drain and validation operations are shown when the environment variable is set.

@chrislovecnm
Contributor

Closing, as this is stable and we have #1718.

cloudbow pushed a commit to cloudbow/kops that referenced this issue Jun 8, 2018
…re/nba_full to develop

* commit '0c6bd01682f69a91c55fb14cd872c83accc8c299':
  Refactor live job to abstract class and implementations

7 participants