Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

Merged
merged 9 commits into from
Jul 27, 2020

Conversation

stylianosrigas
Copy link
Contributor

@stylianosrigas stylianosrigas commented Jul 14, 2020

Summary

This PR introduces a couple of NGINX changes:

  • Start using k8s official NGINX helm chart.
  • Enable autoscaling, set limits and requests for NGINX ingress controller.
  • Migrate from using separate private and public NGINX deployments to a single deployment that creates both services (new feature of the Helm chart).
  • Enable use of TLS certificates for internal Load Balancers (Prometheus traffic).

Steps to reprovision existing clusters and to migrate installation to use new NGINX:

  • Reprovision the cluster by running cloud cluster provision --cluster <cluster_id> --nginx-version 2.11.0.
  • Check that new nginx deployed both internal and public Load Balancers (nginx-ingress-nginx-controller-internal and nginx-ingress-nginx-controller).
  • Manually update Prometheus Route53 record to use the new private Load Balancer (nginx-ingress-nginx-controller-internal).
  • Manually update cluster installations Route53 records one by one to use the new public Load balancer (nginx-ingress-nginx-controller).
    • Update clusterinstallation ingress annotation to use Nginx class nginx-controller instead of nginx.
    • Manually update network policy to target nginx instead of public-nginx.
  • Confirm that all services are up and running.
  • Delete old NGINX helm charts.
    • helm del --purge public-nginx
    • helm del --purge private-nginx

Ticket Link

Release Note

- Start using k8s official NGINX helm [chart](/~https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx).
- Enable autoscaling, set limits and requests for NGINX ingress controller.
- Migrate from using separate private and public NGINX deployments to a single deployment that creates both services.
- Enable use of TLS certificates for internal Load Balancers (Prometheus traffic).

Folllow deprecation instructions in the README.md for cluster reprovisioning steps.

@stylianosrigas stylianosrigas added 2: Dev Review Requires review by a developer kind/feature Categorizes issue or PR as related to a new feature. labels Jul 14, 2020
@mm-cloud-bot mm-cloud-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 14, 2020
Copy link
Contributor

@cpanato cpanato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the rest lgtm just a small comment

helm-charts/nginx_values.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@gabrieljackson gabrieljackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! It will be nice to have autoscaling and one less utility group to manage.

Other than the one nit I pointed out and Carlos' comment, I am good with this with a few comments that should probably become follow-up tickets for the future. The amount of manual steps involved here is not something we will be able to get away with once we are at scale. As such, I see the following as stuff we may want:

  • The ability to re-reconcile installations. If we change the default values of how we deploy something, such as changing an ingress tag, we should have the ability to push that out to all installations somehow, or have a service that can go through them all to ensure they are correct.
  • The ability to automatically migrate the correct settings for a utility group. It is probably not worth doing here, but we will eventually have to code out the full migration for updating all the resources associated with upgrades like this. It seems really easy to potentially miss something like updating the ingress on one of the cluster installations once we have more than a handful of these.

I guess I am pointing these things out with the idea that this is a chance to practice doing some amount of the automation above. It will take more time, but is something I think we will have to get used to. I don't think it's a requirement this time, but wanted to point out the opportunity.

Looking for input from the other reviewers on this as well.

internal/provisioner/nginx.go Outdated Show resolved Hide resolved
@jwilander jwilander removed their request for review July 24, 2020 13:18
@stylianosrigas stylianosrigas requested a review from cpanato July 25, 2020 19:10
@cpanato cpanato merged commit 53c71c0 into master Jul 27, 2020
@cpanato cpanato deleted the MM-26623 branch July 27, 2020 07:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2: Dev Review Requires review by a developer kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants