[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

stylianosrigas · 2020-07-14T12:45:23Z

Summary

This PR introduces a couple of NGINX changes:

Start using k8s official NGINX helm chart.
Enable autoscaling, set limits and requests for NGINX ingress controller.
Migrate from using separate private and public NGINX deployments to a single deployment that creates both services (new feature of the Helm chart).
Enable use of TLS certificates for internal Load Balancers (Prometheus traffic).

Steps to reprovision existing clusters and to migrate installation to use new NGINX:

Reprovision the cluster by running cloud cluster provision --cluster <cluster_id> --nginx-version 2.11.0.
Check that new nginx deployed both internal and public Load Balancers (nginx-ingress-nginx-controller-internal and nginx-ingress-nginx-controller).
Manually update Prometheus Route53 record to use the new private Load Balancer (nginx-ingress-nginx-controller-internal).
Manually update cluster installations Route53 records one by one to use the new public Load balancer (nginx-ingress-nginx-controller).
- Update clusterinstallation ingress annotation to use Nginx class nginx-controller instead of nginx.
- Manually update network policy to target nginx instead of public-nginx.
Confirm that all services are up and running.
Delete old NGINX helm charts.
- helm del --purge public-nginx
- helm del --purge private-nginx

Ticket Link

Release Note

- Start using k8s official NGINX helm [chart](/~https://github.com/kubernetes/ingress-nginx/tree/master/charts/ingress-nginx).
- Enable autoscaling, set limits and requests for NGINX ingress controller.
- Migrate from using separate private and public NGINX deployments to a single deployment that creates both services.
- Enable use of TLS certificates for internal Load Balancers (Prometheus traffic).

Folllow deprecation instructions in the README.md for cluster reprovisioning steps.

…autoscaling for public Nginx

…nal and external services

… migration downtime

…sterinstallations

cpanato

all the rest lgtm just a small comment

helm-charts/nginx_values.yaml

gabrieljackson

This is great! It will be nice to have autoscaling and one less utility group to manage.

Other than the one nit I pointed out and Carlos' comment, I am good with this with a few comments that should probably become follow-up tickets for the future. The amount of manual steps involved here is not something we will be able to get away with once we are at scale. As such, I see the following as stuff we may want:

The ability to re-reconcile installations. If we change the default values of how we deploy something, such as changing an ingress tag, we should have the ability to push that out to all installations somehow, or have a service that can go through them all to ensure they are correct.
The ability to automatically migrate the correct settings for a utility group. It is probably not worth doing here, but we will eventually have to code out the full migration for updating all the resources associated with upgrades like this. It seems really easy to potentially miss something like updating the ingress on one of the cluster installations once we have more than a handful of these.

I guess I am pointing these things out with the idea that this is a chance to practice doing some amount of the automation above. It will take more time, but is something I think we will have to get used to. I don't think it's a requirement this time, but wanted to point out the opportunity.

Looking for input from the other reviewers on this as well.

internal/provisioner/nginx.go

…MM-26623

stylianosrigas added 6 commits July 10, 2020 17:38

MM-26623 Start using nginx k8s Helm repo instead of stable, enabling …

970debd

…autoscaling for public Nginx

MM-26623 Remove unused quotes from ACM certificate arn

a947fa2

MM-26623 Enable NGINX metrics

81a38b1

MM-26623 Migrating to use single NGINX helm deployment for both inter…

faa6ffe

…nal and external services

MM-26623 Updating NGINX ingress class to nginx-controller for minimum…

2d8a639

… migration downtime

MM-26623 Updating NGINX ingress class to nginx-controller for new clu…

94bd106

…sterinstallations

stylianosrigas added 2: Dev Review Requires review by a developer kind/feature Categorizes issue or PR as related to a new feature. labels Jul 14, 2020

stylianosrigas requested review from jwilander, gabrieljackson and cpanato July 14, 2020 12:45

mm-cloud-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 14, 2020

MM-26623 Add reprovisioning steps in the Readme

2c79521

cpanato suggested changes Jul 16, 2020

View reviewed changes

helm-charts/nginx_values.yaml Outdated Show resolved Hide resolved

gabrieljackson approved these changes Jul 23, 2020

View reviewed changes

internal/provisioner/nginx.go Outdated Show resolved Hide resolved

jwilander removed their request for review July 24, 2020 13:18

stylianosrigas added 2 commits July 25, 2020 21:58

Merge branch 'master' of github.com:mattermost/mattermost-cloud into …

090f310

…MM-26623

MM-26623 Fixing typo and removing image section from helm chart values

5b42563

stylianosrigas requested a review from cpanato July 25, 2020 19:10

cpanato approved these changes Jul 27, 2020

View reviewed changes

cpanato merged commit 53c71c0 into master Jul 27, 2020

cpanato deleted the MM-26623 branch July 27, 2020 07:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

stylianosrigas commented Jul 14, 2020 •

edited

Loading

cpanato left a comment

gabrieljackson left a comment

[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

[MΜ 26623] Migrate to single NGINX deployment, enable autoscaling. #263

Conversation

stylianosrigas commented Jul 14, 2020 • edited Loading

Summary

Steps to reprovision existing clusters and to migrate installation to use new NGINX:

Ticket Link

Release Note

cpanato left a comment

Choose a reason for hiding this comment

gabrieljackson left a comment

Choose a reason for hiding this comment

stylianosrigas commented Jul 14, 2020 •

edited

Loading