From 70bdc9ec941a6d5a327c9bef07417fbd6b55319b Mon Sep 17 00:00:00 2001
From: Dalton Hubble
Date: Sat, 28 Mar 2020 17:49:17 -0700
Subject: [PATCH] Allow bootstrap re-apply for Fedora CoreOS GCP

* Problem: Fedora CoreOS images are manually uploaded to GCP. When a cluster is created with a stale image, Zincati immediately checks for the latest stable image, fetches it, and reboots. In practice, this can unfortunately occur during the initial cluster bootstrap phase.
* Recommended: Upload the latest Fedora CoreOS image regularly
* Mitigation: Allow a failed bootstrap.service run (which won't create the done file checked by ConditionPathExists) to be re-run by running `terraform apply` again. Add a known issue to CHANGES
* Update docs to show the current Fedora CoreOS stable version to reduce the likelihood that users see this issue

Longer term ideas:

* Ideal: Fedora CoreOS publishes a stable channel. Instances will always boot with the latest image in a channel. The problem disappears since it works the same way AWS does
* Timer: Consider some timer-based approach to have zincati delay any system reboots for the first ~30 min of a machine's life. Possibly just configured on the controller node https://github.com/coreos/zincati/pull/251
* External coordination: For Container Linux, locksmith filled a similar role and was disabled to allow CLUO to coordinate reboots.
By running atop Kubernetes, it was not possible for the reboot to occur before cluster bootstrap
* Rely on https://github.com/coreos/zincati/issues/115 to delay the reboot since bootstrap involves an SSH session
* Use path-based activation of zincati on controllers and set that path at the end of the bootstrap process

Rel: https://github.com/coreos/fedora-coreos-tracker/issues/239
---
 CHANGES.md                                       | 4 ++++
 docs/fedora-coreos/google-cloud.md               | 9 +++------
 .../fedora-coreos/kubernetes/fcc/controller.yaml | 1 +
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/CHANGES.md b/CHANGES.md
index 98a6df2aa..425f2db84 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -33,6 +33,10 @@ Notable changes between versions.
 
 * Update default `os_stream` from testing to stable
 
+#### Google Cloud
+
+* Known: Use of stale Fedora CoreOS image may require terraform re-apply during bootstrap ([#687](https://github.com/poseidon/typhoon/pull/687))
+
 #### DigitalOcean
 
 * Rename `image` variable to `os_image` for consistency ([#677](https://github.com/poseidon/typhoon/pull/677)) (action required)
diff --git a/docs/fedora-coreos/google-cloud.md b/docs/fedora-coreos/google-cloud.md
index 1f0bfd7ad..b20e6614e 100644
--- a/docs/fedora-coreos/google-cloud.md
+++ b/docs/fedora-coreos/google-cloud.md
@@ -1,8 +1,5 @@
 # Google Cloud
 
-!!! danger
-    Typhoon for Fedora CoreOS is an alpha. Please report Fedora CoreOS bugs to [Fedora](https://github.com/coreos/fedora-coreos-tracker/issues) and Typhoon issues to Typhoon.
-
 In this tutorial, we'll create a Kubernetes v1.18.0 cluster on Google Compute Engine with Fedora CoreOS.
 
 We'll declare a Kubernetes cluster using the Typhoon Terraform module. Then apply the changes to create a network, firewall rules, health checks, controller instances, worker managed instance group, load balancers, and TLS assets.
@@ -76,13 +73,13 @@ Fedora CoreOS publishes images for Google Cloud, but does not yet upload them. G
 
 ```
 gsutil list
-gsutil cp fedora-coreos-31.20200113.3.1-gcp.x86_64.tar.gz gs://BUCKET
+gsutil cp fedora-coreos-31.20200310.3.0-gcp.x86_64.tar.gz gs://BUCKET
 ```
 
 Create a Compute Engine image from the file.
 
 ```
-gcloud compute images create fedora-coreos-31-20200113-3-1 --source-uri gs://BUCKET/fedora-coreos-31.20200113.3.1-gcp.x86_64.tar.gz
+gcloud compute images create fedora-coreos-31-20200310-3-0 --source-uri gs://BUCKET/fedora-coreos-31.20200310.3.0-gcp.x86_64.tar.gz
 ```
 
 ## Cluster
@@ -100,7 +97,7 @@ module "yavin" {
   dns_zone_name = "example-zone"
 
   # custom image name from above
-  os_image = "fedora-coreos-31-20200113-3-1"
+  os_image = "fedora-coreos-31-20200310-3-0"
 
   # configuration
   ssh_authorized_key = "ssh-rsa AAAAB3Nz..."
diff --git a/google-cloud/fedora-coreos/kubernetes/fcc/controller.yaml b/google-cloud/fedora-coreos/kubernetes/fcc/controller.yaml
index 4603e4f71..85a75abda 100644
--- a/google-cloud/fedora-coreos/kubernetes/fcc/controller.yaml
+++ b/google-cloud/fedora-coreos/kubernetes/fcc/controller.yaml
@@ -116,6 +116,7 @@ systemd:
         Type=oneshot
         RemainAfterExit=true
         WorkingDirectory=/opt/bootstrap
+        ExecStartPre=-/usr/bin/podman rm bootstrap
         ExecStart=/usr/bin/podman run --name bootstrap \
           --network host \
           --volume /etc/kubernetes/bootstrap-secrets:/etc/kubernetes/secrets:ro,Z \