This is the best way to run Jenkins with auto-spawning agents for scaling up and down.
- Kubernetes Configs
- Jenkins on Kubernetes Diagram
- CloudBees on Kubernetes
- Jenkins X
- Old Manual Configuration of Jenkins on Kubernetes
- Increase Jenkins Server Disk Space on Kubernetes
- Troubleshooting
Configs are in the jenkins/ directory.
https://plugins.jenkins.io/configuration-as-code/
JCasC updates the password dynamically from the values.yaml (or the per-environment JCasC patch in the repo) using a sidecar that notices config changes and reloads them.
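For reference, a minimal sketch of the relevant values.yaml keys, assuming the official jenkinsci/jenkins Helm chart layout (verify the key names against your chart version):
controller:
  sidecars:
    configAutoReload:
      enabled: true  # sidecar watches the JCasC ConfigMaps and reloads the Jenkins config on change
  JCasC:
    configScripts:
      system-message: |
        jenkins:
          systemMessage: "Configured by JCasC - manual changes in the UI will be overwritten"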
You're probably going to need a bigger node pool for the Jenkins server, which can only scale vertically and will otherwise get stuck in a Pending state when you increase the resource requests/limits and -Xmx.
Create a small pool of 16GB nodes: the Jenkins server frequently scales past 6GB of RAM and won't schedule on e2-standard-2 (8GB RAM), so use e2-standard-4 (16GB RAM):
gcloud beta container node-pools create "jenkins" \
--cluster "$CLOUDSDK_CONTAINER_CLUSTER" \
--machine-type "e2-standard-4" \
--num-nodes "1" \
--enable-autoscaling \
--min-nodes "0" \
--max-nodes "1" \
--location-policy "BALANCED" \
--enable-autoupgrade \
--enable-autorepair \
--max-surge-upgrade 1 \
--max-unavailable-upgrade 0
User (usually admin):
kubectl get secret -n jenkins jenkins -o 'jsonpath={.data.jenkins-admin-user}' | base64 --decode
Password:
kubectl get secret -n jenkins jenkins -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode
WARNING: The Jenkins admin password secret gets changed to a new random value every time you apply the Jenkins Helm chart via Kustomize (see bug report).
You can also get the secrets from the container; it's a bit longer, but it returns exactly the same values and is subject to the same bug:
kubectl exec -ti -n jenkins jenkins-0 -c jenkins -- cat /run/secrets/additional/chart-admin-user
kubectl exec -ti -n jenkins jenkins-0 -c jenkins -- cat /run/secrets/additional/chart-admin-password
Whether you lost the password or got hit by this bug, the traditional reset won't work with the Jenkins Helm chart if the securityRealm -> local section is set, because JCasC resets the admin password from the jenkins secret whenever it changes, which will hit you on every pod restart.
Stop the helm chart from recreating the secret every time by setting this in the chart values.yaml and applying it:
controller:
  admin:
    existingSecret: jenkins
Then force the jenkins-0 server pod to restart:
kubectl rollout restart sts jenkins
Now recover the initial admin password to log in:
kubectl get secret -n jenkins jenkins -o jsonpath="{.data.jenkins-admin-password}" | base64 --decode
If you've set existingSecret before any initial deployment so that the jenkins secret was never created, or you've lost the secret that JCasC uses (perhaps because ArgoCD pruned it), you can (re)create it like this:
kubectl create secret generic -n jenkins jenkins \
--from-literal=jenkins-admin-user="admin" \
--from-literal=jenkins-admin-password="$(pwgen -s 20 -c 1)"
then restart Jenkins again to force JCasC to reset Jenkins to this new password and flush any caches:
kubectl rollout restart sts jenkins
This is an example of a production Jenkins-on-Kubernetes I built and managed for a client.
Deploy this config:
HariSekhon/Kubernetes-configs - cloudbees/ directory
install/install_cloudbees.sh
cloudbees check kubernetes gets this output even on production clusters with several working ingresses:
[KO] Ingress service exists
Output from a live EKS cluster:
[OK] Kubernetes Client version is higher or equal to 1.10 - v1.18.8
[OK] Kubernetes client can be created
[OK] Kubernetes server is accessible
[OK] Kubernetes Server version is higher or equal to 1.10 - v1.21.2-eks-0389ca3
[OK] Client and server have same major version - 1
[KO] Client and server have less than 1 minor version difference - 1.18.8 vs 1.21.2-eks-0389ca3. Fix your PATH to use a compatible kubectl client.
[KO] Ingress service exists
[SKIPPED] Ingress has an external address
[SKIPPED] Ingress deployment exists
[SKIPPED] Ingress deployment has at least 1 ready replica
[SKIPPED] Host name resolves to ingress external address - Add --host-name='mycompany.com' to enable this check
[SKIPPED] Can access the ingress controller using http
[SKIPPED] Can access the ingress controller using https
[OK] Has a default storage class - gp2
[OK] Storage provisioner is supported - gp2
----------------------------------------------
Summary: 15 run, 7 passed, 2 failed, 6 skipped
error: there are 2 failed checks
Output from a live GKE cluster:
[OK] Kubernetes Client version is higher or equal to 1.10 - v1.18.8
[OK] Kubernetes client can be created
[OK] Kubernetes server is accessible
[OK] Kubernetes Server version is higher or equal to 1.10 - v1.19.14-gke.1900
[OK] Client and server have same major version - 1
[KO] Client and server have less than 1 minor version difference - 1.18.8 vs 1.19.14-gke.1900. Fix your PATH to use a compatible kubectl client.
[OK] Ingress service exists - kube-system/jxing-nginx-ingress-controller
[OK] Ingress has an external address - 35.189.220.163
[SKIPPED] Ingress deployment has at least 1 ready replica
[SKIPPED] Host name resolves to ingress external address - Add --host-name='mycompany.com' to enable this check
[KO] Can access the ingress controller using http - Accessing http://x.x.x.x/ won't work : Get "http://x.x.x.x/": dial tcp x.x.x.x:80: connect: connection refused
[KO] Can access the ingress controller using https - Accessing https://x.x.x.x/ won't work : Get "https://x.x.x.x/": dial tcp x.x.x.x:443: connect: connection refused
[OK] Has a default storage class - standard
[KO] Storage provisioner is supported - Please create a storage class using the disk type 'pd-ssd'
Jenkins X tries to bundle everything - nice in theory, but this increases complexity and reduces flexibility.
This isn't really Jenkins - it uses Tekton pipelines.
The normal Jenkins-on-Kubernetes setup above is easier, more widely used and tested, and more compatible with the traditional Jenkins people have been using for over a decade, including all its plugins and features.
Workaround for the broken helm init URL in the older bundled version of Helm (details here):
helm init --stable-repo-url https://charts.helm.sh/stable
jx boot
jx status
In the UI click:
Manage Jenkins
-> Manage Nodes and Clouds
-> Configure Clouds
-> add Kubernetes
Settings:
Credentials -> add -> Jenkins -> GCP service account
Jenkins URL -> http://jenkins-ui.jenkins.svc.cluster.local:8080
Jenkins tunnel -> jenkins-discovery.jenkins.svc.cluster.local:50000
Pod Templates
-> Add Pod Template
-> copy pod template from k8s repo jenkins agent-pod.yaml (/~https://github.com/HariSekhon/Kubernetes-configs/blob/master/jenkins/base/agent-pod.yaml)
-> Usage -> "Use this node as much as possible" (with the default "Only build jobs with label expressions matching this node" the agents won't get used)
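The same cloud settings can also be declared via JCasC rather than clicked through the UI - a rough sketch assuming the kubernetes plugin's JCasC schema (adjust namespace, credentials and pod templates to your setup):
jenkins:
  clouds:
    - kubernetes:
        name: "kubernetes"
        serverUrl: "https://kubernetes.default"
        namespace: "jenkins"
        jenkinsUrl: "http://jenkins-ui.jenkins.svc.cluster.local:8080"
        jenkinsTunnel: "jenkins-discovery.jenkins.svc.cluster.local:50000"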
The trick is doing this without losing your job history data.
You first need to have been using a resizeable disk configuration.
WARNING: do NOT delete the PersistentVolumeClaim, otherwise the Jenkins server statefulset will create a new blank persistent volume, losing your state, and you'll have to follow the more difficult Recovery Steps further down.
Ensure Persistent Volume will be Retained
First, ensure that the Jenkins persistent volume is set to be retained so you don't lose the /var/jenkins_home volume where the build history is stored (losing it will also break future pipeline milestones because build numbers get reset, which requires Groovy Console Scripting to fix).
Find the persistent volume:
kubectl get pv | grep jenkins
You should see Retain in the 4th field rather than Delete (delete is the default):
pvc-228857c7-2230-407f-80be-6ff3c4f0b946 30Gi RWO Retain Bound jenkins/jenkins-home-jenkins-0 gcp-standard-resizeable 2y161d
If it's not set to Retain then edit the PV and set its persistentVolumeReclaimPolicy to Retain:
kubectl edit pv \
    "$(kubectl get pv | \
       grep jenkins-home | \
       awk '{print $1; exit}')"
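Alternatively, a non-interactive equivalent using kubectl patch (the PV name here is the one from the example output above - substitute your own):
kubectl patch pv pvc-228857c7-2230-407f-80be-6ff3c4f0b946 \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'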
Increase the storage size request in the persistent volume claim template: in values.yaml if using the Helm chart, or in the Jenkins server.yaml if using the older custom manifest install. Merge and apply the Pull Request with the resized storage request.
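For example, with the official jenkinsci/jenkins Helm chart the size lives under the top-level persistence block - a sketch with an illustrative 50Gi target:
persistence:
  enabled: true
  size: 50Gi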
Delete the Jenkins statefulset (needed because K8s doesn’t allow updating the storage field):
kubectl delete sts -n jenkins jenkins
Update the Jenkins persistent volume claim (WARNING: don’t delete / recreate or it’ll create a new blank volume):
kubectl edit pvc -n jenkins jenkins-home-jenkins-0
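If you prefer a non-interactive edit, the same change as a patch (50Gi is illustrative, and the storage class must have allowVolumeExpansion: true for the resize to take effect):
kubectl patch pvc -n jenkins jenkins-home-jenkins-0 \
    -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'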
Check the pvc has the new size:
kubectl get pvc -n jenkins
Check the persistent volume was automatically resized by the cloud provider to meet the new pvc request size:
kubectl get pv | grep jenkins
Redeploy Jenkins via ArgoCD / Helm / Kustomize / kubectl
to recreate the server statefulset with the new size.
Check the Jenkins server is running again:
kubectl get po -n jenkins jenkins-0
If you delete the PVC, the statefulset will create a new blank persistent volume, losing your state.
If you had set your PersistentVolume to Retain (which should have been the first thing you did when you installed Jenkins, and at the very least before you started this operation), then you can recover it, but it's tricky.
To recover, edit the PVC and replace the volumeName field with the old volume name.
Unfortunately you can't edit volumeName in place, so dump the PVC to YAML and edit it:
kubectl get -n jenkins pvc/jenkins-home-jenkins-0 -o yaml > /tmp/jenkins-pvc.yaml
Edit it:
vim /tmp/jenkins-pvc.yaml
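After editing, the PVC should look roughly like this minimal sketch (names, size and storage class follow the earlier examples; it's safest to strip the server-generated metadata.uid, metadata.resourceVersion, creationTimestamp and the status block before recreating it):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jenkins-home-jenkins-0
  namespace: jenkins
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gcp-standard-resizeable
  volumeName: <old-pv-name>   # the retained PV holding /var/jenkins_home
  resources:
    requests:
      storage: 30Gi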
Delete the PVC (it will hang because the STS is still using it) and then the STS. If using ArgoCD the PVC gets recreated almost instantly, so you have to race to recreate it from your edited YAML - chain the commands so they all execute within a second:
kubectl delete pvc jenkins-home-jenkins-server-0 &
kubectl delete sts jenkins-server
kubectl create -f /tmp/jenkins-pvc.yaml
kubectl get pv,pvc -n jenkins | grep jenkins
You need to fix this Lost status:
persistentvolumeclaim/jenkins-home-jenkins-0 Lost persistentvolume/pvc-7fecff9a-7ec7-42a4-bc4f-b54286f089c2 0 gcp-standard-resizeable 61s
Now allow the old PV to be claimed by the new PVC's UID.
Get the UID of the claiming PVC:
kubectl get -o json persistentvolumeclaim/jenkins-home-jenkins-server-0 | jq -r '.metadata.uid'
WARNING: do NOT edit the PV's own UID or it'll get Lost.
kubectl edit pv ...  # set spec.claimRef.uid to the UID from the previous command (and increase the size of the PV if needed)
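A non-interactive way to do the rebinding part is to point the PV's claimRef at the new PVC's UID and clear the stale resourceVersion - a sketch using the PV name from the output above (substitute the UID returned by the previous command):
kubectl patch pv pvc-7fecff9a-7ec7-42a4-bc4f-b54286f089c2 --type merge \
    -p '{"spec":{"claimRef":{"uid":"<uid-from-previous-command>","resourceVersion":null}}}'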
kubectl delete pvc jenkins-home-jenkins-0 &
kubectl delete sts jenkins-server
The new pvc will now bind to the original PV.
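To confirm, check the PVC is Bound and points at the original PV:
kubectl get pvc -n jenkins jenkins-home-jenkins-0 \
    -o jsonpath='{.status.phase}{" -> "}{.spec.volumeName}{"\n"}'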
Then redeploy the statefulset again (automatically via ArgoCD, or via Helm / Kustomize / kubectl) to use this new PVC pointing to the original PV with your job history data.
Ported from private Knowledge Base page 2020+