- DaemonSet. Installs the CSI driver on every applicable node.
- CSI driver. Receives requests from kubelet to mount or unmount volumes. Use https://kubernetes-csi.github.io/ to implement this.
- Volume manager. Pod created to manage a replica of a volume on a node. Each one is either a:
  - Primary replica. In active use by another pod; it manages a volume that is currently mounted.
  - Secondary replica. Owned by the volume's ReplicaSet.
- PersistentVolumeClaim. Has a `spec.storageClassName` that references a StorageClass that we handle.
- PersistentVolume. Has a `spec.csi.volumeAttributes` object that we use to store (see the encoding sketch after this list):
  - `replicationFactor`. Desired number of replicas, including the primary.
  - `replicas`. List of replicas of this filesystem. Each item has these attributes:
    - `node`. The name of the node where the replica is.
    - `snapshot`. The most recent snapshot in the replica.
    - `active`. True if this is a mounted primary with read/write access.
- ReplicaSet. Owns the secondaries for each volume.
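Because `spec.csi.volumeAttributes` is a flat `map[string]string`, the `replicas` list has to be serialized into a single value. A minimal sketch of one possible encoding (JSON inside the map; the Go package, type, and field names are illustrative, not an existing API):

```go
// Sketch only: spec.csi.volumeAttributes is a map[string]string, so the
// replicas list is stored as a JSON-encoded string under one key.
package volumeattrs

import (
	"encoding/json"
	"strconv"
)

// Replica mirrors one entry in the "replicas" attribute.
type Replica struct {
	Node     string `json:"node"`     // node hosting this replica
	Snapshot uint64 `json:"snapshot"` // most recent snapshot number on that node
	Active   bool   `json:"active"`   // true for a mounted read/write primary
}

// Encode packs the replication metadata into CSI volumeAttributes.
func Encode(replicationFactor int, replicas []Replica) (map[string]string, error) {
	raw, err := json.Marshal(replicas)
	if err != nil {
		return nil, err
	}
	return map[string]string{
		"replicationFactor": strconv.Itoa(replicationFactor),
		"replicas":          string(raw),
	}, nil
}

// Decode is the inverse of Encode.
func Decode(attrs map[string]string) (int, []Replica, error) {
	rf, err := strconv.Atoi(attrs["replicationFactor"])
	if err != nil {
		return 0, nil, err
	}
	var replicas []Replica
	if err := json.Unmarshal([]byte(attrs["replicas"]), &replicas); err != nil {
		return 0, nil, err
	}
	return rf, replicas, nil
}
```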
- A PVC object gets created in the Kubernetes apiserver. The PVC has a `spec.storageClassName` that references a StorageClass that we handle.
- Each daemon gets notified because it's watching for PVCs.
- The DaemonSet leader creates a PersistentVolume and associates it with the PVC. It sets `PersistentVolume...replicationFactor` = `StorageClass.parameters.defaultReplicationFactor`.
- The DaemonSet leader creates a ReplicaSet with `ReplicaSet.spec.replicas` = `PersistentVolume...replicationFactor` (see the provisioning sketch after this list).
- It also creates a PodDisruptionBudget.
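A rough sketch of the leader's provisioning step using client-go. The driver name, namespace, label, image, and PV naming are assumptions; leader election, error handling, and the PodDisruptionBudget are omitted:

```go
// Sketch only: how the DaemonSet leader might react to a new PVC.
package provisioner

import (
	"context"
	"strconv"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const driverName = "zfs.replicated.example.com" // assumed CSI driver name

func provision(ctx context.Context, cs kubernetes.Interface,
	pvc *corev1.PersistentVolumeClaim, sc *storagev1.StorageClass) error {
	rf := sc.Parameters["defaultReplicationFactor"]

	// Create the PersistentVolume bound to the claim, carrying our
	// replication metadata in spec.csi.volumeAttributes.
	pv := &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "pv-" + string(pvc.UID)},
		Spec: corev1.PersistentVolumeSpec{
			StorageClassName: sc.Name,
			Capacity:         pvc.Spec.Resources.Requests,
			AccessModes:      pvc.Spec.AccessModes,
			ClaimRef: &corev1.ObjectReference{
				Namespace: pvc.Namespace, Name: pvc.Name, UID: pvc.UID,
			},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				CSI: &corev1.CSIPersistentVolumeSource{
					Driver:       driverName,
					VolumeHandle: string(pvc.UID),
					VolumeAttributes: map[string]string{
						"replicationFactor": rf,
						"replicas":          "[]", // no replicas yet
					},
				},
			},
		},
	}
	if _, err := cs.CoreV1().PersistentVolumes().Create(ctx, pv, metav1.CreateOptions{}); err != nil {
		return err
	}

	// Create the ReplicaSet that owns the secondaries.
	n, _ := strconv.Atoi(rf) // parameter validation skipped in this sketch
	replicas := int32(n)
	labels := map[string]string{"volume": pv.Name}
	rs := &appsv1.ReplicaSet{
		ObjectMeta: metav1.ObjectMeta{Name: pv.Name, Namespace: "zfs-system", Labels: labels},
		Spec: appsv1.ReplicaSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{Containers: []corev1.Container{{
					Name:  "volume-manager",
					Image: "example/volume-manager:latest", // assumed image
				}}},
			},
		},
	}
	_, err := cs.AppsV1().ReplicaSets(rs.Namespace).Create(ctx, rs, metav1.CreateOptions{})
	return err
}
```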
- Daemon starts watching for PVCs and PVs.
- Daemon installs the CSI driver.
- Daemon scans local ZFS volumes for user properties indicating they belong to StorageClasses we're responsible for. For each one, it ensures a volume manager pod is running, with metadata pinning it to this node and making it belong to the appropriate ReplicaSet (see the scan sketch below).
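A sketch of the ZFS scan, assuming datasets we own are tagged with a user property such as `example:storageclass` (the property name is an assumption; the `zfs get` flags are standard):

```go
// Sketch: find local ZFS datasets tagged with our user property.
package scanner

import (
	"os/exec"
	"strings"
)

// ownedDatasets returns dataset -> StorageClass name for every local
// filesystem that carries the user property.
func ownedDatasets() (map[string]string, error) {
	// -H: no headers, tab-separated; only the name and value columns.
	out, err := exec.Command("zfs", "get", "-H",
		"-o", "name,value", "-t", "filesystem", "example:storageclass").Output()
	if err != nil {
		return nil, err
	}
	owned := map[string]string{}
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Split(line, "\t")
		if len(fields) == 2 && fields[1] != "-" { // "-" means the property is not set
			owned[fields[0]] = fields[1]
		}
	}
	return owned, nil
}
```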
- The volume manager reads metadata (user properties) from the local volume, if any.
- It creates/updates an entry on the PV with its snapshot number, which is 0 if no volume exists.
- It watches the PV. If any replicas have newer snapshots, it requests them.
- If its snapshot number is close to the largest known snapshot number, it signals that it's ready (see the readiness sketch below).
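A sketch of the readiness decision, assuming snapshot numbers are monotonically increasing integers and an arbitrary lag threshold:

```go
// Sketch of the volume manager's readiness check on startup.
package manager

// replicaInfo mirrors one entry of the PV's "replicas" attribute.
type replicaInfo struct {
	Node     string
	Snapshot uint64
	Active   bool
}

// maxLag is an illustrative threshold: how far behind the newest known
// snapshot this replica may be while still reporting ready.
const maxLag = 2

// ready reports whether our replica is close enough to the newest known
// snapshot to advertise itself as usable.
func ready(ourSnapshot uint64, replicas []replicaInfo) bool {
	var newest uint64
	for _, r := range replicas {
		if r.Snapshot > newest {
			newest = r.Snapshot
		}
	}
	return newest <= ourSnapshot+maxLag
}
```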
- Kubernetes decides which node to schedule the pod on.
- Kubelet calls our CSI driver to mount the volume.
- CSI driver forwards the request to the local volume manager pod.
- Volume manager:
  - Modifies its metadata (labels and ownerReferences) so it no longer belongs to the ReplicaSet (see the hand-off sketch after this list).
  - Sets `ReplicaSet.spec.replicas` = `PersistentVolume...replicationFactor - 1`.
  - Retrieves the latest snapshot data if necessary.
  - Marks itself as `active` in the PV's replica list, if the mount is read/write.
  - Returns mount info to the CSI driver.
- CSI driver returns mount info to kubelet.
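A sketch of the ReplicaSet hand-off the volume manager performs on mount, assuming it knows its own Pod object and the owning ReplicaSet's name, and that the `"volume"` selector label from the provisioning sketch is used; conflicts, retries, and the brief race with the ReplicaSet controller are not handled:

```go
// Sketch: on mount, the volume manager detaches itself from the ReplicaSet
// and scales the ReplicaSet down so no replacement secondary is created.
package manager

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func becomePrimary(ctx context.Context, cs kubernetes.Interface,
	pod *corev1.Pod, rsName string, replicationFactor int32) error {
	// 1. Drop the ownerReference and selector label so the ReplicaSet no
	//    longer counts (or deletes) this pod.
	pod.OwnerReferences = nil
	delete(pod.Labels, "volume") // assumed selector label
	if _, err := cs.CoreV1().Pods(pod.Namespace).Update(ctx, pod, metav1.UpdateOptions{}); err != nil {
		return err
	}

	// 2. Scale the ReplicaSet down to replicationFactor - 1 secondaries.
	rs, err := cs.AppsV1().ReplicaSets(pod.Namespace).Get(ctx, rsName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	want := replicationFactor - 1
	rs.Spec.Replicas = &want
	_, err = cs.AppsV1().ReplicaSets(pod.Namespace).Update(ctx, rs, metav1.UpdateOptions{})
	return err
}
```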
- Kubelet calls our CSI driver to unmount the volume.
- CSI driver forwards the request to the local volume manager pod.
- Volume manager:
  - Sets `ReplicaSet.spec.replicas` = `PersistentVolume...replicationFactor`.
  - Modifies its metadata (labels and ownerReferences) so it belongs to the ReplicaSet again.
  - Performs a snapshot and updates the PV replica list (see the snapshot sketch after this list).
  - Returns unmount success to the CSI driver.
- CSI driver returns unmount success to kubelet.
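A sketch of the snapshot-and-publish step on unmount. The numeric snapshot names, the patch shape, and the re-encoded `replicas` value follow the earlier sketches and are assumptions; a real implementation would build the patch with encoding/json rather than string formatting:

```go
// Sketch: on unmount, take a new ZFS snapshot and report it on the PV.
package manager

import (
	"context"
	"fmt"
	"os/exec"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func snapshotAndPublish(ctx context.Context, cs kubernetes.Interface,
	pvName, dataset string, snapshotNumber uint64, replicasJSON string) error {
	// Create an incrementally numbered snapshot, e.g. tank/vol@12.
	snap := fmt.Sprintf("%s@%d", dataset, snapshotNumber)
	if err := exec.Command("zfs", "snapshot", snap).Run(); err != nil {
		return err
	}

	// Publish the new snapshot number by patching the PV's volumeAttributes.
	// replicasJSON is the re-encoded replicas list (see the encoding sketch).
	patch := fmt.Sprintf(
		`{"spec":{"csi":{"volumeAttributes":{"replicas":%q}}}}`, replicasJSON)
	_, err := cs.CoreV1().PersistentVolumes().Patch(ctx, pvName,
		types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{})
	return err
}
```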
- Volume manager receives SIGTERM.
- It checks the list of replicas on the PV to see whether it can delete the local volume. It can if either:
  - The PV doesn't exist, or
  - At least `replicationFactor` healthy replicas exist. "Healthy" means they have a snapshot at least as new as ours.
- It removes itself from the PV's replica list (see the deletion-check sketch after this list):
  - It removes its entry.
  - It re-reads the list of entries.
  - It verifies that there are still sufficient replicas.
- It deletes the local volume.
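A sketch of the safety check that gates deletion, reusing the `replicaInfo` type from the readiness sketch; the remove-then-re-read-and-verify loop described above is elided:

```go
// Sketch of the pre-deletion safety check run on SIGTERM.
package manager

// canDelete reports whether the local replica may be destroyed.
// replicas is the PV's replica list minus our own entry; pvExists is false
// if the PersistentVolume has already been deleted.
func canDelete(pvExists bool, replicas []replicaInfo, ourSnapshot uint64, replicationFactor int) bool {
	if !pvExists {
		return true
	}
	healthy := 0
	for _, r := range replicas {
		if r.Snapshot >= ourSnapshot { // healthy: at least as new as our snapshot
			healthy++
		}
	}
	return healthy >= replicationFactor
}
```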