Implement MultiClusterCIDR API in flannel
This API requires Kubernetes 1.26 and is available for the vxlan,
wireguard and host-gw backends.
thomasferrandiz committed Nov 9, 2022
1 parent 08bfbc9 commit e425a3d
Showing 29 changed files with 1,719 additions and 104 deletions.
126 changes: 126 additions & 0 deletions Documentation/MultiClusterCIDR/README.md
@@ -0,0 +1,126 @@
Flannel provides experimental support for the new [MultiClusterCIDR API](/~https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs) introduced as an alpha feature in Kubernetes 1.26.

## Prerequisites
* A cluster running Kubernetes 1.26 (this was tested on version 1.26.0-alpha.1)
* Use flannel version ???
* The MultiClusterCIDR API can be used with the vxlan, wireguard and host-gw backends of flannel

*Note*: once a PodCIDR has been allocated to a node, it cannot be modified or removed, so configure the MultiClusterCIDR API before adding new nodes to your cluster.

## How to use the MultiClusterCIDR API
### Enable the new API in the control plane
* Edit `/etc/kubernetes/manifests/kube-controller-manager.yaml` and add the following lines in the `spec.containers.command` section:
```
- --cidr-allocator-type=MultiCIDRRangeAllocator
- --feature-gates=MultiCIDRRangeAllocator=true
```

* Edit `/etc/kubernetes/manifests/kube-apiserver.yaml` and add the following line in the `spec.containers.command` section:
```
- --runtime-config=networking.k8s.io/v1alpha1
```

Both components should restart automatically and a default ClusterCIDR resource will be created based on the usual `pod-network-cidr` parameter.

For example:
```bash
$ kubectl get clustercidr
NAME                   PERNODEHOSTBITS   IPV4            IPV6                 AGE
default-cluster-cidr   8                 10.244.0.0/16   2001:cafe:42::/112   24h

$ kubectl describe clustercidr default-cluster-cidr
Name:            default-cluster-cidr
Labels:          <none>
Annotations:     <none>
NodeSelector:
PerNodeHostBits: 8
IPv4:            10.244.0.0/16
IPv6:            2001:cafe:42::/112
Events:          <none>
```
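As a back-of-the-envelope sketch (plain Python using only the standard `ipaddress` module, not part of flannel), `PerNodeHostBits: 8` means each node keeps 8 host bits, so the default ClusterCIDR above hands out a /24 per node for IPv4 and a /120 per node for IPv6:

```python
import ipaddress

def per_node_prefixes(per_node_host_bits, ipv4_cidr=None, ipv6_cidr=None):
    """Return {family: (per-node prefix length, number of node subnets)}."""
    result = {}
    if ipv4_cidr:
        net = ipaddress.ip_network(ipv4_cidr)
        prefix = 32 - per_node_host_bits           # host bits come off the end
        result["ipv4"] = (prefix, 2 ** (prefix - net.prefixlen))
    if ipv6_cidr:
        net = ipaddress.ip_network(ipv6_cidr)
        prefix = 128 - per_node_host_bits
        result["ipv6"] = (prefix, 2 ** (prefix - net.prefixlen))
    return result

print(per_node_prefixes(8, "10.244.0.0/16", "2001:cafe:42::/112"))
# → {'ipv4': (24, 256), 'ipv6': (120, 256)}
```

So this default configuration can serve at most 256 nodes per address family before the cluster range is exhausted.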

### Enable the new feature in flannel
This feature is disabled by default. To enable it, add the following flag to the `kube-flannel` container:
```
- --use-multi-cluster-cidr
```
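In context, the flag sits alongside flanneld's existing arguments in the DaemonSet spec. A sketch (the surrounding fields follow the stock `kube-flannel.yml`; your image tag and other arguments may differ):

```yaml
      containers:
      - name: kube-flannel
        image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.1
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        - --use-multi-cluster-cidr
```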

Since you will specify the subnets to use for pod IP addresses through the new API, you no longer need the `Network` and `IPv6Network` sections in the flannel configuration. Your flannel configuration could thus look like this:
```json
{
    "EnableIPv6": true,
    "Backend": {
        "Type": "host-gw"
    }
}
```


If you leave these sections in, flannel will simply ignore them.


### Configure the required `clustercidr` resources
Before adding nodes to the cluster, you need to add new `clustercidr` resources.

For example:
```yaml
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDR
metadata:
  name: my-cidr-1
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - "worker1"
  perNodeHostBits: 8
  ipv4: 10.248.0.0/16
  ipv6: 2001:cafe:43::/112
---
apiVersion: networking.k8s.io/v1alpha1
kind: ClusterCIDR
metadata:
  name: my-cidr-2
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - "worker2"
  perNodeHostBits: 8
  ipv4: 10.247.0.0/16
  ipv6: ""
```
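Since it is up to you to keep the ranges coherent, a quick offline overlap check can catch mistakes before you apply the manifests. A sketch in plain Python (the CIDR values are the hypothetical ones from the example above):

```python
import ipaddress

# IPv4 ranges from the example ClusterCIDR manifests above.
cidrs = {
    "my-cidr-1": ipaddress.ip_network("10.248.0.0/16"),
    "my-cidr-2": ipaddress.ip_network("10.247.0.0/16"),
}

def find_overlaps(cidrs):
    """Return every pair of named networks whose ranges overlap."""
    names = sorted(cidrs)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if cidrs[a].overlaps(cidrs[b])
    ]

print(find_overlaps(cidrs))  # → []
```

An empty list means the ranges are disjoint and each node selector will draw from its own pool.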
For more details on the `spec` section, see the [feature specification page](/~https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/2593-multiple-cluster-cidrs#expected-behavior).

*WARNING*: all the fields in the `spec` section are immutable.

For more information on Node Selectors, see [the Kubernetes documentation](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/).

### Add nodes to the cluster
Each new node will be allocated a `PodCIDR` based on the configured `clustercidr` resources.
flannel will ensure connectivity between all pods, regardless of the subnet from which each pod's IP address was allocated.
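To check that a node actually received its `PodCIDR` from the intended range, you can read `.spec.podCIDRs` from the node object and compare. A hypothetical helper (plain Python, standard `ipaddress` module; the sample PodCIDR value is invented):

```python
import ipaddress

def allocated_from(pod_cidr: str, cluster_cidr: str, per_node_host_bits: int) -> bool:
    """True if pod_cidr is a per-node subnet carved out of cluster_cidr."""
    pod = ipaddress.ip_network(pod_cidr)
    cluster = ipaddress.ip_network(cluster_cidr)
    expected = pod.max_prefixlen - per_node_host_bits  # e.g. 32 - 8 = /24
    return (pod.version == cluster.version
            and pod.subnet_of(cluster)
            and pod.prefixlen == expected)

# worker1 should get a /24 out of my-cidr-1 (10.248.0.0/16, perNodeHostBits: 8)
print(allocated_from("10.248.3.0/24", "10.248.0.0/16", 8))  # → True
```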


## Notes on using IPv6 with the MultiClusterCIDR API
The feature is fully compatible with IPv6 and dual-stack networking.
Each `clustercidr` resource can include an IPv4 and/or an IPv6 subnet.
If both are provided, the PodCIDR allocated based on this `clustercidr` will be dual-stack.
The controller allows you to use IPv4, IPv6 and dual-stack `clustercidr` resources all at the same time to facilitate cluster migrations.
As a result, it is up to you to ensure the coherence of your IP allocation.

It seems that there are inconsistencies in the IPv6 subnet allocation between the standard and the new API.
If you want to use dual-stack networking with the new API, we recommend that you do not specify the `--pod-network-cidr` flag to `kubeadm` when installing the cluster so that you can manually configure the controller later.
In that case, when you edit `/etc/kubernetes/manifests/kube-controller-manager.yaml`, add:
```
- --cidr-allocator-type=MultiCIDRRangeAllocator
- --feature-gates=MultiCIDRRangeAllocator=true
- --cluster-cidr=10.244.0.0/16,2001:cafe:42::/112 #replace with your own default clusterCIDR
- --node-cidr-mask-size-ipv6=120
- --allocate-node-cidrs
```
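A quick consistency check on the flags above (plain Python arithmetic, not flannel code): `--node-cidr-mask-size-ipv6=120` leaves 128 - 120 = 8 host bits per node, matching `PerNodeHostBits: 8`, and a /112 cluster range then yields 2**(120 - 112) = 256 node subnets:

```python
node_mask = 120          # --node-cidr-mask-size-ipv6
cluster_prefix = 112     # the /112 in --cluster-cidr
per_node_host_bits = 128 - node_mask
node_subnets = 2 ** (node_mask - cluster_prefix)
print(per_node_host_bits, node_subnets)  # → 8 256
```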
4 changes: 2 additions & 2 deletions Documentation/kube-flannel-psp.yml
@@ -166,8 +166,8 @@ spec:
       serviceAccountName: flannel
       initContainers:
       - name: install-cni-plugin
-       #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
-        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
+       #image: flannelcni/flannel-cni-plugin:v1.2.0 for ppc64le and mips64le (dockerhub limitations may apply)
+        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2.0
         command:
         - cp
         args:
15 changes: 11 additions & 4 deletions Documentation/kube-flannel.yml
@@ -30,6 +30,13 @@ rules:
   - nodes/status
   verbs:
   - patch
+- apiGroups:
+  - "networking.k8s.io"
+  resources:
+  - clustercidrs
+  verbs:
+  - list
+  - watch
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
@@ -122,8 +129,8 @@ spec:
       serviceAccountName: flannel
       initContainers:
       - name: install-cni-plugin
-       #image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
-        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
+       #image: flannelcni/flannel-cni-plugin:v1.2.0 #for ppc64le and mips64le (dockerhub limitations may apply)
+        image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2.0
         command:
         - cp
         args:
@@ -134,7 +141,7 @@ spec:
         - name: cni-plugin
           mountPath: /opt/cni/bin
       - name: install-cni
-       #image: flannelcni/flannel:v0.20.1 for ppc64le and mips64le (dockerhub limitations may apply)
+       #image: flannelcni/flannel:v0.20.1 #for ppc64le and mips64le (dockerhub limitations may apply)
         image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.1
         command:
         - cp
@@ -149,7 +156,7 @@ spec:
           mountPath: /etc/kube-flannel/
       containers:
       - name: kube-flannel
-       #image: flannelcni/flannel:v0.20.1 for ppc64le and mips64le (dockerhub limitations may apply)
+       #image: flannelcni/flannel:v0.20.1 #for ppc64le and mips64le (dockerhub limitations may apply)
         image: docker.io/rancher/mirrored-flannelcni-flannel:v0.20.1
         command:
         - /opt/bin/flanneld
7 changes: 6 additions & 1 deletion backend/ipip/ipip.go
@@ -89,7 +89,12 @@ func (be *IPIPBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup,
 		return nil, fmt.Errorf("failed to acquire lease: %v", err)
 	}
 
-	link, err := be.configureIPIPDevice(n.SubnetLease, subnet.GetFlannelNetwork(config))
+	net, err := config.GetFlannelNetwork(&n.SubnetLease.Subnet)
+	if err != nil {
+		return nil, err
+	}
+
+	link, err := be.configureIPIPDevice(n.SubnetLease, net)
 
 	if err != nil {
 		return nil, err
6 changes: 5 additions & 1 deletion backend/udp/udp_amd64.go
@@ -78,11 +78,15 @@ func (be *UdpBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup, c
 		return nil, fmt.Errorf("failed to acquire lease: %v", err)
 	}
 
+	net, err := config.GetFlannelNetwork(&l.Subnet)
+	if err != nil {
+		return nil, err
+	}
 	// Tunnel's subnet is that of the whole overlay network (e.g. /16)
 	// and not that of the individual host (e.g. /24)
 	tunNet := ip.IP4Net{
 		IP:        l.Subnet.IP,
-		PrefixLen: subnet.GetFlannelNetwork(config).PrefixLen,
+		PrefixLen: net.PrefixLen,
 	}
 
 	return newNetwork(be.sm, be.extIface, cfg.Port, tunNet, l)
12 changes: 10 additions & 2 deletions backend/vxlan/vxlan.go
@@ -191,12 +191,20 @@ func (be *VXLANBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGroup,
 	// This IP is just used as a source address for host to workload traffic (so
 	// the return path for the traffic has an address on the flannel network to use as the destination)
 	if config.EnableIPv4 {
-		if err := dev.Configure(ip.IP4Net{IP: lease.Subnet.IP, PrefixLen: 32}, subnet.GetFlannelNetwork(config)); err != nil {
+		net, err := config.GetFlannelNetwork(&lease.Subnet)
+		if err != nil {
+			return nil, err
+		}
+		if err := dev.Configure(ip.IP4Net{IP: lease.Subnet.IP, PrefixLen: 32}, net); err != nil {
 			return nil, fmt.Errorf("failed to configure interface %s: %w", dev.link.Attrs().Name, err)
 		}
 	}
 	if config.EnableIPv6 {
-		if err := v6Dev.ConfigureIPv6(ip.IP6Net{IP: lease.IPv6Subnet.IP, PrefixLen: 128}, subnet.GetFlannelIPv6Network(config)); err != nil {
+		net, err := config.GetFlannelIPv6Network(&lease.IPv6Subnet)
+		if err != nil {
+			return nil, err
+		}
+		if err := v6Dev.ConfigureIPv6(ip.IP6Net{IP: lease.IPv6Subnet.IP, PrefixLen: 128}, net); err != nil {
 			return nil, fmt.Errorf("failed to configure interface %s: %w", v6Dev.link.Attrs().Name, err)
 		}
 	}
18 changes: 14 additions & 4 deletions backend/wireguard/device.go
@@ -219,20 +219,29 @@ func (dev *wgDevice) upAndAddRoute(dst *net.IPNet) error {
 		return fmt.Errorf("failed to set interface %s to UP state: %w", dev.attrs.name, err)
 	}
 
+	err = dev.addRoute(dst)
+	if err != nil {
+		return fmt.Errorf("failed to add route to destination (%s) to interface (%s): %w", dst, dev.attrs.name, err)
+	}
+	return nil
+}
+
+func (dev *wgDevice) addRoute(dst *net.IPNet) error {
 	route := netlink.Route{
 		LinkIndex: dev.link.Attrs().Index,
 		Scope:     netlink.SCOPE_LINK,
 		Dst:       dst,
 	}
-	err = netlink.RouteAdd(&route)
+	log.Infof("add route: %s", route)
+	err := netlink.RouteAdd(&route)
 	if err != nil {
 		return fmt.Errorf("failed to add route %s: %w", dev.attrs.name, err)
 	}
 
 	return nil
 }
 
-func (dev *wgDevice) Configure(devIP ip.IP4, flannelnet ip.IP4Net) error {
+func (dev *wgDevice) Configure(devIP ip.IP4, flannelnet ip.IP4Net, flannelnets []ip.IP4Net) error {
 
 	net := ip.IP4Net{IP: devIP, PrefixLen: 32}
 	err := ip.EnsureV4AddressOnLink(net, flannelnet, dev.link)
 	if err != nil {
@@ -246,7 +255,7 @@ func (dev *wgDevice) Configure(devIP ip.IP4, flannelnet ip.IP4Net) error {
 	return nil
 }
 
-func (dev *wgDevice) ConfigureV6(devIP *ip.IP6, flannelnet ip.IP6Net) error {
+func (dev *wgDevice) ConfigureV6(devIP *ip.IP6, flannelnet ip.IP6Net, flannelnets []ip.IP6Net) error {
 	net := ip.IP6Net{IP: devIP, PrefixLen: 128}
 	err := ip.EnsureV6AddressOnLink(net, flannelnet, dev.link)
 	if err != nil {
@@ -261,6 +270,7 @@ func (dev *wgDevice) ConfigureV6(devIP *ip.IP6, flannelnet ip.IP6Net) error {
 }
 
 func (dev *wgDevice) addPeer(publicEndpoint string, peerPublicKeyRaw string, peerSubnets []net.IPNet) error {
+	log.Infof("adding peers %s to endpoint %s", peerSubnets, publicEndpoint)
 	udpEndpoint, err := net.ResolveUDPAddr("udp", publicEndpoint)
 	if err != nil {
 		return fmt.Errorf("failed to resolve UDP address: %w", err)
14 changes: 11 additions & 3 deletions backend/wireguard/wireguard.go
@@ -168,17 +168,25 @@ func (be *WireguardBackend) RegisterNetwork(ctx context.Context, wg *sync.WaitGr
 	}
 
 	if config.EnableIPv4 {
-		err = dev.Configure(lease.Subnet.IP, subnet.GetFlannelNetwork(config))
+		net, err := config.GetFlannelNetwork(&lease.Subnet)
+		if err != nil {
+			return nil, err
+		}
+		err = dev.Configure(lease.Subnet.IP, net, config.Networks.ToSlice())
 		if err != nil {
 			return nil, err
 		}
 	}
 
 	if config.EnableIPv6 {
+		ipv6net, err := config.GetFlannelIPv6Network(&lease.IPv6Subnet)
+		if err != nil {
+			return nil, err
+		}
 		if cfg.Mode == Separate {
-			err = v6Dev.ConfigureV6(lease.IPv6Subnet.IP, subnet.GetFlannelIPv6Network(config))
+			err = v6Dev.ConfigureV6(lease.IPv6Subnet.IP, ipv6net, config.IPv6Networks.ToSlice())
 		} else {
-			err = dev.ConfigureV6(lease.IPv6Subnet.IP, subnet.GetFlannelIPv6Network(config))
+			err = dev.ConfigureV6(lease.IPv6Subnet.IP, ipv6net, config.IPv6Networks.ToSlice())
 		}
 		if err != nil {
 			return nil, err
Expand Down