This project implements a proof-of-concept (POC) for proactive autoscaling in Kubernetes using KEDA (Kubernetes Event-Driven Autoscaling) and gRPC. The goal is to simulate an autoscaler that scales a Kubernetes deployment ahead of time, based on predicted resource demand rather than on metrics observed after the load arrives, to minimize downtime and optimize resource utilization.
- Proactive Scaling: 🚀 Scale your applications ahead of time with predicted metrics, reducing downtime during traffic spikes.
- KEDA External Scaler: ⚙️ A gRPC-based external scaler that integrates with KEDA to scale Kubernetes workloads based on predicted future load.
- Traffic Simulation: 🚦 Using K6 to simulate traffic, enabling testing of the proactive scaling behavior.
- Simple Simulation: 🧪 In this POC, we use predictable traffic patterns to simulate the behavior of an ML model without actual predictions.
- Check out the demo video to see the proactive autoscaling system in action!
- Find more details in this article.
- Kubernetes in Docker (KinD): 🐳 Used to set up a local Kubernetes cluster.
- KEDA: 📈 Autoscaling tool installed via Helm in the Kubernetes cluster.
- gRPC External Scaler: 🔗 Scaler implemented with gRPC, deployed as a service in the cluster.
- Prometheus: 📊 Scrapes and exposes custom metrics from the application.
- K6: 🧪 Traffic generation tool used to simulate a predictable load for testing.
The solution involves three primary steps:
- Traffic Generation: 🌐 K6 generates periodic traffic data, which is scraped by Prometheus.
- Prediction Simulation: 🔮 The ML model is simulated by reusing past data to predict future load (see the sketch after this list).
- External Scaler: 🔗 The gRPC-based external scaler interacts with KEDA and Prometheus to scale workloads proactively.
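To make the prediction step concrete, here is a minimal sketch of how the simulated "forecast" can be produced. Because the K6 traffic is perfectly periodic, the load expected 30 seconds from now equals the load observed one period (minus 30 seconds) ago, which can be read straight from Prometheus with an `offset` modifier. The Prometheus address, the `http_requests_total` metric, and the 600-second period are illustrative assumptions, not values taken from this repo.

```python
import requests

# Assumed values for illustration only.
PROMETHEUS_URL = "http://prometheus.default.svc:9090"
PERIOD_SECONDS = 600   # length of one K6 traffic cycle
LEAD_SECONDS = 30      # how far ahead the scaler looks


def predict_request_rate() -> float:
    """Return the request rate expected LEAD_SECONDS from now.

    With perfectly periodic traffic, that is simply the rate observed
    (PERIOD_SECONDS - LEAD_SECONDS) ago, fetched via a PromQL offset.
    """
    offset = PERIOD_SECONDS - LEAD_SECONDS
    query = f"sum(rate(http_requests_total[1m] offset {offset}s))"
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": query}, timeout=5
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0


if __name__ == "__main__":
    print(f"Predicted request rate in {LEAD_SECONDS}s: {predict_request_rate():.1f}")
```

A real forecaster would replace this offset query with a call to a trained model; the rest of the pipeline stays the same.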
The external scaler implements the three gRPC methods KEDA's `external` trigger uses:
- IsActive: 🟢 Tells KEDA whether the scaler currently considers the target active, i.e. whether scaling should happen at all.
- GetMetricSpec: 📊 Returns the name and target value (threshold) of each metric, typically configured through the ScaledObject.
- GetMetrics: 🔮 Returns the predicted value of the metric, which KEDA uses to make the scaling decision (see the sketch below).
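Below is a minimal sketch of such a scaler in Python, assuming stubs generated from KEDA's `externalscaler.proto` (the module names `externalscaler_pb2` / `externalscaler_pb2_grpc` depend on how `protoc` is invoked). The metric name `predicted_rps`, the target of 100, and the `predict_future_load()` helper are placeholders, not names from this repo.

```python
from concurrent import futures

import grpc
import externalscaler_pb2 as pb2
import externalscaler_pb2_grpc as pb2_grpc


def predict_future_load() -> int:
    # Placeholder: in the POC this would query Prometheus for the load
    # observed one traffic period ago (see the prediction sketch above).
    return 120


class ProactiveScaler(pb2_grpc.ExternalScalerServicer):
    def IsActive(self, request, context):
        # Always report active; scale-to-zero is out of scope for this POC.
        # (StreamIsActive is only needed for the external-push trigger.)
        return pb2.IsActiveResponse(result=True)

    def GetMetricSpec(self, request, context):
        # Target value per replica that KEDA compares the reported metric against.
        spec = pb2.MetricSpec(metricName="predicted_rps", targetSize=100)
        return pb2.GetMetricSpecResponse(metricSpecs=[spec])

    def GetMetrics(self, request, context):
        # Report the *predicted* load so KEDA scales ahead of the spike.
        value = pb2.MetricValue(
            metricName="predicted_rps", metricValue=predict_future_load()
        )
        return pb2.GetMetricsResponse(metricValues=[value])


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    pb2_grpc.add_ExternalScalerServicer_to_server(ProactiveScaler(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()
```

The ScaledObject then points KEDA at this service through an `external` trigger whose `scalerAddress` metadata names the scaler's Kubernetes service and port.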
- 🐳 Docker
- ☸️ Kubernetes (KinD)
- 🛠️ Helm
- 📊 Prometheus
- 📈 KEDA
- 🐍 Python (for gRPC)
- 🧪 K6 (for traffic generation)
- Set up a local Kubernetes cluster using KinD:

  ```bash
  kind create cluster
  ```
- Install KEDA using Helm:

  ```bash
  helm repo add kedacore https://kedacore.github.io/charts
  helm repo update
  helm install keda kedacore/keda
  ```
- Deploy Prometheus in the cluster:

  ```bash
  kubectl apply -f prometheus.yaml
  ```
- Build and deploy the gRPC-based external scaler:

  ```bash
  docker build -t external-scaler .
  kubectl apply -f scaler-deployment.yaml
  ```
- Simulate traffic using K6:

  ```bash
  k6 run script.js
  ```
- Apply the ScaledObject:

  ```bash
  kubectl apply -f scaledobject.yaml
  ```
- Testing involves generating HTTP requests with K6 and monitoring the scaling behavior with the `kubectl get pods --watch` command.
- The scaler acts 30 seconds ahead of the predicted load, allowing scaling decisions to be made proactively.
- No actual ML model: The current implementation only simulates the behavior of an ML model. Implementing a real ML model requires deploying the model with GPU resources, either in the same Kubernetes cluster or externally via a REST API.
- Simulated Traffic: The traffic generated by K6 is perfectly periodic, simplifying the scaling decisions. Real-world traffic may require more complex predictions.
- Implementing a real ML model for prediction.
- Integrating GPUs for model training and retraining in the Kubernetes cluster.
- Periodic retraining of the ML model based on data drift and accuracy monitoring.