The overload detector uses a TCP Vegas based algorithm, as implemented by Netflix Concurrency Limiters. Priority load shedding uses 5 priority levels and 128 cohorts. A simple cubic function is used to determine the threshold that current CPU load has to reach to reject the current request.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
//// | ||
This guide is maintained in the main Quarkus repository | ||
and pull requests should be submitted there: | ||
/~https://github.com/quarkusio/quarkus/tree/main/docs/src/main/asciidoc | ||
//// | ||
= Load Shedding reference guide | ||
|
||
include::_attributes.adoc[] | ||
:numbered: | ||
:sectnums: | ||
:categories: web | ||
:topics: web,load-shedding | ||
:extensions: io.quarkus:quarkus-load-shedding | ||
:extension-status: experimental | ||
|
||
include::{includes}/extension-status.adoc[] | ||
|
||
Load shedding is the practice of detecting service overload and rejecting requests. | ||
|
||
In Quarkus, the `quarkus-load-shedding` extension provides a load shedding mechanism. | ||
|
||
== Use the Load Shedding extension | ||
|
||
To use the load shedding extension, you need to add the `io.quarkus:quarkus-load-shedding` extension to your project: | ||
|
||
|
||
[source,xml,role="primary asciidoc-tabs-target-sync-cli asciidoc-tabs-target-sync-maven"] | ||
.pom.xml | ||
---- | ||
<dependency> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-load-shedding</artifactId> | ||
</dependency> | ||
---- | ||
|
||
[source,gradle,role="secondary asciidoc-tabs-target-sync-gradle"] | ||
.build.gradle | ||
|
||
---- | ||
implementation("io.quarkus:quarkus-load-shedding") | ||
---- | ||
|
||
No configuration is required, though the possible configuration options are described below. | ||
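As a hedged illustration, the following `application.properties` fragment spells out the settings that the extension's own test in this commit overrides. The key names are taken from that test; the values are the defaults this guide mentions (100 initial requests, a maximum of 1000, priority load shedding enabled). Treat the generated configuration reference as authoritative if it differs.

```properties
# Initial number of allowed concurrent requests (the guide states the algorithm starts with 100)
quarkus.load-shedding.initial-limit=100
# Upper bound for the adaptive limit (the guide states a default of 1000)
quarkus.load-shedding.max-limit=1000
# Priority load shedding is enabled by default; set to false to reject every request on overload
quarkus.load-shedding.priority.enabled=true
```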
|
||
== The load shedding algorithm | ||
|
||
The load shedding algorithm has 2 parts: | ||
|
||
* overload detection | ||
* priority load shedding (optional) | ||
|
||
=== Overload detection | ||
|
||
To detect whether the current service is overloaded, an adaptation of TCP Vegas is used. | ||
|
||
The algorithm starts with 100 allowed concurrent requests. | ||
For each request, it compares the number of current requests with the allowed limit; if the limit is exceeded, an overload situation is signalled. | ||
|
||
If the limit is not exceeded, or if priority load shedding determines that the request should not be rejected (see below), the request is allowed. | ||
When it finishes, its duration is compared with the lowest duration seen so far to estimate a queue size. | ||
If the queue size is lower than _alpha_, the current limit is increased, but only up to a given maximum, by default 1000. | ||
If the queue size is greater than _beta_, the current limit is decreased. | ||
Otherwise, the current limit is kept intact. | ||
|
||
Alpha and beta are computed by multiplying configurable constants by the base-10 logarithm of the current limit. | ||
|
||
After some number of requests, which can be modified by configuring the _probe_ factor, the lowest duration seen is reset to the last seen duration of a request. | ||
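The limit adjustment described above can be sketched roughly as follows. This is not the extension's code: the class name, the factor values `3` and `6`, the step of `1`, and the Vegas-style queue size estimate `limit * (1 - lowest / duration)` are all assumptions made for illustration.

```java
// Illustrative sketch only; names, constants and formulas are assumptions.
final class VegasLimitSketch {
    static final int MAX_LIMIT = 1000;       // default maximum stated in the guide
    static final double ALPHA_FACTOR = 3.0;  // hypothetical configurable constant
    static final double BETA_FACTOR = 6.0;   // hypothetical configurable constant

    /** Returns the new limit after a request finished with the given duration. */
    static int update(int limit, long lowestDurationNanos, long durationNanos) {
        // Assumed Vegas-style estimate: the share of the limit attributed to queueing delay.
        double ratio = (double) lowestDurationNanos / durationNanos;
        int queueSize = (int) Math.ceil(limit * (1.0 - ratio));

        double log = Math.log10(limit);
        double alpha = ALPHA_FACTOR * log;
        double beta = BETA_FACTOR * log;

        if (queueSize < alpha) {
            return Math.min(limit + 1, MAX_LIMIT); // increase, capped at the maximum
        } else if (queueSize > beta) {
            return Math.max(limit - 1, 1);         // decrease, never below 1
        }
        return limit;                              // between alpha and beta: keep the limit
    }

    public static void main(String[] args) {
        System.out.println(update(100, 100, 100)); // no queueing observed
        System.out.println(update(100, 100, 200)); // request took twice the lowest duration
    }
}
```

With a limit of 100, alpha is 6 and beta is 12 in this sketch, so a request that takes as long as the lowest duration seen raises the limit, while one that takes twice as long lowers it.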
|
||
=== Priority load shedding | ||
|
||
If an overload situation is signalled, priority load shedding is invoked. | ||
|
||
By default, priority load shedding is enabled, which means a request is only rejected if the current CPU load is high enough. | ||
To determine whether a request should be rejected, 2 attributes are considered: | ||
|
||
* request priority | ||
* request cohort | ||
|
||
There are 5 statically defined priorities and 128 cohorts, which amounts to 640 request groups in total. | ||
|
||
After both priority and cohort are assigned to a request, a request group number is computed: `group = priority * num_cohorts + cohort`. | ||
Then, the group number is compared to a simple cubic function of current CPU load, where `load` is a number between 0 and 1: `num_groups * (1 - load^3)`. | ||
If the group number is higher than this value, the request is rejected; otherwise, it is allowed even in an overload situation. | ||
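The comparison can be worked through with a small sketch. The class and method names are hypothetical, and the sketch assumes that priorities are numbered 0 to 4 in the order `CRITICAL` to `DEGRADED` and that cohorts run from 1 to 128.

```java
// Illustrative sketch only; class and method names are hypothetical.
final class PriorityThresholdSketch {
    static final int NUM_PRIORITIES = 5;
    static final int NUM_COHORTS = 128;
    static final int NUM_GROUPS = NUM_PRIORITIES * NUM_COHORTS; // 640

    /** group = priority * num_cohorts + cohort, with priority in 0..4 (assumed) and cohort in 1..128. */
    static int group(int priority, int cohort) {
        return priority * NUM_COHORTS + cohort;
    }

    /** Rejects when the group number exceeds num_groups * (1 - load^3); load is in [0, 1]. */
    static boolean shouldReject(int priority, int cohort, double load) {
        double threshold = NUM_GROUPS * (1.0 - load * load * load);
        return group(priority, cohort) > threshold;
    }

    public static void main(String[] args) {
        // At 50% CPU load the threshold is 640 * (1 - 0.125) = 560.
        System.out.println(shouldReject(4, 48, 0.5)); // group 560, not above the threshold
        System.out.println(shouldReject(4, 49, 0.5)); // group 561, above the threshold
    }
}
```

Because the function is cubic, the threshold stays close to 640 at moderate load and drops steeply only as the load approaches 1, so under these assumptions the lowest-priority groups are shed first and `CRITICAL` requests last.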
|
||
If priority load shedding is disabled, all requests are rejected in an overload situation. | ||
|
||
==== Customizing request priority | ||
|
||
Priority is assigned by an `io.quarkus.load.shedding.RequestPrioritizer`. | ||
There are 5 statically defined priorities in the `io.quarkus.load.shedding.RequestPriority` enum: `CRITICAL`, `IMPORTANT`, `NORMAL`, `BACKGROUND` and `DEGRADED`. | ||
By default, if no request prioritizer applies, the priority is assumed to be `NORMAL`. | ||
|
||
|
||
There is one default prioritizer which assigns the priority of `CRITICAL` to requests to the non-application endpoints. | ||
|
||
It declares no `@Priority`. | ||
|
||
It is possible to define custom implementations of the `RequestPrioritizer` interface. | ||
The implementations must be CDI beans, otherwise they are ignored. | ||
The CDI rules of typesafe resolution must be followed. | ||
|
||
That is, if multiple implementations exist with a different `@Priority` value and some of them are ``@Alternative``s, only the alternatives with the highest priority value are retained. | ||
If no implementation is an alternative, all implementations are retained and are sorted in descending `@Priority` order (highest priority value comes first). | ||
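The following sketch shows one way a sorted list of prioritizers could be consulted, falling back to `NORMAL` when none applies. It deliberately uses simplified stand-in types rather than the real `RequestPrioritizer` interface, whose exact methods are not shown in this guide. The rule that the first applicable prioritizer wins, and the example paths, are assumptions.

```java
import java.util.List;

// Illustrative sketch only; these are stand-in types, not the extension's API.
final class PrioritizerChainSketch {
    enum Priority { CRITICAL, IMPORTANT, NORMAL, BACKGROUND, DEGRADED }

    /** Hypothetical stand-in; the real RequestPrioritizer interface may look different. */
    interface Prioritizer {
        boolean appliesTo(String path);
        Priority priority(String path);
    }

    /** Hypothetical helper: a prioritizer that applies to paths with the given prefix. */
    static Prioritizer forPrefix(String prefix, Priority priority) {
        return new Prioritizer() {
            @Override
            public boolean appliesTo(String path) {
                return path.startsWith(prefix);
            }

            @Override
            public Priority priority(String path) {
                return priority;
            }
        };
    }

    /** Walks the list, assumed to be sorted by descending @Priority; the first applicable entry wins. */
    static Priority resolve(List<Prioritizer> sorted, String path) {
        for (Prioritizer p : sorted) {
            if (p.appliesTo(path)) {
                return p.priority(path);
            }
        }
        return Priority.NORMAL; // default when no prioritizer applies
    }

    public static void main(String[] args) {
        List<Prioritizer> chain = List.of(
                forPrefix("/q/", Priority.CRITICAL),
                forPrefix("/reports", Priority.BACKGROUND));
        System.out.println(resolve(chain, "/hello"));
    }
}
```

In a real application, each prioritizer would be a CDI bean implementing `RequestPrioritizer`, and the ordering would come from the `@Priority` values rather than from the order of a hand-built list.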
|
||
==== Customizing request cohort | ||
|
||
Cohort is assigned by an `io.quarkus.load.shedding.RequestClassifier`. | ||
There are 128 statically defined cohorts, with the lowest number being 1 and the highest number being 128. | ||
The classifier should return a number in this interval; if it does not, the number is adjusted automatically. | ||
|
||
There is one default classifier which assigns a cohort based on a hash of the remote IP address and current time, such that an IP address changes its cohort roughly every hour. | ||
|
||
It declares no `@Priority`. | ||
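To make the idea of an hourly rotating cohort concrete, here is a minimal sketch. It is not the default classifier's implementation: the hash function, the use of whole hours since the epoch, and clamping as the "automatic adjustment" of out-of-range values are all assumptions.

```java
import java.util.Objects;

// Illustrative sketch only; the hashing scheme and the clamping are assumptions.
final class CohortSketch {
    static final int NUM_COHORTS = 128;
    static final long HOUR_MILLIS = 3_600_000L;

    /** Maps an IP address and the current hour to a cohort in 1..128. */
    static int cohort(String remoteIp, long nowMillis) {
        long hour = nowMillis / HOUR_MILLIS;        // changes once per hour
        int hash = Objects.hash(remoteIp, hour);    // combine address and hour
        return Math.floorMod(hash, NUM_COHORTS) + 1; // floorMod keeps the result non-negative
    }

    /** One possible adjustment for out-of-range values: clamp into 1..128. */
    static int clamp(int cohort) {
        return Math.max(1, Math.min(NUM_COHORTS, cohort));
    }

    public static void main(String[] args) {
        System.out.println(cohort("192.0.2.10", System.currentTimeMillis()));
    }
}
```

Including the hour in the hash means a given client is not assigned to the same cohort indefinitely, so no address is always the first or the last to be shed.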
|
||
It is possible to define custom implementations of the `RequestClassifier` interface. | ||
The implementations must be CDI beans, otherwise they are ignored. | ||
The CDI rules of typesafe resolution must be followed. | ||
|
||
That is, if multiple implementations exist with a different `@Priority` value and some of them are ``@Alternative``s, only the alternatives with the highest priority value are retained. | ||
If no implementation is an alternative, all implementations are retained and are sorted in descending `@Priority` order (highest priority value comes first). | ||
|
||
== Limitations | ||
|
||
The load shedding extension currently only applies to HTTP requests, and is heavily skewed towards request/response network interactions. | ||
This means that gRPC, WebSocket and other kinds of streaming over HTTP are not supported. | ||
Other "entrypoints" to Quarkus applications, such as messaging, are not supported either. | ||
|
||
Further, the load shedding implementation is currently rather basic and not heavily tested in production. | ||
Improvements may be necessary. | ||
|
||
|
||
== Configuration reference | ||
|
||
include::{generated-dir}/config/quarkus-load-shedding.adoc[opts=optional, leveloffset=+1] | ||
|
||
== Further reading | ||
|
||
Netflix Technology Blog: | ||
|
||
* https://netflixtechblog.medium.com/performance-under-load-3e6fa9a60581[Performance Under Load] | ||
* https://netflixtechblog.com/keeping-netflix-reliable-using-prioritized-load-shedding-6cc827b02f94[Keeping Netflix Reliable Using Prioritized Load Shedding] | ||
|
||
Uber Engineering Blog: | ||
|
||
* https://www.uber.com/blog/cinnamon-using-century-old-tech-to-build-a-mean-load-shedder/[Cinnamon: Using Century Old Tech to Build a Mean Load Shedder] | ||
* https://www.uber.com/blog/pid-controller-for-cinnamon/[PID Controller for Cinnamon] | ||
* https://www.uber.com/blog/cinnamon-auto-tuner-adaptive-concurrency-in-the-wild/[Cinnamon Auto-Tuner: Adaptive Concurrency in the Wild] | ||
|
||
Amazon Builders' Library: | ||
|
||
* https://aws.amazon.com/builders-library/using-load-shedding-to-avoid-overload/[Using load shedding to avoid overload] | ||
|
||
Google Cloud Blog: | ||
|
||
* https://cloud.google.com/blog/products/gcp/using-load-shedding-to-survive-a-success-disaster-cre-life-lessons[Using load shedding to survive a success disaster] | ||
|
||
CodeReliant Blog: | ||
|
||
* https://www.codereliant.io/load-shedding/[Load Shedding for High Traffic Systems] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<project xmlns="http://maven.apache.org/POM/4.0.0" | ||
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
|
||
<parent> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-load-shedding-parent</artifactId> | ||
<version>999-SNAPSHOT</version> | ||
</parent> | ||
|
||
<artifactId>quarkus-load-shedding-deployment</artifactId> | ||
|
||
<name>Quarkus - Load Shedding - Deployment</name> | ||
|
||
<dependencies> | ||
<dependency> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-load-shedding</artifactId> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-vertx-http-deployment</artifactId> | ||
</dependency> | ||
<dependency> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-rest-deployment</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
|
||
<dependency> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-junit5-internal</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>io.rest-assured</groupId> | ||
<artifactId>rest-assured</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
<dependency> | ||
<groupId>org.assertj</groupId> | ||
<artifactId>assertj-core</artifactId> | ||
<scope>test</scope> | ||
</dependency> | ||
</dependencies> | ||
|
||
<build> | ||
<plugins> | ||
<plugin> | ||
<artifactId>maven-compiler-plugin</artifactId> | ||
<configuration> | ||
<annotationProcessorPaths> | ||
<path> | ||
<groupId>io.quarkus</groupId> | ||
<artifactId>quarkus-extension-processor</artifactId> | ||
<version>${project.version}</version> | ||
</path> | ||
</annotationProcessorPaths> | ||
</configuration> | ||
</plugin> | ||
</plugins> | ||
</build> | ||
</project> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
package io.quarkus.load.shedding.deployment; | ||
|
||
import java.util.ArrayList; | ||
import java.util.List; | ||
|
||
import io.quarkus.arc.deployment.AdditionalBeanBuildItem; | ||
import io.quarkus.deployment.annotations.BuildStep; | ||
import io.quarkus.deployment.builditem.FeatureBuildItem; | ||
import io.quarkus.load.shedding.runtime.HttpLoadShedding; | ||
import io.quarkus.load.shedding.runtime.HttpRequestClassifier; | ||
import io.quarkus.load.shedding.runtime.ManagementRequestPrioritizer; | ||
import io.quarkus.load.shedding.runtime.OverloadDetector; | ||
import io.quarkus.load.shedding.runtime.PriorityLoadShedding; | ||
|
||
public class LoadSheddingProcessor { | ||
private static final String FEATURE = "load-shedding"; | ||
|
||
@BuildStep | ||
FeatureBuildItem feature() { | ||
return new FeatureBuildItem(FEATURE); | ||
} | ||
|
||
@BuildStep | ||
AdditionalBeanBuildItem beans() { | ||
List<String> beans = new ArrayList<>(); | ||
beans.add(OverloadDetector.class.getName()); | ||
beans.add(HttpLoadShedding.class.getName()); | ||
beans.add(PriorityLoadShedding.class.getName()); | ||
beans.add(ManagementRequestPrioritizer.class.getName()); | ||
beans.add(HttpRequestClassifier.class.getName()); | ||
|
||
return AdditionalBeanBuildItem.builder().addBeanClasses(beans).build(); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
package io.quarkus.load.shedding; | ||
|
||
import static io.restassured.RestAssured.when; | ||
import static org.assertj.core.api.Assertions.assertThat; | ||
|
||
import java.util.concurrent.CountDownLatch; | ||
import java.util.concurrent.atomic.AtomicInteger; | ||
|
||
import jakarta.ws.rs.GET; | ||
import jakarta.ws.rs.Path; | ||
|
||
import org.junit.jupiter.api.Test; | ||
import org.junit.jupiter.api.extension.RegisterExtension; | ||
|
||
import io.quarkus.test.QuarkusUnitTest; | ||
|
||
public class NaiveLoadSheddingTest { | ||
private static final int NUM_THREADS = 20; | ||
private static final int NUM_REQUESTS = 10; | ||
|
||
@RegisterExtension | ||
static final QuarkusUnitTest config = new QuarkusUnitTest() | ||
.withApplicationRoot(jar -> jar.addClasses(MyResource.class)) | ||
.overrideConfigKey("quarkus.load-shedding.initial-limit", "5") | ||
.overrideConfigKey("quarkus.load-shedding.max-limit", "10") | ||
.overrideConfigKey("quarkus.load-shedding.priority.enabled", "false"); | ||
|
||
@Test | ||
public void test() throws InterruptedException { | ||
AtomicInteger numErrors = new AtomicInteger(); | ||
CountDownLatch begin = new CountDownLatch(1); | ||
CountDownLatch end = new CountDownLatch(NUM_THREADS); | ||
for (int i = 0; i < NUM_THREADS; i++) { | ||
new Thread(() -> { | ||
try { | ||
begin.await(); | ||
for (int j = 0; j < NUM_REQUESTS; j++) { | ||
int statusCode = when().get("/").then().extract().statusCode(); | ||
if (statusCode == 503) { | ||
numErrors.incrementAndGet(); | ||
} | ||
} | ||
} catch (InterruptedException e) { | ||
Thread.currentThread().interrupt(); | ||
throw new RuntimeException(e); | ||
} finally { | ||
// always release the latch so that a failing worker cannot hang the test | ||
end.countDown(); | ||
} | ||
}).start(); | ||
} | ||
|
||
begin.countDown(); | ||
end.await(); | ||
|
||
// at least half of all requests must have been rejected with 503 | ||
assertThat(numErrors).hasValueGreaterThanOrEqualTo(NUM_THREADS * NUM_REQUESTS / 2); | ||
} | ||
|
||
@Path("/") | ||
public static class MyResource { | ||
@GET | ||
public String hello() throws InterruptedException { | ||
Thread.sleep(100); | ||
return "Hello, world!"; | ||
} | ||
} | ||
} |