
Allow connection pool to gradually drain down #55

Closed
mikkokar opened this issue Jan 8, 2018 · 3 comments

Comments

@mikkokar
Contributor

mikkokar commented Jan 8, 2018

The problem

When an origin is removed from the configuration, the associated connection pool is closed. The closure is rather abrupt, as it closes each borrowed connection straight away:

    @Override
    public void close() {
        removeEachAndProcess(waitingSubscribers, subscriber ->
                subscriber.onError(new RuntimeException("Connection pool closed")));
        removeEachAndProcess(availableConnections, Connection::close);
        borrowedConnections.forEach(Connection::close);
    }

Desired behaviour

Ideally, the connection pool should drain down without affecting connections that are currently in use.

When closed, the connection pool should enter a draining state in which it stops accepting new connection requests and waits until all borrowed connections have been returned to the pool. Only then is it considered fully closed.

That is, the signature of the close method should probably change:

    public Future<Void> close() { ... }
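
A minimal sketch of what a draining close() could look like, written as a fragment of the existing pool class. It assumes a java.util.concurrent.CompletableFuture result and a hypothetical connectionReturned() hook invoked wherever the pool currently takes back a borrowed connection; this is not the actual Styx implementation:

    private final CompletableFuture<Void> closeFuture = new CompletableFuture<>();
    private volatile boolean draining;

    public CompletableFuture<Void> close() {
        draining = true;

        // Stop handing out connections: fail any subscribers still waiting for one.
        removeEachAndProcess(waitingSubscribers, subscriber ->
                subscriber.onError(new RuntimeException("Connection pool closed")));

        // Idle connections can be closed immediately.
        removeEachAndProcess(availableConnections, Connection::close);

        // Borrowed connections are left untouched; the returned future completes
        // once the last of them has been handed back.
        if (borrowedConnections.isEmpty()) {
            closeFuture.complete(null);
        }
        return closeFuture;
    }

    // Hypothetical hook, called wherever the pool currently accepts a returned connection.
    private void connectionReturned(Connection connection) {
        borrowedConnections.remove(connection);
        if (draining) {
            connection.close();
            if (borrowedConnections.isEmpty()) {
                closeFuture.complete(null);
            }
        } else {
            availableConnections.add(connection);
        }
    }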

Detailed description

See above.

Acceptance criteria

  • Send N requests to an origin. Remove the origin from the configuration. The N requests should still complete successfully.
  • The origin is removed from the load balancer as soon as the connection pool closure is triggered. That is, it should not get any traffic while it is in the draining state.
@kvosper kvosper added bug and removed bug labels Jan 8, 2018
@dvlato
Contributor

dvlato commented May 21, 2018

This should also be considered at the Styx server level: the server should quit gracefully without losing in-flight requests.

Many server applications and distributed environments (Kubernetes, OpenShift...) use the following approach, so we might want to support it natively:

  1. The application receives a SIGTERM signal. This signal indicates that the server should stop receiving new requests but complete the existing ones successfully.

With non-persistent connections this would simply mean not accepting new connections, but persistent (keep-alive) connections need to be handled differently, probably by returning a Connection: close response.

In the case of distributed environments, the SIGTERM signal is accompanied by taking the server out of rotation so that the specific instance stops receiving requests (those requests go to other available servers, so none are rejected). However, these systems usually cannot guarantee strict timing, so after receiving a SIGTERM the application could still receive new requests. Thus, if zero downtime is the goal, SIGTERM is usually re-interpreted as 'start rejecting connections after a while'; see this Kubernetes issue for details: kubernetes/ingress-nginx#322

We might decide to implement a 'distributed mode shutdown' that takes this use case into consideration, or just implement a normal shutdown process and expect the users to find environment-specific solutions (such as the preStop handler in Kubernetes) to avoid rejected connections.

  2. If the server hasn't terminated successfully, a SIGKILL could be sent to terminate it abruptly. A specific handler for SIGKILL that doesn't block termination should be considered.
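
A rough sketch of how a SIGTERM-driven graceful shutdown could be wired up via a JVM shutdown hook (which runs on SIGTERM but cannot intercept SIGKILL). The DrainableServer interface and its methods below are hypothetical placeholders used only to illustrate the sequence, not existing Styx APIs:

    import java.time.Duration;
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    // Hypothetical server interface, used only to illustrate the shutdown sequence.
    interface DrainableServer {
        void stopAccepting();                            // refuse new connections
        void closeIdleKeepAliveConnections();            // answer keep-alive sockets with "Connection: close"
        CompletableFuture<Void> drainInFlightRequests(); // completes when in-flight requests finish
    }

    final class GracefulShutdown {
        static void install(DrainableServer server, Duration gracePeriod) {
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                server.stopAccepting();
                server.closeIdleKeepAliveConnections();
                try {
                    server.drainInFlightRequests().get(gracePeriod.toMillis(), TimeUnit.MILLISECONDS);
                } catch (Exception e) {
                    // Grace period elapsed or drain failed; remaining connections are dropped at process exit.
                }
            }, "graceful-shutdown"));
        }
    }

Whether something like this lives in Styx itself (the 'distributed mode shutdown' above) or is left to environment-specific hooks such as Kubernetes' preStop handler is the open question.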

@mikkokar mikkokar added the P2 Plan label Sep 5, 2019
@mikkokar
Contributor Author

mikkokar commented Sep 5, 2019

Should we also consider draining down all connections when Styx as a whole shuts down? Is there another ticket for this?

@kvosper
Contributor

kvosper commented Jan 11, 2024

Closing all issues over 3 years old. New issues can be created if problems are still occurring.

@kvosper kvosper closed this as completed Jan 11, 2024