Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add request accepting grace period delaying graceful shutdown. #1971

Merged

Conversation

timoreimann
Copy link
Contributor

The new grace period defines how long Traefik should continue accepting requests prior to shutting down all connection listeners. It preceeds the existing (HTTP shutdown) grace period and allows for a full-circle graceful termination by expecting downstream load-balancers to take Traefik out of rotation in time, without the need for rather complex and external request draining procedures.

Immediate beneficiaries are providers such as Kubernetes and Marathon which expect in-cluster applications to exhibit the described termination behavior.

This PR addresses one of the two necessary TODOs for #1942. However, it also stands useful on its own.

@timoreimann timoreimann force-pushed the add-request-accepting-grace-period branch from 8904a1b to 882c473 Compare August 18, 2017 22:24
@timoreimann
Copy link
Contributor Author

@emilevauge I remember you weren't completely convinced of this feature, so I went ahead and implemented it to help giving the full picture.

See the linked issue for some extra rationale.

@m3co-code m3co-code requested review from ldez and removed request for ldez August 22, 2017 10:31
@timoreimann timoreimann force-pushed the add-request-accepting-grace-period branch from 882c473 to 40f6b39 Compare August 24, 2017 08:26
Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design review LGTM :)
Like what we did with RespondingTimeouts& ForwardingTimeouts, it may be interesting to group both reqAcceptGraceTimeOut & GraceTimeOut into a struct (something like LifeCyleTimeouts).

@timoreimann
Copy link
Contributor Author

@emilevauge

Like what we did with RespondingTimeouts& ForwardingTimeouts, it may be interesting to group both reqAcceptGraceTimeOut & GraceTimeOut into a struct (something like LifeCyleTimeouts).

Sounds very reasonable to me. 👍 I'll work on making the necessary adjustments.

@ldez ldez added this to the 1.5 milestone Sep 12, 2017
@timoreimann timoreimann force-pushed the add-request-accepting-grace-period branch from 4980c33 to 8b5030f Compare September 12, 2017 14:50
@timoreimann
Copy link
Contributor Author

Still WIP, things aren't done yet.

@timoreimann timoreimann force-pushed the add-request-accepting-grace-period branch 3 times, most recently from 7e27abf to d51fe84 Compare September 15, 2017 15:46
@timoreimann
Copy link
Contributor Author

Holding back the bot as we only want to merge this after #2114 landed in master, after we which we need to update the integration test this PR adds with the new generic show-logs-on-integration-test-failure approach.

@timoreimann timoreimann force-pushed the add-request-accepting-grace-period branch from d51fe84 to edffad1 Compare September 15, 2017 22:13
)

// GlobalConfiguration holds global configuration (with providers, etc.).
// It's populated from the traefik configuration file passed as an argument to the binary.
type GlobalConfiguration struct {
GraceTimeOut flaeg.Duration `short:"g" description:"Duration to give active requests a chance to finish before Traefik stops"`
LifeCycle *LifeCycle `description:"Timeouts influencing the server life cycle"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you expect other settings than timeouts inside of the LifeCycle struct? If not do we want to rename it to LifeCycleTimeouts for consistency reasons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might, I don't know.

The likely more important point for me is that the other approach stutters: LifeCycleTimeouts.GraceTimeOut has one timeout too many, so it should either be removed from the struct name or the parameters IMHO. I decided for the former as that allows us to extend the struct in the future without the need for another name update.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point. SGTM.

Copy link
Contributor

@m3co-code m3co-code left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!


# Duration to keep accepting requests prior to initiating the graceful
# termination period (as defined by the `graceTimeOut` option). This
# option is meant to give downstream load-balancer sufficient time to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be either: "a downstream load-balancer" or "downstream load-balancers".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pluralized for the better.

@timoreimann
Copy link
Contributor Author

timoreimann commented Sep 19, 2017

This one is ready for review again (unless someone wants me to rebase first).

Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏

Server string
}{whoami})
defer os.Remove(file)
cmd, output := s.cmdTraefik(withConfigFile(file))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the new version of traefikCmd with defer display (#2114)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// LifeCycle contains configurations relevant to the lifecycle (such as the
// shutdown phase) of Traefik.
type LifeCycle struct {
ReqAcceptGraceTimeOut flaeg.Duration `description:"Duration to keep accepting requests before Traefik initiates the graceful shutdown procedure"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not RequestAcceptGraceTimeout ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, why not -- done.

select {
case err := <-waitErr:
c.Assert(err, checker.IsNil)
case <-time.After(10 * time.Second):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5s (sleep) + 10s (after) = 15s
reqAcceptGraceTimeOut (10s) + defaultGraceTimeout(10s) = "20s"

Copy link
Contributor Author

@timoreimann timoreimann Sep 25, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grace period shouldn't be used: We send the last request to the server at t =~ 5 when we are halfway into the request accept grace timeout. It should finish at t = 5 + delta, so at t = 10 no request should be left that needs to get served.

I left an extra comment to explain the chosen timeout.

The new grace period defines how long Traefik should continue accepting
requests prior to shutting down all connection listeners. It preceeds
the existing (HTTP shutdown) grace period and allows for a full-circle
graceful termination by expecting downstream load-balancers to take
Traefik out of rotation in time, without the need for rather complex and
external request draining procedures.

Immediate beneficiaries are providers such as Kubernetes and Marathon
which expect in-cluster applications to exhibit the described
termination behavior.
Former name stems from a previous (never published) naming schema.
Doesn't really make sense since we shut down the server for SIGINT
gracefully as well.
@timoreimann
Copy link
Contributor Author

@juliens et. al: This is ready for review again. I have also adjusted for #2114.

@m3co-code
Copy link
Contributor

My LGTM stays 👍

Copy link
Contributor

@nmengin nmengin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🕙 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement a new or improved feature. size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants