-
-
Notifications
You must be signed in to change notification settings - Fork 5.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add request accepting grace period delaying graceful shutdown. #1971
Add request accepting grace period delaying graceful shutdown. #1971
Conversation
8904a1b
to
882c473
Compare
@emilevauge I remember you weren't completely convinced of this feature, so I went ahead and implemented it to help giving the full picture. See the linked issue for some extra rationale. |
882c473
to
40f6b39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design review LGTM :)
Like what we did with RespondingTimeouts
& ForwardingTimeouts
, it may be interesting to group both reqAcceptGraceTimeOut
& GraceTimeOut
into a struct (something like LifeCyleTimeouts
).
Sounds very reasonable to me. 👍 I'll work on making the necessary adjustments. |
4980c33
to
8b5030f
Compare
Still WIP, things aren't done yet. |
7e27abf
to
d51fe84
Compare
Holding back the bot as we only want to merge this after #2114 landed in master, after we which we need to update the integration test this PR adds with the new generic show-logs-on-integration-test-failure approach. |
d51fe84
to
edffad1
Compare
) | ||
|
||
// GlobalConfiguration holds global configuration (with providers, etc.). | ||
// It's populated from the traefik configuration file passed as an argument to the binary. | ||
type GlobalConfiguration struct { | ||
GraceTimeOut flaeg.Duration `short:"g" description:"Duration to give active requests a chance to finish before Traefik stops"` | ||
LifeCycle *LifeCycle `description:"Timeouts influencing the server life cycle"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you expect other settings than timeouts inside of the LifeCycle
struct? If not do we want to rename it to LifeCycleTimeouts
for consistency reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might, I don't know.
The likely more important point for me is that the other approach stutters: LifeCycleTimeouts.GraceTimeOut
has one timeout too many, so it should either be removed from the struct name or the parameters IMHO. I decided for the former as that allows us to extend the struct in the future without the need for another name update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point. SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
docs/configuration/commons.md
Outdated
|
||
# Duration to keep accepting requests prior to initiating the graceful | ||
# termination period (as defined by the `graceTimeOut` option). This | ||
# option is meant to give downstream load-balancer sufficient time to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be either: "a downstream load-balancer" or "downstream load-balancers".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pluralized for the better.
This one is ready for review again (unless someone wants me to rebase first). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏
integration/basic_test.go
Outdated
Server string | ||
}{whoami}) | ||
defer os.Remove(file) | ||
cmd, output := s.cmdTraefik(withConfigFile(file)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the new version of traefikCmd with defer display (#2114)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
configuration/configuration.go
Outdated
// LifeCycle contains configurations relevant to the lifecycle (such as the | ||
// shutdown phase) of Traefik. | ||
type LifeCycle struct { | ||
ReqAcceptGraceTimeOut flaeg.Duration `description:"Duration to keep accepting requests before Traefik initiates the graceful shutdown procedure"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not RequestAcceptGraceTimeout ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, why not -- done.
select { | ||
case err := <-waitErr: | ||
c.Assert(err, checker.IsNil) | ||
case <-time.After(10 * time.Second): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
5s (sleep) + 10s (after) = 15s
reqAcceptGraceTimeOut (10s) + defaultGraceTimeout(10s) = "20s"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grace period shouldn't be used: We send the last request to the server at t =~ 5 when we are halfway into the request accept grace timeout. It should finish at t = 5 + delta, so at t = 10 no request should be left that needs to get served.
I left an extra comment to explain the chosen timeout.
The new grace period defines how long Traefik should continue accepting requests prior to shutting down all connection listeners. It preceeds the existing (HTTP shutdown) grace period and allows for a full-circle graceful termination by expecting downstream load-balancers to take Traefik out of rotation in time, without the need for rather complex and external request draining procedures. Immediate beneficiaries are providers such as Kubernetes and Marathon which expect in-cluster applications to exhibit the described termination behavior.
Former name stems from a previous (never published) naming schema.
Doesn't really make sense since we shut down the server for SIGINT gracefully as well.
5ccbf12
to
11a6f02
Compare
My LGTM stays 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🕙 👏
11a6f02
to
dd1a108
Compare
The new grace period defines how long Traefik should continue accepting requests prior to shutting down all connection listeners. It preceeds the existing (HTTP shutdown) grace period and allows for a full-circle graceful termination by expecting downstream load-balancers to take Traefik out of rotation in time, without the need for rather complex and external request draining procedures.
Immediate beneficiaries are providers such as Kubernetes and Marathon which expect in-cluster applications to exhibit the described termination behavior.
This PR addresses one of the two necessary TODOs for #1942. However, it also stands useful on its own.