Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add health check timeout parameter #3813

Merged
merged 1 commit into from
Sep 27, 2018

Conversation

jbiel
Copy link
Contributor

@jbiel jbiel commented Aug 22, 2018

What does this PR do?

A health check timeout parameter has been added. This allows health checks to set custom timeout periods for backends. This value is currently non-configurable and set to 5 seconds.

If the health timeout is configured to be greater than the interval, an error message is logged and the interval is set to the timeout + 1 second. I did this because I noticed prior art of using sane defaults in the case of invalid configuration (I like this; I appreciate that the daemon doesn't just die.) If there's a better way to handle this or if this isn't desirable I'm open to any necessary changes.

Motivation

An application that we inherited periodically takes > 5 seconds to respond to health check requests and we don't want to consider backends that take longer than 5 seconds to respond unhealthy.

Fixes #3784

More

  • Added/updated tests
  • Added/updated documentation

Additional Notes

  • make test-unit was successful
  • make docs-verify was successful and docs look good visually
  • I'm unsure of why autogen/gentemplates/gen.go changed so much. I ran go generate with go 1.10.3 and go-bindata 3.4.0.

@ldez
Copy link
Contributor

ldez commented Aug 22, 2018

@jbiel jbiel force-pushed the health-check-timeout branch from 0c7a49e to 795c05d Compare August 23, 2018 01:56
@jbiel
Copy link
Contributor Author

jbiel commented Aug 23, 2018

Thanks @ldez, I've updated the PR to use that version and it's much cleaner. The Arch package sources go-bindata from /~https://github.com/shuLhan/go-bindata.

@jbdoumenjou
Copy link
Member

jbdoumenjou commented Aug 30, 2018

Hi @jbiel, thank you for your work.
if I am not mistaken, the following tests are to be updated:

  • provider/docker/config_container_docker_test.go
  • provider/docker/config_container_swarm_test.go
  • provider/ecs/config_test.go
  • provider/marathon/config_test.go
  • provider/mesos/config_test.go
  • provider/rancher/config_test.go

Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR.

}

if timeout >= interval {
interval = timeout + time.Duration(1*time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT about just having a warn log and not changing the interval value. I think the +1s is a little bit arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this but I'm unsure of the consequences of potentially having multiple health checks running at once. For example:

  • Interval is set to 5s
  • Timeout is set to 20s
  • Backend is taking 15s to respond

In the above case we'll have up to 3 health checks running against a backend at the same time. Is this acceptable?

Copy link
Member

@juliens juliens Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, we can't launch new healthcheck if we are still waiting for the previous healthcheck response because the healthcheck is in the same goroutine as the interval.
/~https://github.com/containous/traefik/blob/fdf14cd101821b5ab0d5fb62b5ffaa489ad5eaab/healthcheck/healthcheck.go#L116-L126

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I've removed the code that changes the interval and changed the log to warn level.

@jbiel jbiel force-pushed the health-check-timeout branch 2 times, most recently from 2e6722d to 82e71ab Compare September 13, 2018 16:56
@jbiel
Copy link
Contributor Author

jbiel commented Sep 13, 2018

@jbdoumenjou, the PR has been updated to include updates to those tests.

@dtomcej
Copy link
Contributor

dtomcej commented Sep 14, 2018

Could this also be added to the kubernetes provider as healthcheck-timeout?

@jbiel jbiel force-pushed the health-check-timeout branch from 82e71ab to 578c599 Compare September 25, 2018 20:56
server/server_loadbalancer.go Outdated Show resolved Hide resolved
@jbiel jbiel force-pushed the health-check-timeout branch from 9a41266 to ecdaa54 Compare September 26, 2018 22:00
Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dtomcej dtomcej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
:shipit:

@ldez ldez removed the request for review from a team September 27, 2018 15:26
Copy link
Contributor

@ldez ldez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants