Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Classify some gRPC status codes as non-errors #395

Merged
merged 1 commit into from
Dec 18, 2019

Conversation

tegioz
Copy link
Contributor

@tegioz tegioz commented Nov 18, 2019

Linkerd classifies all gRPC status codes except OK as errors. This can negatively affect a gRPC server's success rate, even when it is only returning things like NOT_FOUND or INVALID_ARGUMENT.

This change narrows down the list of gRPC status codes that are considered an error to:

  • UNKNOWN (2)
  • DEADLINE_EXCEEDED (4)
  • INTERNAL (13)
  • UNAVAILABLE (14)
  • DATA_LOSS (15)

Please see this linkerd2 issue for more details.

Signed-off-by: Sergio Castaño Arteaga tegioz@icloud.com

Copy link
Member

@adleong adleong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks 0 to me. I looked for problems in this code but they were 5 and anyone saying otherwise has an 3.

@olix0r
Copy link
Member

olix0r commented Dec 2, 2019

It seems to me that other grpc status codes should not be considered errors as well. In fact, it seems like only a few should be considered errors: /~https://github.com/grpc/grpc/blob/master/doc/statuscodes.md

Specifically, UNAVAILABLE and INTERNAL seem to be clear errors. I'm not sure that any of the other error codes actually correspond to 5XX server failures.

Signed-off-by: Sergio Castaño Arteaga <tegioz@icloud.com>
@tegioz tegioz force-pushed the tegioz/grpc-success-status-codes branch from e2e66f1 to 53ab87d Compare December 9, 2019 06:49
Copy link
Member

@olix0r olix0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@olix0r olix0r merged commit de09017 into linkerd:master Dec 18, 2019
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 18, 2019
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.

Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.

---

* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
olix0r added a commit to linkerd/linkerd2 that referenced this pull request Dec 19, 2019
This release adds a defense mechanism to ensure that resolutions are
released when the associated balancer becomes idle and should have
been dropped from the proxy.

Furthermore, the proxy is now more selective as to which gRPC status
codes are considered "failures" in metrics.

---

* Classify some gRPC status codes as non-errors (linkerd/linkerd2-proxy#395)
* discover: Timeout stalled resolutions (linkerd/linkerd2-proxy#401)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants