-
Notifications
You must be signed in to change notification settings - Fork 40.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix double counting issue for request metrics on timeout. #83427
Conversation
Currently we record request metrics during the normal request flow and we also manually invoke `Record` in the timeout handler to record timeouts. This means that we effectively double count whenever we timeout. This PR renames the `Record` function to `RecordRequestError` to more accurately reflect the intended side-effect of the function call. Change-Id: Ie37fd0c1e501bd525640a434433d364a5fd6dde2
/sig api-machinery instrumentation |
@logicalhan: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/priority important-soon |
4c6e724
to
194aed7
Compare
/retest |
For ref: #79609 |
|
||
requestErrorTotal = compbasemetrics.NewCounterVec( | ||
&compbasemetrics.CounterOpts{ | ||
Name: "apiserver_request_error_total", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pedantic: error should probably be pluralized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Please take a look at the metric name as I would rather not change it after this is submitted.
194aed7
to
9ba8ab3
Compare
Will lgtm after gofmt. |
Change-Id: I12eb94f41ded20ed5a16332ada13a7b34f75de18
9ba8ab3
to
5e652fe
Compare
…ding documentation Change-Id: I47a9c7b10614afe85bb652fa61984f91848d6d65
/retest |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lavalamp, logicalhan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
If it is easy, I'd like to cherry-pick this back. This is a major defect in our metrics. |
@lavalamp |
@RainbowMango please feel free! I'd take this back as many versions as we support, 1.14, 1.15, 1.16, and 1.13 if it's still in the support window. Not sure how easily this will cherry pick to the older ones... |
@lavalamp @RainbowMango are there going to be backports on this for 1.15 and 1.14? |
…#83427-upstream-release-1.16 Automated cherry pick of #83427: Fix double counting issue for request metrics on timeout.
Oh, yes. I will try this as soon as possible. As mentioned by @logicalhan , we will probably need a manual PR to pick this fix back to 1.15 and 1.14. According to kubernetes version policy, I think 1.13 is no longer maintained, so it wouldn't happen on 1.13.
|
Fix double counting issue for request metrics on timeout.
Fix double counting issue for request metrics on timeout.
…83427-upstream-release-1.14 Manually cherry pick of #83427: Fix double counting issue for request metrics on timeout
…83427-upstream-release-1.16 Manually cherry pick of #83427: Fix double counting issue for request metrics on timeout
Currently we record request metrics during the normal request flow; we also manually invoke
Record
in the timeout handler to record timeouts. This means that we double count requests to our metrics whenever we timeout. To preserve the bits where we do encounter an error but to avoid double-counting our metrics, this PR adds a new metric to represent the running count of apiserver request errors and records occurrences of encountered errors to that new metrics.This PR also renames the
Record
function toRecordRequestError
to more accurately reflect the intended side-effect of the function call.What type of PR is this?
/kind bug
Which issue(s) this PR fixes:
Fixes #83424
Special notes for your reviewer:
Does this PR introduce a user-facing change?: