feat(routing/http/server): expose prometheus metrics #718
Codecov Report
Attention: Patch coverage is …
@@ Coverage Diff @@
## main #718 +/- ##
==========================================
+ Coverage 60.38% 60.40% +0.02%
==========================================
Files 243 243
Lines 31021 31039 +18
==========================================
+ Hits 18731 18749 +18
+ Misses 10626 10625 -1
- Partials 1664 1665 +1
... and 10 files with indirect coverage changes
Tested with #679 in ipfs/someguy#87 and lgtm.
Preview of metrics produced by this PR:
# HELP delegated_routing_server_http_request_duration_seconds The latency of the HTTP requests.
# TYPE delegated_routing_server_http_request_duration_seconds histogram
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.1"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.5"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="2"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 1.6826577409999999
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.5"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="2"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 3.079128329
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2
# HELP delegated_routing_server_http_requests_inflight The number of inflight requests being handled at the same time.
# TYPE delegated_routing_server_http_requests_inflight gauge
delegated_routing_server_http_requests_inflight{handler="/routing/v1/peers/{peer-id}",service=""} 0
delegated_routing_server_http_requests_inflight{handler="/routing/v1/providers/{cid}",service=""} 0
# HELP delegated_routing_server_http_response_size_bytes The size of the HTTP responses.
# TYPE delegated_routing_server_http_response_size_bytes histogram
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 398
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 42133
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2
I'm going to merge this to allow us to bubble this up to someguy and delegated-ipfs.dev
@@ -133,12 +143,28 @@ func Handler(svc ContentRouter, opts ...Option) http.Handler {
		opt(server)
	}

	if server.promRegistry == nil {
		server.promRegistry = prometheus.NewRegistry()
iiuc this will disable metrics for users by default, requiring opt-in via WithPrometheusRegistry.
I noticed we already call NewRegistry() in other places, and changing this requires a slight refactor of tests, so let's keep this as-is and clean it up in #722 without blocking this PR.
Yeah. Creating a new registry that the caller has no access to is an anti-pattern, because those metrics can never be exposed.
We should definitely default to the default registry.
…request timeouts (#87)
* fix: larger duration buckets for better visibility
* feat: log accept header
* fix: move instrumentation to boxo
* feat: add tracing with auth token
* feat: add 30 second request timeout
* chore: remove replace directive
* chore: add missing funcSampler
* chore: remove request timeout
  this isn't working too well. We need to look more deeply into this
* chore: update changelog
* chore: go mod tidy
* chore: go-libp2p-kad-dht v0.28.1
* chore: latest boxo#720
* chore: mod tidy
* chore: boxo main with ipfs/boxo#720 and ipfs/boxo#718
* Apply suggestions from code review
  Co-authored-by: Marcin Rataj <lidel@lidel.org>
* fix: typo
---------
Co-authored-by: Daniel N <2color@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
What's in this PR
Why
Until now, we relied on instrumentation in consumers of the delegated routing server, e.g. in someguy. The problem is that you cannot get endpoint/handler specific metrics.
The duration buckets were chosen based on probelab data and production someguy metrics. In many cases, requests take over 10 seconds.