Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(routing/http/server): expose prometheus metrics #718

Merged
merged 7 commits into from
Nov 19, 2024
Merged

Conversation

2color
Copy link
Member

@2color 2color commented Nov 13, 2024

What's in this PR

  • feat: add instrumentation with handler label to delegated routing endpoints
  • feat: allow passing in custom prometheus registry

Why

Until now, we relied on instrumentation in consumers of the delegated routing server, e.g. in someguy. The problem is that you cannot get endpoint/handler specific metrics.

The duration buckets were chosen based on probelab data and production someguy metrics. In many cases, requests take over 10 seconds.

@2color 2color requested a review from a team as a code owner November 13, 2024 09:49
Copy link

codecov bot commented Nov 13, 2024

Codecov Report

Attention: Patch coverage is 82.60870% with 4 lines in your changes missing coverage. Please review.

Project coverage is 60.40%. Comparing base (625aadd) to head (786e657).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
routing/http/server/server.go 82.60% 4 Missing ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #718      +/-   ##
==========================================
+ Coverage   60.38%   60.40%   +0.02%     
==========================================
  Files         243      243              
  Lines       31021    31039      +18     
==========================================
+ Hits        18731    18749      +18     
+ Misses      10626    10625       -1     
- Partials     1664     1665       +1     
Files with missing lines Coverage Δ
routing/http/server/server.go 74.22% <82.60%> (+0.14%) ⬆️

... and 10 files with indirect coverage changes


🚨 Try these New Features:

Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with #679 in ipfs/someguy#87 and lgtm.

Preview of metrics produced by this PR:

# HELP delegated_routing_server_http_request_duration_seconds The latency of the HTTP requests.
# TYPE delegated_routing_server_http_request_duration_seconds histogram
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.1"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="0.5"} 0
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="2"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 1.6826577409999999
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="0.5"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="2"} 1
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="5"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="8"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="20"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="30"} 2
delegated_routing_server_http_request_duration_seconds_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_request_duration_seconds_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 3.079128329
delegated_routing_server_http_request_duration_seconds_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2
# HELP delegated_routing_server_http_requests_inflight The number of inflight requests being handled at the same time.
# TYPE delegated_routing_server_http_requests_inflight gauge
delegated_routing_server_http_requests_inflight{handler="/routing/v1/peers/{peer-id}",service=""} 0
delegated_routing_server_http_requests_inflight{handler="/routing/v1/providers/{cid}",service=""} 0
# HELP delegated_routing_server_http_response_size_bytes The size of the HTTP responses.
# TYPE delegated_routing_server_http_response_size_bytes histogram
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="10000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 398
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/peers/{peer-id}",method="GET",service=""} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="10000"} 0
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="100000"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+06"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+07"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+08"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="1e+09"} 2
delegated_routing_server_http_response_size_bytes_bucket{code="200",handler="/routing/v1/providers/{cid}",method="GET",service="",le="+Inf"} 2
delegated_routing_server_http_response_size_bytes_sum{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 42133
delegated_routing_server_http_response_size_bytes_count{code="200",handler="/routing/v1/providers/{cid}",method="GET",service=""} 2

I'm going to merge this to allow us to bubble this up to someguy and delegated-ipfs.dev

routing/http/server/server.go Outdated Show resolved Hide resolved
@@ -133,12 +143,28 @@ func Handler(svc ContentRouter, opts ...Option) http.Handler {
opt(server)
}

if server.promRegistry == nil {
server.promRegistry = prometheus.NewRegistry()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iiuc this will disable metric for users by default, requiring opt-in via WithPrometheusRegistry.

I noticed we already call NewRegistry() in other places, and changing this requires a slight refactor of tests, so let's keep this as-is and cleanup in #722 without blocking this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Creating a new registry without access to the registry is an anti-pattern because you can't expose those metrics.

We should definitely default to the default registry.

@lidel lidel changed the title feat: instrument delegated routing server feat(routing/http/server): expose prometheus metrics (#718) Nov 19, 2024
@lidel lidel enabled auto-merge (squash) November 19, 2024 00:06
@lidel lidel disabled auto-merge November 19, 2024 00:06
@lidel lidel changed the title feat(routing/http/server): expose prometheus metrics (#718) feat(routing/http/server): expose prometheus metrics Nov 19, 2024
@lidel lidel merged commit 1bf1488 into main Nov 19, 2024
17 checks passed
@lidel lidel deleted the add-instrumentation branch November 19, 2024 00:31
lidel added a commit to ipfs/someguy that referenced this pull request Nov 19, 2024
2color added a commit to ipfs/someguy that referenced this pull request Nov 19, 2024
…request timeouts (#87)

* fix: larger duration buckets for better visibility

* feat: log accept header

* fix: move instrumentation to boxo

* feat: add tracing with auth token

* feat: add 30 second request timeout

* chore: remove replace directive

* chore: add missing funcSampler

* chore: remove request timeout

this isn't working too well. We need to look more deeply into this

* chore: update changelog

* chore: go mod tidy

* chore: go-libp2p-kad-dht v0.28.1

* chore: latest boxo#720

* chore: mod tidy

* chore: boxo main

with ipfs/boxo#720 and ipfs/boxo#718

* Apply suggestions from code review

Co-authored-by: Marcin Rataj <lidel@lidel.org>

* fix: typo

---------

Co-authored-by: Daniel N <2color@users.noreply.github.com>
Co-authored-by: Marcin Rataj <lidel@lidel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants