-
Notifications
You must be signed in to change notification settings - Fork 527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate otel metrics to libbeat monitoring #15094
Conversation
This pull request does not have a backport label. Could you fix it @kruskall? 🙏
|
|
634302f
to
642a89b
Compare
This is ready for review again, unfortunately I'm afraid there's no easy way to split this up |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for extracting the other PR - this is much easier to review now.
equal, result := monitoringtest.CompareMonitoringInt(map[request.ResultID]int{ | ||
request.IDRequestCount: 2, | ||
request.IDResponseCount: 2, | ||
request.IDResponseErrorsCount: 1, | ||
request.IDResponseValidCount: 1, | ||
request.IDResponseErrorsTimeout: 1, // test data POST /intake/v2/events | ||
request.IDResponseValidAccepted: 1, // self-instrumentation | ||
}, intake.MonitoringMap) | ||
assert.True(t, equal, result) | ||
monitoringtest.ExpectContainOtelMetrics(t, reader, map[string]any{ | ||
"http.server." + string(request.IDRequestCount): 1, | ||
"http.server." + string(request.IDResponseCount): 1, | ||
"http.server." + string(request.IDResponseErrorsCount): 1, | ||
"http.server." + string(request.IDResponseErrorsTimeout): 1, // test data POST /intake/v2/events | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth adding a comment that the self-instrumentation requests are not counted in metrics.
This is a change in behaviour. I think it's probably what users would want, but would like to hear @simitt's opinion on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we don't want to mingle the counters for user requests and self-instrumentation requests together, especially when not being able to encode a differentiator in the metadata. Seems like a bugfix to me.
internal/beater/otlp/grpc.go
Outdated
@@ -79,7 +54,17 @@ func RegisterGRPCServices( | |||
Semaphore: semaphore, | |||
RemapOTelMetrics: true, | |||
}) | |||
gRPCMonitoredConsumer.set(consumer) | |||
|
|||
// FIXME we should add an otel counter metric directly in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be needed now that we propagate the meterprovider instead of reusing a global meter.
I don't follow. The issue here is that:
- the MeterProvider & Meters will survive across config reloads
- on each config reload we will register a new callback
- we never unregister those callbacks, so memory & CPU usage will increase with every reload
I also think adding metrics to apm-data is not as simple since we need to keep other systems in mind (that's a library).
What's difficult about that? We can add a MeterProvider to apm-data/input/otlp.ConsumerConfig, and if it's not set then we can either use the global or a noop provider.
The alternative to adding the metric to apm-data would be to find a way to unregister the callback when the server is stopped.
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
forgot to pus this
v.OnRegistryStart() | ||
defer v.OnRegistryFinished() | ||
for _, sm := range rm.ScopeMetrics { | ||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why to use switch case here? do we expect to handle more cases in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above comment, it's for consistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aligning all of these monitoring functions to use if
statements also sounds fine, I was mostly addressing the inconsistency for the same kind of checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one minor question;
PR looks great and improves the readability of the metrics collection
v.OnRegistryStart() | ||
defer v.OnRegistryFinished() | ||
for _, sm := range rm.ScopeMetrics { | ||
switch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aligning all of these monitoring functions to use if
statements also sounds fine, I was mostly addressing the inconsistency for the same kind of checks.
{IntakePath, builder.backendIntakeHandler(meterProvider)}, | ||
{OTLPTracesIntakePath, builder.otlpHandler(otlpHandlers.HandleTraces, "apm-server.otlp.http.traces.", meterProvider)}, | ||
{OTLPMetricsIntakePath, builder.otlpHandler(otlpHandlers.HandleMetrics, "apm-server.otlp.http.metrics.", meterProvider)}, | ||
{OTLPLogsIntakePath, builder.otlpHandler(otlpHandlers.HandleLogs, "apm-server.otlp.http.logs.", meterProvider)}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you passing the metricsPrefix
as an argument to the otel routes vs. hardcode them inside the handler functions for non-otel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I guess the otlp handler is shared so it must be parametrized while the intake path doesn't. We can change it so they are all consistent. I'll open a followup PR
equal, result := monitoringtest.CompareMonitoringInt(map[request.ResultID]int{ | ||
request.IDRequestCount: 2, | ||
request.IDResponseCount: 2, | ||
request.IDResponseErrorsCount: 1, | ||
request.IDResponseValidCount: 1, | ||
request.IDResponseErrorsTimeout: 1, // test data POST /intake/v2/events | ||
request.IDResponseValidAccepted: 1, // self-instrumentation | ||
}, intake.MonitoringMap) | ||
assert.True(t, equal, result) | ||
monitoringtest.ExpectContainOtelMetrics(t, reader, map[string]any{ | ||
"http.server." + string(request.IDRequestCount): 1, | ||
"http.server." + string(request.IDResponseCount): 1, | ||
"http.server." + string(request.IDResponseErrorsCount): 1, | ||
"http.server." + string(request.IDResponseErrorsTimeout): 1, // test data POST /intake/v2/events | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we don't want to mingle the counters for user requests and self-instrumentation requests together, especially when not being able to encode a differentiator in the metadata. Seems like a bugfix to me.
* Translate otel metrics to libbeat monitoring * demo: send metrics directly and add another reader * Revert "demo: send metrics directly and add another reader" This reverts commit 166a717. * lint: fix linter issues * lint: fix linter issues * feat: refactor code to propagate meterprovider and fix tests * lint: fix linter issues * refactor: remove unused files * refactor: remove more global meters * refactor: cleanup more unused code * refactor: remove more unused code * test: account for agentcfg metric in test * test: account for agentcfg metric in test * fix: pass meter provider in main func * Fix LSM metrics tests * test: fix remaining test * lint: fix linter issues * fix: update docappender metrics name * test: update systemtest metric assertions * fix: update grpc interceptor meter path metrics were not being exported because the meter was not recognized as apm-server meter * fix: add back output.type=elasticsearch * test: upate remaining systemtest * test: remove debug print * fix: use correct outputRegistry variable fix panic * fix: remove panic on err * fix: propagate meter provider to grpc services * lint: add meterprovider field docs * lint: fix comment typo * feat: pass apmotel gatherer too * refactor: use switch pattern consistently when mapping metrics * Update beater.go * Update beat.go * Update beater.go * fix: solve compile errors and systemtest fix * refactor: reduce diff noise * lint: fix linter issues * lint: fix linter issues * Update x-pack/apm-server/main.go Co-authored-by: Andrew Wilkins <axwalk@gmail.com> * test: use legacy metrics for validating grpc tests * fix: unregister callback if available forgot to pus this --------- Co-authored-by: Andrew Wilkins <axw@elastic.co> Co-authored-by: Andrew Wilkins <axwalk@gmail.com> (cherry picked from commit 378b60c) # Conflicts: # internal/beatcmd/beat.go # internal/beater/beater.go # internal/beater/server.go # internal/beater/server_test.go
@Mergifyio backport 8.x |
✅ Backports have been created
|
…15440) * Translate otel metrics to libbeat monitoring (#15094) * Translate otel metrics to libbeat monitoring * demo: send metrics directly and add another reader * Revert "demo: send metrics directly and add another reader" This reverts commit 166a717. * lint: fix linter issues * lint: fix linter issues * feat: refactor code to propagate meterprovider and fix tests * lint: fix linter issues * refactor: remove unused files * refactor: remove more global meters * refactor: cleanup more unused code * refactor: remove more unused code * test: account for agentcfg metric in test * test: account for agentcfg metric in test * fix: pass meter provider in main func * Fix LSM metrics tests * test: fix remaining test * lint: fix linter issues * fix: update docappender metrics name * test: update systemtest metric assertions * fix: update grpc interceptor meter path metrics were not being exported because the meter was not recognized as apm-server meter * fix: add back output.type=elasticsearch * test: upate remaining systemtest * test: remove debug print * fix: use correct outputRegistry variable fix panic * fix: remove panic on err * fix: propagate meter provider to grpc services * lint: add meterprovider field docs * lint: fix comment typo * feat: pass apmotel gatherer too * refactor: use switch pattern consistently when mapping metrics * Update beater.go * Update beat.go * Update beater.go * fix: solve compile errors and systemtest fix * refactor: reduce diff noise * lint: fix linter issues * lint: fix linter issues * Update x-pack/apm-server/main.go Co-authored-by: Andrew Wilkins <axwalk@gmail.com> * test: use legacy metrics for validating grpc tests * fix: unregister callback if available forgot to pus this --------- Co-authored-by: Andrew Wilkins <axw@elastic.co> Co-authored-by: Andrew Wilkins <axwalk@gmail.com> (cherry picked from commit 378b60c) # Conflicts: # internal/beatcmd/beat.go # internal/beater/beater.go # internal/beater/server.go # internal/beater/server_test.go * fix: resolve conflicts --------- Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Motivation/summary
use otel api to record metrics and export them to beats monitoring
Checklist
For functional changes, consider:
How to test these changes
Related issues
Related to #14488