Translate otel metrics to libbeat monitoring #15094

kruskall · 2025-01-01T00:49:53Z

Motivation/summary

Use the OTel API to record metrics and export them to Beats monitoring.

Checklist

Update CHANGELOG.asciidoc
Documentation has been updated

For functional changes, consider:

Is it observable through the addition of either logging or metrics?
Is its use being published in telemetry to enable product improvement?
Have system tests been added to avoid regression?

How to test these changes

1. Run apm-server.
2. Send data.
3. Go to Index Management and verify that the monitoring index exists and contains monitoring data.

Related issues

Related to #14488

This reverts commit 166a717.

mergify · 2025-01-01T00:50:40Z

This pull request does not have a backport label. Could you fix it @kruskall? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-7.17 is the label to automatically backport to the 7.17 branch.
backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
backport-8.x is the label to automatically backport to the 8.x branch.

mergify · 2025-01-01T00:50:41Z

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label.

internal/beater/server_test.go

x-pack/apm-server/main_test.go

kruskall · 2025-01-24T19:28:22Z

This is ready for review again. Unfortunately, I'm afraid there's no easy way to split this up.

axw

Thanks for extracting the other PR - this is much easier to review now.

x-pack/apm-server/main.go

axw · 2025-01-26T08:31:26Z

internal/beater/server_test.go

-	equal, result := monitoringtest.CompareMonitoringInt(map[request.ResultID]int{
-		request.IDRequestCount:          2,
-		request.IDResponseCount:         2,
-		request.IDResponseErrorsCount:   1,
-		request.IDResponseValidCount:    1,
-		request.IDResponseErrorsTimeout: 1, // test data POST /intake/v2/events
-		request.IDResponseValidAccepted: 1, // self-instrumentation
-	}, intake.MonitoringMap)
-	assert.True(t, equal, result)
+	monitoringtest.ExpectContainOtelMetrics(t, reader, map[string]any{
+		"http.server." + string(request.IDRequestCount):          1,
+		"http.server." + string(request.IDResponseCount):         1,
+		"http.server." + string(request.IDResponseErrorsCount):   1,
+		"http.server." + string(request.IDResponseErrorsTimeout): 1, // test data POST /intake/v2/events
+	})


Probably worth adding a comment that the self-instrumentation requests are not counted in metrics.

This is a change in behaviour. I think it's probably what users would want, but would like to hear @simitt's opinion on this.

I agree that we don't want to mingle the counters for user requests and self-instrumentation requests together, especially when not being able to encode a differentiator in the metadata. Seems like a bugfix to me.
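As a minimal sketch of the behaviour discussed here (count user requests, skip self-instrumentation requests), the middleware below only increments its counter when a request is not marked as self-instrumentation. The `selfInstrumentationKey` context key, `countRequests`, and `countFor` are hypothetical names, and an `atomic.Int64` stands in for an OTel counter; the thread does not show how apm-server actually tags its own tracing requests.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// selfInstrumentationKey is a hypothetical context key marking requests that
// the server sends to itself when tracing its own operations.
type selfInstrumentationKey struct{}

// countRequests wraps next and increments counter for user requests only.
// counter stands in for an OTel Int64Counter.
func countRequests(counter *atomic.Int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if self, _ := r.Context().Value(selfInstrumentationKey{}).(bool); !self {
			counter.Add(1)
		}
		next.ServeHTTP(w, r)
	})
}

// countFor sends one request per entry in selfFlags (true = self-instrumentation)
// and returns how many requests were counted.
func countFor(selfFlags ...bool) int64 {
	var counter atomic.Int64
	h := countRequests(&counter, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	}))
	for _, self := range selfFlags {
		req := httptest.NewRequest(http.MethodPost, "/intake/v2/events", nil)
		if self {
			req = req.WithContext(context.WithValue(req.Context(), selfInstrumentationKey{}, true))
		}
		h.ServeHTTP(httptest.NewRecorder(), req)
	}
	return counter.Load()
}

func main() {
	// One user request plus one self-instrumentation request.
	fmt.Println(countFor(false, true)) // 1
}
```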

internal/beater/otlp/grpc_test.go

axw · 2025-01-26T08:40:01Z

internal/beater/otlp/grpc.go

@@ -79,7 +54,17 @@ func RegisterGRPCServices(
 		Semaphore:        semaphore,
 		RemapOTelMetrics: true,
 	})
-	gRPCMonitoredConsumer.set(consumer)
+
+	// FIXME we should add an otel counter metric directly in the


This shouldn't be needed now that we propagate the MeterProvider instead of reusing a global meter.

I don't follow. The issue here is that:

1. the MeterProvider and Meters will survive across config reloads;
2. on each config reload we will register a new callback; and
3. we never unregister those callbacks, so memory and CPU usage will increase with every reload.

I also think adding metrics to apm-data is not that simple, since it's a library and we need to keep its other consumers in mind.

What's difficult about that? We can add a MeterProvider to apm-data/input/otlp.ConsumerConfig, and if it's not set then we can either use the global or a noop provider.
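A minimal sketch of that optional-field fallback follows. It assumes a simplified, hypothetical `MeterProvider` interface in place of OTel's `metric.MeterProvider` so that it runs without dependencies, and it is not the actual `apm-data/input/otlp.ConsumerConfig` definition; the meter name `otlp-consumer` is invented. In real code the fallback would be the global provider (`otel.GetMeterProvider()`) or a no-op one from `go.opentelemetry.io/otel/metric/noop`.

```go
package main

import "fmt"

// MeterProvider is a simplified, hypothetical stand-in for OTel's
// metric.MeterProvider; Meter just returns a descriptive string here.
type MeterProvider interface {
	Meter(name string) string
}

// noopMeterProvider stands in for a provider that discards measurements.
type noopMeterProvider struct{}

func (noopMeterProvider) Meter(name string) string { return "noop:" + name }

// sdkMeterProvider stands in for a real, exporting provider injected by the caller.
type sdkMeterProvider struct{}

func (sdkMeterProvider) Meter(name string) string { return "sdk:" + name }

// ConsumerConfig sketches an optional MeterProvider field (hypothetical layout).
type ConsumerConfig struct {
	// MeterProvider is optional; a no-op provider is used when it is nil.
	MeterProvider MeterProvider
}

// Consumer holds the meter it records metrics with.
type Consumer struct {
	meter string
}

// NewConsumer resolves the meter once, falling back to a no-op provider.
func NewConsumer(cfg ConsumerConfig) *Consumer {
	mp := cfg.MeterProvider
	if mp == nil {
		mp = noopMeterProvider{}
	}
	return &Consumer{meter: mp.Meter("otlp-consumer")}
}

func main() {
	fmt.Println(NewConsumer(ConsumerConfig{}).meter)                                  // noop:otlp-consumer
	fmt.Println(NewConsumer(ConsumerConfig{MeterProvider: sdkMeterProvider{}}).meter) // sdk:otlp-consumer
}
```

Because the meter is resolved from the injected provider, each server instance records into whatever provider it was constructed with, and callers that don't care about metrics pay nothing.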

The alternative to adding the metric to apm-data would be to find a way to unregister the callback when the server is stopped.
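The sketch below reproduces the reload leak described in this thread and the unregister-on-stop alternative. It uses a hypothetical in-memory `meter` whose simplified `RegisterCallback` returns a `Registration` with an `Unregister` method, mirroring the shape of OTel Go's `metric.Meter.RegisterCallback` and `metric.Registration`. `server`, `startServer`, and `reload` are illustrative names, not apm-server code.

```go
package main

import (
	"fmt"
	"sync"
)

// Registration mirrors the shape of OTel Go's metric.Registration.
type Registration interface {
	Unregister() error
}

// meter is a hypothetical, long-lived stand-in for an OTel Meter that
// survives config reloads and keeps every callback registered on it.
type meter struct {
	mu        sync.Mutex
	nextID    int
	callbacks map[int]func()
}

type registration struct {
	m  *meter
	id int
}

// Unregister removes the callback from the meter.
func (r registration) Unregister() error {
	r.m.mu.Lock()
	defer r.m.mu.Unlock()
	delete(r.m.callbacks, r.id)
	return nil
}

// RegisterCallback is a simplified version of Meter.RegisterCallback.
func (m *meter) RegisterCallback(f func()) (Registration, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.callbacks == nil {
		m.callbacks = make(map[int]func())
	}
	m.nextID++
	m.callbacks[m.nextID] = f
	return registration{m: m, id: m.nextID}, nil
}

func (m *meter) numCallbacks() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.callbacks)
}

// server registers one observable callback each time it starts.
type server struct {
	reg Registration
}

func startServer(m *meter) (*server, error) {
	reg, err := m.RegisterCallback(func() {
		// observe a gauge value here
	})
	if err != nil {
		return nil, err
	}
	return &server{reg: reg}, nil
}

// stop unregisters the callback so it does not outlive this server.
func (s *server) stop() error {
	return s.reg.Unregister()
}

// reload simulates n config reloads against one shared meter and returns the
// number of callbacks still registered afterwards.
func reload(n int, unregisterOnStop bool) int {
	m := &meter{}
	var prev *server
	for i := 0; i < n; i++ {
		if prev != nil && unregisterOnStop {
			if err := prev.stop(); err != nil {
				panic(err)
			}
		}
		s, err := startServer(m)
		if err != nil {
			panic(err)
		}
		prev = s
	}
	return m.numCallbacks()
}

func main() {
	fmt.Println(reload(5, false)) // 5: one leaked callback per reload
	fmt.Println(reload(5, true))  // 1: only the running server's callback
}
```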

Co-authored-by: Andrew Wilkins <axwalk@gmail.com>

forgot to pus this

1pkg · 2025-01-28T01:54:40Z

internal/beatcmd/beat.go

+		v.OnRegistryStart()
+		defer v.OnRegistryFinished()
+		for _, sm := range rm.ScopeMetrics {
+			switch {


nit: why use a switch statement here? Do we expect to handle more cases in the future?

See the comment above; it's for consistency.

Aligning all of these monitoring functions to use if statements also sounds fine, I was mostly addressing the inconsistency for the same kind of checks.
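For context on the pattern being discussed, here is a minimal sketch of mapping OTel-style metric data into flat, libbeat-style keys, using a tagless `switch` on the instrumentation scope and a type switch on the data kind. All types are simplified, hypothetical stand-ins for `go.opentelemetry.io/otel/sdk/metric/metricdata` and libbeat's monitoring visitor, and the scope and metric names are invented. With several mutually exclusive scope checks, a tagless `switch` keeps them in one place; with a single case, an `if` statement is equivalent.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Simplified, hypothetical stand-ins for OTel metricdata types.
type Sum struct{ Value int64 }
type Gauge struct{ Value int64 }

type Metric struct {
	Name string
	Data any // Sum or Gauge
}

type ScopeMetrics struct {
	ScopeName string
	Metrics   []Metric
}

type ResourceMetrics struct {
	ScopeMetrics []ScopeMetrics
}

// visitor loosely mimics a libbeat monitoring visitor: values are reported
// between OnRegistryStart and OnRegistryFinished.
type visitor struct {
	depth  int
	values map[string]int64
}

func (v *visitor) OnRegistryStart()           { v.depth++ }
func (v *visitor) OnRegistryFinished()        { v.depth-- }
func (v *visitor) onInt(name string, x int64) { v.values[name] = x }

// visit reports every supported metric in rm to v under a flat key.
func visit(v *visitor, rm ResourceMetrics) {
	v.OnRegistryStart()
	defer v.OnRegistryFinished()
	for _, sm := range rm.ScopeMetrics {
		var prefix string
		switch {
		case sm.ScopeName == "apm-server":
			prefix = "" // reported without the "apm-server." name prefix
		case strings.HasSuffix(sm.ScopeName, "docappender"):
			prefix = "output."
		default:
			continue // metrics from other scopes are not exported
		}
		for _, m := range sm.Metrics {
			name := prefix + strings.TrimPrefix(m.Name, "apm-server.")
			switch d := m.Data.(type) {
			case Sum:
				v.onInt(name, d.Value)
			case Gauge:
				v.onInt(name, d.Value)
			}
		}
	}
}

// translate collects the visited values into a map.
func translate(rm ResourceMetrics) map[string]int64 {
	v := &visitor{values: map[string]int64{}}
	visit(v, rm)
	return v.values
}

// sampleMetrics returns invented data from three scopes.
func sampleMetrics() ResourceMetrics {
	return ResourceMetrics{ScopeMetrics: []ScopeMetrics{
		{ScopeName: "apm-server", Metrics: []Metric{{Name: "apm-server.requests", Data: Sum{Value: 3}}}},
		{ScopeName: "example/docappender", Metrics: []Metric{{Name: "events.active", Data: Gauge{Value: 7}}}},
		{ScopeName: "some-other-library", Metrics: []Metric{{Name: "ignored", Data: Sum{Value: 1}}}},
	}}
}

func main() {
	values := translate(sampleMetrics())
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %d\n", k, values[k])
	}
}
```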

internal/beater/api/intake/handler.go

internal/model/modelprocessor/eventcounter.go

axw

Thanks @kruskall. Changes LGTM; please wait for sign-off from @simitt, though.

simitt

Only one minor question. The PR looks great and improves the readability of the metrics collection.

simitt · 2025-01-28T15:05:20Z

internal/beatcmd/beat.go

Aligning all of these monitoring functions to use if statements also sounds fine, I was mostly addressing the inconsistency for the same kind of checks.

simitt · 2025-01-28T15:12:47Z

internal/beater/api/mux.go

+		{IntakePath, builder.backendIntakeHandler(meterProvider)},
+		{OTLPTracesIntakePath, builder.otlpHandler(otlpHandlers.HandleTraces, "apm-server.otlp.http.traces.", meterProvider)},
+		{OTLPMetricsIntakePath, builder.otlpHandler(otlpHandlers.HandleMetrics, "apm-server.otlp.http.metrics.", meterProvider)},
+		{OTLPLogsIntakePath, builder.otlpHandler(otlpHandlers.HandleLogs, "apm-server.otlp.http.logs.", meterProvider)},


Why are you passing metricsPrefix as an argument for the OTel routes, but hardcoding it inside the handler functions for the non-OTel routes?

Good point! I guess the OTLP handler is shared, so it must be parametrized, while the intake handler doesn't need to be. We can change it so they are all consistent; I'll open a follow-up PR.
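A minimal sketch of the consistent approach: one shared wrapper that takes the metrics prefix as a parameter and is used for the intake route and the OTLP routes alike. The `counters` type stands in for counters created from an OTel `MeterProvider`; the OTLP prefixes come from the diff above and the paths are the standard OTLP/HTTP ones, while the intake prefix, the `request.count` suffix, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
	"sync"
)

// counters is a hypothetical stand-in for counters created from a MeterProvider.
type counters struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *counters) inc(name string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[name]++
}

// instrumented counts every request to h under metricsPrefix + "request.count".
// Because the prefix is a parameter, one wrapper serves every route.
func instrumented(h http.Handler, metricsPrefix string, c *counters) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		c.inc(metricsPrefix + "request.count")
		h.ServeHTTP(w, r)
	})
}

// routes pairs each path with its metrics prefix; the intake prefix is hypothetical.
var routes = []struct{ path, prefix string }{
	{"/intake/v2/events", "apm-server.intake."},
	{"/v1/traces", "apm-server.otlp.http.traces."},
	{"/v1/metrics", "apm-server.otlp.http.metrics."},
	{"/v1/logs", "apm-server.otlp.http.logs."},
}

// requestCounts serves one POST per path and returns the resulting counts.
// Unknown paths get a 404 from the mux and are not counted.
func requestCounts(paths ...string) map[string]int {
	c := &counters{m: map[string]int{}}
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusAccepted)
	})
	mux := http.NewServeMux()
	for _, rt := range routes {
		mux.Handle(rt.path, instrumented(ok, rt.prefix, c))
	}
	for _, p := range paths {
		mux.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodPost, p, nil))
	}
	return c.m
}

func main() {
	counts := requestCounts("/v1/traces", "/v1/traces", "/intake/v2/events")
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, counts[name])
	}
}
```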

simitt · 2025-01-28T15:31:45Z

internal/beater/server_test.go

I agree that we don't want to mingle the counters for user requests and self-instrumentation requests together, especially when not being able to encode a differentiator in the metadata. Seems like a bugfix to me.

internal/model/modelprocessor/eventcounter.go

* Translate otel metrics to libbeat monitoring
* demo: send metrics directly and add another reader
* Revert "demo: send metrics directly and add another reader"
  This reverts commit 166a717.
* lint: fix linter issues
* lint: fix linter issues
* feat: refactor code to propagate meterprovider and fix tests
* lint: fix linter issues
* refactor: remove unused files
* refactor: remove more global meters
* refactor: cleanup more unused code
* refactor: remove more unused code
* test: account for agentcfg metric in test
* test: account for agentcfg metric in test
* fix: pass meter provider in main func
* Fix LSM metrics tests
* test: fix remaining test
* lint: fix linter issues
* fix: update docappender metrics name
* test: update systemtest metric assertions
* fix: update grpc interceptor meter path
  metrics were not being exported because the meter was not recognized as apm-server meter
* fix: add back output.type=elasticsearch
* test: upate remaining systemtest
* test: remove debug print
* fix: use correct outputRegistry variable
  fix panic
* fix: remove panic on err
* fix: propagate meter provider to grpc services
* lint: add meterprovider field docs
* lint: fix comment typo
* feat: pass apmotel gatherer too
* refactor: use switch pattern consistently when mapping metrics
* Update beater.go
* Update beat.go
* Update beater.go
* fix: solve compile errors and systemtest fix
* refactor: reduce diff noise
* lint: fix linter issues
* lint: fix linter issues
* Update x-pack/apm-server/main.go
  Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
* test: use legacy metrics for validating grpc tests
* fix: unregister callback if available
  forgot to pus this

---------

Co-authored-by: Andrew Wilkins <axw@elastic.co>
Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
(cherry picked from commit 378b60c)

# Conflicts:
# internal/beatcmd/beat.go
# internal/beater/beater.go
# internal/beater/server.go
# internal/beater/server_test.go

kruskall · 2025-01-31T14:33:56Z

@Mergifyio backport 8.x

mergify · 2025-01-31T14:34:05Z

backport 8.x

✅ Backports have been created

#15440 [8.x] Translate otel metrics to libbeat monitoring (backport #15094) has been created for branch 8.x

…15440)

* Translate otel metrics to libbeat monitoring (#15094)
  * Translate otel metrics to libbeat monitoring
  * demo: send metrics directly and add another reader
  * Revert "demo: send metrics directly and add another reader"
    This reverts commit 166a717.
  * lint: fix linter issues
  * lint: fix linter issues
  * feat: refactor code to propagate meterprovider and fix tests
  * lint: fix linter issues
  * refactor: remove unused files
  * refactor: remove more global meters
  * refactor: cleanup more unused code
  * refactor: remove more unused code
  * test: account for agentcfg metric in test
  * test: account for agentcfg metric in test
  * fix: pass meter provider in main func
  * Fix LSM metrics tests
  * test: fix remaining test
  * lint: fix linter issues
  * fix: update docappender metrics name
  * test: update systemtest metric assertions
  * fix: update grpc interceptor meter path
    metrics were not being exported because the meter was not recognized as apm-server meter
  * fix: add back output.type=elasticsearch
  * test: upate remaining systemtest
  * test: remove debug print
  * fix: use correct outputRegistry variable
    fix panic
  * fix: remove panic on err
  * fix: propagate meter provider to grpc services
  * lint: add meterprovider field docs
  * lint: fix comment typo
  * feat: pass apmotel gatherer too
  * refactor: use switch pattern consistently when mapping metrics
  * Update beater.go
  * Update beat.go
  * Update beater.go
  * fix: solve compile errors and systemtest fix
  * refactor: reduce diff noise
  * lint: fix linter issues
  * lint: fix linter issues
  * Update x-pack/apm-server/main.go
    Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
  * test: use legacy metrics for validating grpc tests
  * fix: unregister callback if available
    forgot to pus this

  ---------

  Co-authored-by: Andrew Wilkins <axw@elastic.co>
  Co-authored-by: Andrew Wilkins <axwalk@gmail.com>
  (cherry picked from commit 378b60c)

  # Conflicts:
  # internal/beatcmd/beat.go
  # internal/beater/beater.go
  # internal/beater/server.go
  # internal/beater/server_test.go

* fix: resolve conflicts

---------

Co-authored-by: kruskall <99559985+kruskall@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

axw and others added 4 commits December 5, 2024 14:21

Translate otel metrics to libbeat monitoring

7bcfa46

Merge remote-tracking branch 'upstream/main' into otel-adapter

0491b4c

demo: send metrics directly and add another reader

166a717

Revert "demo: send metrics directly and add another reader"

11b11d8

This reverts commit 166a717.

kruskall marked this pull request as ready for review January 1, 2025 00:50

kruskall requested a review from a team as a code owner January 1, 2025 00:50

mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Jan 1, 2025

kruskall mentioned this pull request Jan 1, 2025

Investigate replacing libbeat monitoring usage with OpenTelemetry instrumentation #14488

Closed

kruskall added 3 commits January 1, 2025 01:57

lint: fix linter issues

128b5b9

Merge branch 'main' into otel-adapter

0adb6ab

lint: fix linter issues

642a89b

kruskall force-pushed the otel-adapter branch from 634302f to 642a89b January 15, 2025 16:16

kruskall added 11 commits January 21, 2025 18:11

feat: refactor code to propagate meterprovider and fix tests

15d6652

Merge remote-tracking branch 'upstream/main' into otel-adapter

bfedda8

lint: fix linter issues

cc914d0

refactor: remove unused files

4e9110d

refactor: remove more global meters

63a49cc

refactor: cleanup more unused code

64fe558

Merge branch 'main' into otel-adapter

df2e35c

refactor: remove more unused code

3931dab

test: account for agentcfg metric in test

750668f

test: account for agentcfg metric in test

fe73501

fix: pass meter provider in main func

ee6fc70

axw reviewed Jan 22, 2025

internal/beater/server_test.go Outdated

axw reviewed Jan 22, 2025

x-pack/apm-server/main_test.go Outdated

axw reviewed Jan 22, 2025

x-pack/apm-server/main_test.go Outdated

axw and others added 2 commits January 22, 2025 15:36

Fix LSM metrics tests

8d8b066

test: fix remaining test

9821bcc

kruskall added 3 commits January 24, 2025 20:18

refactor: reduce diff noise

ea2260f

lint: fix linter issues

83bc51b

lint: fix linter issues

929f011

axw reviewed Jan 26, 2025

kruskall and others added 2 commits January 27, 2025 11:09

Update x-pack/apm-server/main.go

1f2322b

Co-authored-by: Andrew Wilkins <axwalk@gmail.com>

test: use legacy metrics for validating grpc tests

06080ca

kruskall requested review from axw and a team January 27, 2025 10:52

kruskall mentioned this pull request Jan 27, 2025

monitoring: Agentcfg monitoring metric names contain dot #13625

Closed

fix: unregister callback if available

45f84a7

forgot to pus this

1pkg reviewed Jan 28, 2025

internal/beater/api/intake/handler.go

1pkg reviewed Jan 28, 2025

internal/model/modelprocessor/eventcounter.go

axw reviewed Jan 28, 2025

simitt approved these changes Jan 28, 2025

Merge branch 'main' into otel-adapter

c95e71d

kruskall enabled auto-merge (squash) January 28, 2025 18:38

kruskall merged commit 378b60c into elastic:main Jan 28, 2025
12 checks passed

kruskall deleted the otel-adapter branch January 28, 2025 18:47

mergify bot mentioned this pull request Jan 28, 2025

[8.x] Translate otel metrics to libbeat monitoring (backport #15094) #15440

Merged

2 tasks

carsonip mentioned this pull request Jan 29, 2025

[8.x] TBS: Replace badger with pebble (backport #15235) #15452

Closed

12 tasks

endorama self-assigned this Feb 11, 2025

endorama mentioned this pull request Feb 11, 2025

9.0 test plan #15569

Open

7 tasks

endorama added test-plan test-plan-ok labels Feb 11, 2025
