Health checks fail when running version 0.104.0 inside docker as a container #34133

Closed
vrbrahme opened this issue Jul 17, 2024 · 7 comments
Labels
extension/healthcheck Health Check Extension

Comments

@vrbrahme

Component(s)

extension/healthcheck

What happened?

Description

I am still trying to work out why this happens, but I see it on my MacBook starting with version 0.104.0. I do not see any code changes to the healthcheck extension between 0.103.0 and 0.105.0, yet the issue persists.

When running the collector in Docker on an M1 MacBook with the collector's health check port exposed externally, the health check connection is abruptly reset when using the default config.

Steps to Reproduce

Docker run command:

docker run --rm -p 4318:4318 -p 8888:8888 -v /tmp/collector-config.yaml:/etc/otelcol-contrib/config.yaml otel/opentelemetry-collector-contrib:0.104.0
curl -v http://localhost:13133/

Expected Result

* Host localhost:13133 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:13133...
* Connected to localhost (::1) port 13133
> GET / HTTP/1.1
> Host: localhost:13133
> User-Agent: curl/8.6.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Date: Wed, 17 Jul 2024 15:17:23 GMT
< Content-Length: 96
<
* Connection #0 to host localhost left intact
{"status":"Server available","upSince":"2024-07-17T15:17:17.668073425Z","uptime":"6.289333337s"}%

Actual Result

* Host localhost:13133 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:13133...
* Connected to localhost (::1) port 13133
> GET / HTTP/1.1
> Host: localhost:13133
> User-Agent: curl/8.6.0
> Accept: */*
>
* Recv failure: Connection reset by peer
* Closing connection
curl: (56) Recv failure: Connection reset by peer

This is only seen from version 0.104.0 onward; the expected result is achieved when running the collector on version 0.103.0.
Git diff v0.103.0...v0.105.0

Collector version

0.104.0

Environment information

Environment

OS: Mac OS (Sonoma 14.5)
Docker version 26.1.1, build 4cf5afa

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  debug:
    verbosity: detailed

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]

Log output

2024-07-17T15:16:29.331Z	info	service@v0.104.0/service.go:115	Setting up own telemetry...
2024-07-17T15:16:29.332Z	info	service@v0.104.0/telemetry.go:96	Serving metrics	{"address": ":8888", "level": "Normal"}
2024-07-17T15:16:29.332Z	info	exporter@v0.104.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-07-17T15:16:29.332Z	info	exporter@v0.104.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-07-17T15:16:29.332Z	info	exporter@v0.104.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-07-17T15:16:29.333Z	info	service@v0.104.0/service.go:193	Starting otelcol-contrib...	{"Version": "0.104.0", "NumCPU": 6}
2024-07-17T15:16:29.333Z	info	extensions/extensions.go:34	Starting extensions...
2024-07-17T15:16:29.333Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-07-17T15:16:29.333Z	info	healthcheckextension@v0.104.0/healthcheckextension.go:32	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"localhost:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-07-17T15:16:29.333Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "health_check"}
2024-07-17T15:16:29.334Z	info	otlpreceiver@v0.104.0/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2024-07-17T15:16:29.334Z	info	otlpreceiver@v0.104.0/otlp.go:152	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4318"}
2024-07-17T15:16:29.334Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2024-07-17T15:16:29.334Z	info	service@v0.104.0/service.go:219	Everything is ready. Begin running and processing data.

Additional context

No response

@vrbrahme added the bug and needs triage labels on Jul 17, 2024
@github-actions bot added the extension/healthcheck label on Jul 17, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@vrbrahme
Author

Logs when running using version 0.103.0

2024-07-17T15:24:02.949Z	info	service@v0.103.0/service.go:115	Setting up own telemetry...
2024-07-17T15:24:02.949Z	info	service@v0.103.0/telemetry.go:96	Serving metrics	{"address": ":8888", "level": "Normal"}
2024-07-17T15:24:02.949Z	info	exporter@v0.103.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "traces", "name": "debug"}
2024-07-17T15:24:02.949Z	info	exporter@v0.103.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "metrics", "name": "debug"}
2024-07-17T15:24:02.949Z	info	exporter@v0.103.0/exporter.go:280	Development component. May change in the future.	{"kind": "exporter", "data_type": "logs", "name": "debug"}
2024-07-17T15:24:02.950Z	info	service@v0.103.0/service.go:182	Starting otelcol-contrib...	{"Version": "0.103.0", "NumCPU": 6}
2024-07-17T15:24:02.950Z	info	extensions/extensions.go:34	Starting extensions...
2024-07-17T15:24:02.950Z	info	extensions/extensions.go:37	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-07-17T15:24:02.950Z	info	healthcheckextension@v0.103.0/healthcheckextension.go:32	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"0.0.0.0:13133","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-07-17T15:24:02.950Z	warn	internal@v0.103.0/warning.go:42	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning.	{"kind": "extension", "name": "health_check", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-07-17T15:24:02.950Z	info	extensions/extensions.go:52	Extension started.	{"kind": "extension", "name": "health_check"}
2024-07-17T15:24:02.950Z	warn	internal@v0.103.0/warning.go:42	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning.	{"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-07-17T15:24:02.950Z	info	otlpreceiver@v0.103.0/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4317"}
2024-07-17T15:24:02.950Z	warn	internal@v0.103.0/warning.go:42	Using the 0.0.0.0 address exposes this server to every network interface, which may facilitate Denial of Service attacks. Enable the feature gate to change the default and remove this warning.	{"kind": "receiver", "name": "otlp", "data_type": "logs", "documentation": "https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks", "feature gate ID": "component.UseLocalHostAsDefaultHost"}
2024-07-17T15:24:02.950Z	info	otlpreceiver@v0.103.0/otlp.go:152	Starting HTTP server	{"kind": "receiver", "name": "otlp", "data_type": "logs", "endpoint": "0.0.0.0:4318"}
2024-07-17T15:24:02.950Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "ready"}
2024-07-17T15:24:02.950Z	info	service@v0.103.0/service.go:208	Everything is ready. Begin running and processing data.
2024-07-17T15:24:02.950Z	warn	localhostgate/featuregate.go:63	The default endpoints for all servers in components will change to use localhost instead of 0.0.0.0 in a future version. Use the feature gate to preview the new default.	{"feature gate ID": "component.UseLocalHostAsDefaultHost"}

@codeboten
Contributor

I believe this is a result of the change in v0.104.0 to not bind automatically to 0.0.0.0, instead binding to localhost. The following configuration may solve your problem:

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

See #33896 for more details.
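
For anyone retrying the reproduction with this setting, the sketch below may help. It reuses the config and image from the issue description and adds the suggested endpoint. Two things are assumptions rather than part of the original report: the helper names `write_config` and `run_collector` are hypothetical, and the `-p 13133:13133` mapping is added because the original `docker run` line does not show port 13133 being published.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the collector.

# Write a collector config whose health_check extension listens on all
# interfaces inside the container instead of the localhost default.
write_config() {
  cat > "$1" <<'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  debug:
    verbosity: detailed

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
    metrics:
      receivers: [otlp]
      exporters: [debug]
    logs:
      receivers: [otlp]
      exporters: [debug]
EOF
}

# Run the collector with the config mounted and the health check port
# published (assumption: 13133 was not published in the original command).
run_collector() {
  docker run --rm -p 4318:4318 -p 8888:8888 -p 13133:13133 \
    -v "$1":/etc/otelcol-contrib/config.yaml \
    otel/opentelemetry-collector-contrib:0.104.0
}
```

Calling `write_config /tmp/collector-config.yaml` followed by `run_collector /tmp/collector-config.yaml`, then `curl -v http://localhost:13133/` from the host, should exercise the same path as the original report.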

@crobert-1 removed the bug and needs triage labels on Jul 17, 2024
@jpkrohling
Member

I'm closing, as I also think this is the solution, but feel free to reopen if the problem persists.

@brancz
Contributor

brancz commented Jul 24, 2024

Setting

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"

does not solve it. This should be reopened.

@codeboten reopened this on Jul 24, 2024
@codeboten
Contributor

@brancz I just tested the configuration. Before setting the endpoint:

$ curl localhost:13133
curl: (52) Empty reply from server

After setting the endpoint:

$ curl localhost:13133
{"status":"Server available","upSince":"2024-07-24T18:49:04.913163839Z","uptime":"1.747731252s"}%

Is it possible the configuration isn't taking effect in your case?
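
One way to check whether the setting took effect is to look at the endpoint the extension reports at startup. In the logs posted in this thread, the "Starting health_check extension" line shows "Endpoint":"localhost:13133" on 0.104.0 with the default config and "Endpoint":"0.0.0.0:13133" on 0.103.0. The sketch below pulls that value out of log text; `health_check_endpoint` is a hypothetical helper name, and it assumes the log line keeps the format shown in this thread.

```shell
#!/bin/sh
# Sketch only: health_check_endpoint is a hypothetical helper name.

# Read collector log text on stdin and print the endpoint reported by the
# health_check extension on its startup line (first match only).
health_check_endpoint() {
  grep 'Starting health_check extension' \
    | sed -n 's/.*"Endpoint":"\([^"]*\)".*/\1/p' \
    | head -n 1
}
```

With a running container, something like `docker logs <container> 2>&1 | health_check_endpoint` (where `<container>` is a placeholder for the container name or ID) would print the bound address. If it still prints localhost:13133, the mounted config is probably not the one being loaded.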

@codeboten
Contributor

I haven't heard back about this issue, so I'm closing it for now. Please reopen if the issue persists.
