podman endpoints cfg issue #34522

Open
cforce opened this issue Aug 8, 2024 · 7 comments
Labels
bug, receiver/podman, Stale

Comments

@cforce

cforce commented Aug 8, 2024

Component(s)

receiver/podman

What happened?

Description

The endpoint used differs from the one configured. An extra "/" is injected for unknown reasons:

"dial unix /run//podman/podman.sock: "

but the configured value is

"endpoint: unix://run/podman/podman.sock"

Maybe the docs are buggy and we need to escape ":" or "/"?

See https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/podmanreceiver
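
For context on the two-slash versus three-slash forms, here is a minimal Go sketch of how the standard library's net/url splits each endpoint string. The helper name splitEndpoint is made up for illustration and is not part of the receiver; that the receiver parses the endpoint this way is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// splitEndpoint (hypothetical helper) returns the host and path components
// that net/url extracts from an endpoint string.
func splitEndpoint(endpoint string) (host, path string, err error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", "", err
	}
	return u.Host, u.Path, nil
}

func main() {
	endpoints := []string{
		"unix://run/podman/podman.sock",  // two slashes: "run" is parsed as the host
		"unix:///run/podman/podman.sock", // three slashes: empty host, absolute path
	}
	for _, ep := range endpoints {
		host, path, err := splitEndpoint(ep)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("%-32s host=%q path=%q\n", ep, host, path)
	}
}
```

With two slashes the first path element ends up in the host field, so any code that rebuilds the socket path has to put it back.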

Collector version

0.106.1

Environment information

Environment

OpenTelemetry Collector configuration

extensions:
  zpages:
    endpoint: "127.0.0.1:55679"

  health_check:
    endpoint: "127.0.0.1:8081"

  pprof:
    endpoint: "127.0.0.1:1777"
    block_profile_fraction: 3
    mutex_profile_fraction: 5

receivers:
  prometheus/otelcol:
    config:
      scrape_configs:
        - job_name: 'otelcol'
          scrape_interval: 10s
          static_configs:
            - targets: ['localhost:8888']
  podman_stats:
    endpoint: unix://run/podman/podman.sock
    timeout: 10s
    collection_interval: 30s    
  hostmetrics:
    collection_interval: 30s
    normalizeProcessCPUUtilization: true
    scrapers:
      cpu:
        metrics:
          system.cpu.frequency:
            enabled: true
          system.cpu.logical.count:
            enabled: true
          system.cpu.physical.count:
            enabled: true
          system.cpu.utilization:
            enabled: true
      load:
      paging:
        metrics:
          system.paging.utilization:
            enabled: true
      filesystem:
        metrics:
          system.filesystem.utilization:
            enabled: true
      network:
        metrics:
          system.network.conntrack.count:
            enabled: true
          system.network.conntrack.max:
              enabled: true
      memory:
        metrics:
          system.linux.memory.available:
            enabled: true
          system.memory.limit:
            enabled: true
          system.memory.utilization:
            enabled: true
      processes:
      process:
        metrics:
          process.threads:
            enabled: true
          process.signals_pending:
            enabled: true
          process.paging.faults:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.open_file_descriptors:
            enabled: true
          process.handles:
            enabled: true
          process.disk.operations:
            enabled: true
          process.context_switches:
            enabled: true  
          process.cpu.utilization:
            enabled: true
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        mute_process_cgroup_error: true
    resource_attributes:
      process.cgroup: true
  hostmetrics/disk:
    collection_interval: 3m
    scrapers:
      disk: 
  otlp:
    protocols:
      grpc:
        endpoint: "${env:HOST_IP}:4317"
        #endpoint: "127.0.0.1:4317"

processors:
  resourcedetection/env:
    detectors: [env, system]
    timeout: 15s
    override: true
  batch:
    # Datadog APM Intake limit is 3.2MB. Let's make sure the batches do not go over that.
    send_batch_max_size: 8192 # (default = 8192): Maximum batch size of spans to be sent to the backend. The default value is 8192 spans.
    send_batch_size: 512 # (default = 512): Maximum number of spans to process in a batch. The default value is 512 spans.
    timeout: 10s # (default = 5s): Maximum time to wait until the batch is sent. The default value is 5s.
  memory_limiter:
    check_interval: 5s
    limit_mib: 150
  attributes:
    actions:
      - key: tags
        value:
          - 'env:dev'
        action: upsert
  resource:
    attributes:
      - key: env
        value: 'dev'
        action: insert
      - key: geo
        action: insert
      - key: region
        action: insert
exporters:
  # logging:
  #   verbosity: detailed
  otlphttp:
    endpoint: http://127.0.0.1:9081/otlp-http

service:
  telemetry:
    metrics:
      address: 'localhost:8888'
    logs:
      level: 'info'
    traces:
      propagators:
        - "b3"
        - "tracecontext"
  extensions: [zpages, health_check, pprof]
  pipelines:
    metrics:
      receivers: [otlp, podman_stats, prometheus/otelcol]
      processors: [memory_limiter, batch, attributes, resource, resourcedetection/env]
      exporters: [otlphttp]
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes, resource]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch, attributes, resource]
      exporters: [otlphttp]

Log output

2024-08-08T14:11:28.915Z	info	service@v0.106.1/service.go:117	Setting up own telemetry...
2024-08-08T14:11:28.916Z	info	service@v0.106.1/service.go:120	OpenCensus bridge is disabled for Collector telemetry and will be removed in a future version, use --feature-gates=-service.disableOpenCensusBridge to re-enable
2024-08-08T14:11:28.916Z	info	service@v0.106.1/telemetry.go:96	Serving metrics	{"address": "localhost:8888", "metrics level": "Normal"}
2024-08-08T14:11:28.917Z	info	memorylimiter/memorylimiter.go:75	Memory limiter configured	{"kind": "processor", "name": "memory_limiter", "pipeline": "logs", "limit_mib": 150, "spike_limit_mib": 30, "check_interval": 5}
2024-08-08T14:11:28.920Z	info	service@v0.106.1/service.go:199	Starting micotelcollector...	{"Version": "0.106.1", "NumCPU": 2}
2024-08-08T14:11:28.920Z	info	extensions/extensions.go:36	Starting extensions...
2024-08-08T14:11:28.920Z	info	extensions/extensions.go:39	Extension is starting...	{"kind": "extension", "name": "health_check"}
2024-08-08T14:11:28.920Z	info	healthcheckextension@v0.106.1/healthcheckextension.go:32	Starting health_check extension	{"kind": "extension", "name": "health_check", "config": {"Endpoint":"127.0.0.1:8081","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0,"Path":"/","ResponseBody":null,"CheckCollectorPipeline":{"Enabled":false,"Interval":"5m","ExporterFailureThreshold":5}}}
2024-08-08T14:11:28.921Z	info	extensions/extensions.go:56	Extension started.	{"kind": "extension", "name": "health_check"}
2024-08-08T14:11:28.921Z	info	extensions/extensions.go:39	Extension is starting...	{"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z	info	zpagesextension@v0.106.1/zpagesextension.go:54	Registered zPages span processor on tracer provider	{"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z	info	zpagesextension@v0.106.1/zpagesextension.go:64	Registered Host's zPages	{"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z	info	zpagesextension@v0.106.1/zpagesextension.go:76	Starting zPages extension	{"kind": "extension", "name": "zpages", "config": {"Endpoint":"127.0.0.1:55679","TLSSetting":null,"CORS":null,"Auth":null,"MaxRequestBodySize":0,"IncludeMetadata":false,"ResponseHeaders":null,"CompressionAlgorithms":null,"ReadTimeout":0,"ReadHeaderTimeout":0,"WriteTimeout":0,"IdleTimeout":0}}
2024-08-08T14:11:28.921Z	info	extensions/extensions.go:56	Extension started.	{"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.921Z	info	extensions/extensions.go:39	Extension is starting...	{"kind": "extension", "name": "pprof"}
2024-08-08T14:11:28.921Z	info	pprofextension@v0.106.1/pprofextension.go:60	Starting net/http/pprof server	{"kind": "extension", "name": "pprof", "config": {"TCPAddr":{"Endpoint":"127.0.0.1:1777","DialerConfig":{"Timeout":0}},"BlockProfileFraction":3,"MutexProfileFraction":5,"SaveToFile":""}}
2024-08-08T14:11:28.921Z	info	extensions/extensions.go:56	Extension started.	{"kind": "extension", "name": "pprof"}
2024-08-08T14:11:28.923Z	info	internal/resourcedetection.go:125	began detecting resource information	{"kind": "processor", "name": "resourcedetection/env", "pipeline": "metrics"}
2024-08-08T14:11:28.923Z	info	internal/resourcedetection.go:139	detected resource information	{"kind": "processor", "name": "resourcedetection/env", "pipeline": "metrics", "resource": {"host.name":"runner-ykxhnyexq-project-45956638-concurrent-0","os.type":"linux"}}
2024-08-08T14:11:28.923Z	info	otlpreceiver@v0.106.1/otlp.go:102	Starting GRPC server	{"kind": "receiver", "name": "otlp/richos", "data_type": "traces", "endpoint": "localhost:4317"}
2024-08-08T14:11:28.924Z	info	prometheusreceiver@v0.106.1/metrics_receiver.go:307	Starting discovery manager	{"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics"}
2024-08-08T14:11:28.925Z	info	prometheusreceiver@v0.106.1/metrics_receiver.go:285	Scrape job added	{"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics", "jobName": "otelcol"}
2024-08-08T14:11:28.925Z	info	prometheusreceiver@v0.106.1/metrics_receiver.go:376	Starting scrape manager	{"kind": "receiver", "name": "prometheus/otelcol", "data_type": "metrics"}
2024-08-08T14:11:28.925Z	error	graph/graph.go:432	Failed to start component	{"error": "Get \"http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D\": dial unix /run//podman/podman.sock: connect: no such file or directory", "type": "Receiver", "id": "podman_stats"}
2024-08-08T14:11:28.926Z	info	service@v0.106.1/service.go:262	Starting shutdown...
2024-08-08T14:11:28.926Z	info	healthcheck/handler.go:132	Health Check state change	{"kind": "extension", "name": "health_check", "status": "unavailable"}
2024-08-08T14:11:28.927Z	info	extensions/extensions.go:63	Stopping extensions...
2024-08-08T14:11:28.927Z	info	zpagesextension@v0.106.1/zpagesextension.go:105	Unregistered zPages span processor on tracer provider	{"kind": "extension", "name": "zpages"}
2024-08-08T14:11:28.927Z	info	service@v0.106.1/service.go:276	Shutdown complete.
Error: cannot start pipelines: Get "http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D": dial unix /run//podman/podman.sock: connect: no such file or directory
2024/08/08 14:11:28 collector server run finished with error: cannot start pipelines: Get "http://d/v3.3.1/libpod/containers/json?filters=%7B%22status%22%3A%5B%22running%22%5D%7D": dial unix /run//podman/podman.sock: connect: no such file or directory

Additional context

No response

cforce added the bug and needs triage labels on Aug 8, 2024
Contributor

github-actions bot commented Aug 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@rogercoll
Contributor

Thanks for raising this.

The connection strategy was mostly copied from https://github.com/containers/podman/blob/main/pkg/bindings/connection.go#L90. And you are correct: the implementation adds an extra "/" for Unix sockets with non-absolute paths. I reckon both forms should work in most situations, but unix:/// is safer when specifying absolute paths, as it avoids any misinterpretation by the system or application.

Does your socket exist at /run/podman/podman.sock?
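
The extra "/" can be reproduced with a small Go sketch. It assumes, based on the description above, that for any unix endpoint not starting with unix:/// the parsed host and path are joined with "/" and prefixed with "/". The names joinElements and socketPath are hypothetical, not the actual podman bindings API.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// joinElements is a hypothetical stand-in for the join step: it prefixes "/"
// and joins the elements with "/" without collapsing duplicate slashes.
func joinElements(elements ...string) string {
	return "/" + strings.Join(elements, "/")
}

// socketPath returns the filesystem path that would be dialed for a unix
// endpoint under the assumption stated above.
func socketPath(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", err
	}
	if u.Scheme != "unix" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if !strings.HasPrefix(endpoint, "unix:///") {
		// unix://a/b is treated as /a/b: the parsed host becomes the
		// first path element again.
		return joinElements(u.Host, u.Path), nil
	}
	return u.Path, nil
}

func main() {
	p, err := socketPath("unix://run/podman/podman.sock")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("dialed path: ", p)
	fmt.Println("cleaned path:", path.Clean(p))
}
```

With the endpoint from the report, the host "run" and the path "/podman/podman.sock" are joined into /run//podman/podman.sock, the same path that appears in the log. path.Clean collapses the doubled slash and is shown only for comparison. POSIX path resolution treats consecutive slashes as one, so the doubled slash by itself does not change which file is looked up.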

@cforce
Author

cforce commented Aug 8, 2024

Yes, it all looks good to me. I added the debug commands below to the same shell where otelcol is executed later:

echo "podman info:"
podman info

echo "Debugging info"
echo "XDG_RUNTIME_DIR=$XDG_RUNTIME_DIR"
ls -al  $XDG_RUNTIME_DIR/*.sock || true
echo "/run/user/podman/"
ls -al  /run/user/podman/*.sock || true

and the output:

podman info:
host:
  arch: amd64
  buildahVersion: 1.28.2
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon_2.1.6+ds1-1_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.6, commit: unknown'
  cpuUtilization:
    idlePercent: 62.46
    systemPercent: 14.2
    userPercent: 23.34
  cpus: 2
  distribution:
    codename: bookworm
    distribution: debian
    version: "12"
  eventLogger: file
  hostname: runner-jlguopmm-project-45956638-concurrent-0
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.15.154+
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 5200777216
  memTotal: 8341037056
  networkBackend: cni
  ociRuntime:
    name: crun
    package: crun_1.8.1-1+deb12u1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.1
      commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
      rundir: /run/user/podman/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    exists: true
    path: unix:///run/user/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: true
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 2147479552
  swapTotal: 2147479552
  uptime: 0h 2m 0.00s
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries: {}
store:
  configFile: /usr/share/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 27226542080
  graphRootUsed: 9394491392
  graphStatus:
    Backing Filesystem: overlayfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /builds/Mercedes-Intelligent-Cloud/mic-monlog/micotelcollector
  imageStore:
    number: 0
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.19.8
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
Debugging info
XDG_RUNTIME_DIR=/run/user/podman
srw------- 1 root root 0 Aug  8 17:11 /run/user/podman/podman.sock
/run/user/podman/
srw------- 1 root root 0 Aug  8 17:11 /run/user/podman/podman.sock
/run/user/root/

OK, the path was wrong.

@rogercoll
Contributor

OK, the path was wrong.

Great, we found the root cause. Do you think we can close the issue?

@cforce
Author

cforce commented Aug 9, 2024

I still think the error message, and the wrong path composed by some strange fallback default, are completely misleading and should be improved.

@rogercoll
Contributor

Although the receiver is just propagating the error retrieved from the libpod package, I agree that the error message could be improved. Regarding the path fallback strategy, I would prefer to rely on what the containers/podman package does.

@cforce Would you be interested in opening a PR to improve the error message?
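
One possible shape for a clearer error, sketched in Go under assumptions: a pre-flight check that reports both the configured endpoint and the path it resolved to before dialing. checkSocket and isMissing are hypothetical helpers, not existing receiver or libpod functions, and the message wording is only a suggestion.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkSocket (hypothetical) verifies that resolvedPath exists and is a unix
// socket. Any error names the configured endpoint as well as the resolved
// path, so a rewritten path is visible to the user.
func checkSocket(configuredEndpoint, resolvedPath string) error {
	info, err := os.Stat(resolvedPath)
	if err != nil {
		return fmt.Errorf("podman endpoint %q resolved to socket path %q: %w",
			configuredEndpoint, resolvedPath, err)
	}
	if info.Mode()&fs.ModeSocket == 0 {
		return fmt.Errorf("podman endpoint %q resolved to %q, which is not a unix socket",
			configuredEndpoint, resolvedPath)
	}
	return nil
}

// isMissing (hypothetical) reports whether err means the path does not exist.
func isMissing(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	// Values taken from the report: the configured endpoint and the path
	// that appeared in the dial error.
	err := checkSocket("unix://run/podman/podman.sock", "/run//podman/podman.sock")
	if err == nil {
		fmt.Println("socket found")
		return
	}
	fmt.Println(err)
	if isMissing(err) {
		fmt.Println("hint: compare the path with remoteSocket.path from `podman info`")
	}
}
```

Because the original error is wrapped with %w, callers can still test for fs.ErrNotExist while the message shows where the dialed path came from.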

Contributor

github-actions bot commented Dec 2, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Dec 2, 2024