Containerd rotated files not being captured #35084

Closed
brandonbirdj opened this issue Sep 9, 2024 · 0 comments
Labels
bug Something isn't working needs triage New item requiring triage

Component(s)

No response

What happened?

Description

When containerd rotates logs, it renames a log such as path/0.log to path/0.log.20240904-210753.
The collector tries to follow the renamed file, but many filelog include examples no longer match the new name, so the file is not followed.

Even when the include pattern is fixed, the container parser fails to determine the path, because the pattern at https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/parser/container/parser.go#L31 does not allow for the rotation timestamp.

When using https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml instead of the container parser, the following error is logged:

error: "failed to handle attribute mappings: failed to detect a valid log path"
kind: "receiver"
level: "error"
msg: "process: %w"
name: "filelog"
path: "/var/log/pods/org-monitoring_cat-logs-9b48896f6-xdd2f_db553071-16c3-43f5-bd16-795dccdc2c45/cat-logs/0.log.20240905-173404"

However, it can be fixed by changing
https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml#L56
to regex: '^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<pod_id>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log.*$'
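The effect of the relaxed suffix can be checked outside the collector. The sketch below uses Python's `re` as a stand-in for the collector's Go regexp engine (both accept the `(?P<name>...)` syntax used here). `RELAXED` is the pattern proposed above; `STRICT` is an assumption, a variant that ends at `.log` and stands in for any pattern that does not allow a rotation suffix. The rotated path is the one from the error message.

```python
import re

# Shared part of both patterns, copied from the regex proposed in the issue.
PREFIX = (
    r'^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<pod_id>[a-f0-9\-]{36})'
    r'\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log'
)

# Assumption: a pattern that stops at ".log" and so rejects rotated names.
STRICT = re.compile(PREFIX + r'$')
# Pattern proposed in the issue: anything may follow ".log".
RELAXED = re.compile(PREFIX + r'.*$')

LIVE = (
    "/var/log/pods/org-monitoring_cat-logs-9b48896f6-xdd2f_"
    "db553071-16c3-43f5-bd16-795dccdc2c45/cat-logs/0.log"
)
ROTATED = LIVE + ".20240905-173404"

for name, pattern in (("strict", STRICT), ("relaxed", RELAXED)):
    for path in (LIVE, ROTATED):
        matched = pattern.match(path) is not None
        print(f"{name:8} {'match' if matched else 'no match':9} {path[-24:]}")
```

Only the strict pattern rejects the rotated file; the relaxed one still extracts the same namespace, pod name, container name and restart count from both paths.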

I think the container operator can be fixed by changing https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/parser/container/parser.go#L31 to const logpathPattern = "^.*\\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\\-]+)\\/(?P<container_name>[^\\._]+)\\/(?P<restart_count>\\d+)\\.log.*$"
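As a rough check of the proposed container-parser pattern, the sketch below transliterates the Go string literal into a Python raw string (Go's `\\/` becomes `\/`) and extracts the named groups from a rotated path. The helper name `parse_log_path` is hypothetical and not part of the collector. The pattern is written with no whitespace before the closing `$`, because a literal space there would have to appear at the end of the path; `WITH_SPACE` shows that case.

```python
import re

# Proposed logpathPattern, transliterated from the Go string literal.
LOGPATH_PATTERN = re.compile(
    r'^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)'
    r'\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log.*$'
)

# Same pattern with a space before "$": the space is matched literally.
WITH_SPACE = re.compile(LOGPATH_PATTERN.pattern[:-1] + ' $')


def parse_log_path(path):
    """Return the named groups as a dict, or None if the path does not match."""
    m = LOGPATH_PATTERN.match(path)
    return m.groupdict() if m else None


ROTATED = (
    "/var/log/pods/org-monitoring_cat-logs-9b48896f6-xdd2f_"
    "db553071-16c3-43f5-bd16-795dccdc2c45/cat-logs/0.log.20240905-173404"
)

print(parse_log_path(ROTATED))
print(WITH_SPACE.match(ROTATED))  # None: the path does not end with a space
```

Because the suffix is `.*`, the pattern also accepts names such as `0.log.20240905-173404.gz`, so the exclude globs for compressed and temporary files are still needed.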

With both of the fixes above, success messages like the following can be seen.

msg: "File has been rotated(moved)"

Steps to Reproduce

  1. Start in an environment using containerd, e.g. GKE.
    Configure the container operator or older filelog parsers according to the documentation.

  2. Increase the collector's telemetry log level

service:
  telemetry:
    logs:
      encoding: json
      level: debug
  3. Update receivers to include rotated logs
receivers:
  filelog:
    include:
    - /var/log/pods/*/*/*.log
    - /var/log/pods/*/*/*.log.*
    exclude:
    # Exclude zipped logs so we don't double ship them
    - /var/log/pods/*/*/*.gz
    # The .tmp log only appears briefly during rotation as part of the zipping process
    - /var/log/pods/*/*/*.tmp
  4. Generate enough logs to trigger a file rotation, likely 10MB or more.

  5. Inspect the collector logs to see whether the file was reported as moved or, more likely, "lost"
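For the log-generation step, one option is a small workload that prints fixed-size lines until a target volume is reached. This is a minimal sketch, not the workload used in the report; the function name `emit_logs` and the 200-character line length are arbitrary choices.

```python
import io


def emit_logs(stream, total_bytes, line_len=200):
    """Write numbered ASCII lines of line_len characters (newline included)
    until at least total_bytes have been written; return the count written."""
    written = 0
    seq = 0
    while written < total_bytes:
        prefix = f"{seq:010d} "
        # Pad so that prefix + filler + newline is exactly line_len characters.
        line = prefix + "x" * (line_len - len(prefix) - 1) + "\n"
        stream.write(line)
        written += len(line)
        seq += 1
    return written


# Small in-memory demonstration; a real run would target sys.stdout.
buf = io.StringIO()
print(emit_logs(buf, 1000), "bytes written")
```

Running `emit_logs(sys.stdout, 12 * 1024 * 1024)` inside a container should exceed a 10MB threshold with some margin. The container runtime adds its own prefix to each line, so the file on disk grows somewhat faster than the raw output.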

Bonus: Check attributes["log.file.path"] to see if it ever included more than `(restart count).log`

Expected Result

Actual Result

No logs are found during/after rotation.

See Also

Here is a small screen recording of the file rotation process for containerd.
https://github.com/user-attachments/assets/8b2ca45a-ffe5-41f9-a862-97a7da216442

Collector version

any

Environment information

Environment

Containerd

OpenTelemetry Collector configuration

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml

Exhibits this behavior with the following change


receivers:
  filelog:
    include:
    - /var/log/pods/*/*/*.log
    - /var/log/pods/*/*/*.log.*
    exclude:
    # Exclude zipped logs so we don't double ship them
    - /var/log/pods/*/*/*.gz
    - /var/log/pods/*/*/*.tmp
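
To see which files these globs select, the sketch below evaluates the include and exclude lists against a few file names from one rotation cycle. It uses `pathlib.PurePosixPath.match` as a stand-in; the collector's own glob matcher may differ in edge cases. The `.gz` and `.tmp` file names are assumed for illustration, and `is_collected` is a hypothetical helper.

```python
from pathlib import PurePosixPath

INCLUDE = ["/var/log/pods/*/*/*.log", "/var/log/pods/*/*/*.log.*"]
EXCLUDE = ["/var/log/pods/*/*/*.gz", "/var/log/pods/*/*/*.tmp"]


def is_collected(path, include=INCLUDE, exclude=EXCLUDE):
    """True if path matches any include glob and no exclude glob."""
    p = PurePosixPath(path)
    included = any(p.match(glob) for glob in include)
    excluded = any(p.match(glob) for glob in exclude)
    return included and not excluded


POD_DIR = (
    "/var/log/pods/org-monitoring_cat-logs-9b48896f6-xdd2f_"
    "db553071-16c3-43f5-bd16-795dccdc2c45/cat-logs/"
)
# File names assumed for one rotation cycle.
for name in ("0.log", "0.log.20240905-173404",
             "0.log.20240905-173404.gz", "0.log.20240905-173404.tmp"):
    print(f"{name:28} collected={is_collected(POD_DIR + name)}")
```

With only the first include glob, `is_collected(POD_DIR + "0.log.20240905-173404", include=INCLUDE[:1])` is false, which corresponds to the rotated file not being followed.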

Turning on extra logging helps to see the problem

service:
  telemetry:
    logs:
      encoding: json
      level: debug


Log output

No response

Additional context

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/examples/kubernetes/otel-collector.yaml#L56

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/operator/parser/container/parser.go#L31

https://github.com/user-attachments/assets/8b2ca45a-ffe5-41f9-a862-97a7da216442