Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long lines are unexpectedly split into multiple OTEL Log records #35042

Open
cwegener opened this issue Sep 5, 2024 · 6 comments
Open

Long lines are unexpectedly split into multiple OTEL Log records #35042

cwegener opened this issue Sep 5, 2024 · 6 comments
Labels
bug Something isn't working needs triage New item requiring triage pkg/stanza/fileconsumer receiver/filelog

Comments

@cwegener
Copy link
Contributor

cwegener commented Sep 5, 2024

Component(s)

pkg/stanza/fileconsumer, receiver/filelog

What happened?

Description

When using the filelog receiver, any lines that are longer than the default buffer size (16KB) will be split into multiple log records instead of one single log record.

Steps to Reproduce

  1. Create simple default filelog receiver config
  2. Ingest a file that has a line length that is greater than 16KB)

Expected Result

Lines longer than 16KB gets emitted as one single OTEL Log record

Actual Result

Lines longer than 16KB get emitted as multiple OTEL Log records, each record getting split at the 16KB boundary.

Collector version

v0.108.0

Environment information

Environment

OS: Archlinux
Compiler(if manually compiled): go 1.23.0

OpenTelemetry Collector configuration

receivers:
  filelog:
    start_at: beginning
    include:
      - "/tmp/metrics_101089747.json"

exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    logs/raw:
      receivers:
        - filelog
      exporters:
        - debug

Log output

No response

Additional context

No response

@cwegener cwegener added bug Something isn't working needs triage New item requiring triage labels Sep 5, 2024
Copy link
Contributor

github-actions bot commented Sep 5, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@VihasMakwana
Copy link
Contributor

Although the buffer size is 16KB, the maximum log size is 1024KB.
The incomplete token issue might be due to the flush timeout expiring. Could you try setting force_flush_period to 0 and see if that resolves the problem?

@cwegener
Copy link
Contributor Author

cwegener commented Sep 6, 2024

The bug still behaves the same when using different values for the flush time out.

Although, I have not actually tested a value of 0 ... I wonder what 0 means?

@cwegener
Copy link
Contributor Author

cwegener commented Sep 6, 2024

Ok.

Did another quick test and the issue is:

  • the nature of the input file is causing the problem
  • the input files that I am using (they are generated by the awss3exporter contrib component) are missing a \n at the end of the line, which is then triggering the behaviour that I originally observed above.

Adding a \n at the end of the line, fixes the issue as expected.

I'll start investigating why the \n is missing in the files generated by /~https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/awss3exporter

@cwegener
Copy link
Contributor Author

cwegener commented Sep 6, 2024

Closing this issue, as filelog receiver is behaving as expected (logs are force flushed if no log line separator token is seen within the force flush period.

@djaglowski
Copy link
Member

djaglowski commented Jan 21, 2025

I'm reopening this issue as I've stumbled onto the same behavior.

To clarify, the issue occurs when a log is:

  1. Longer than the default buffer size (16KB)
  2. Shorter than the max log size
  3. NOT terminated

What happens is that the flush function kicks in and ejects the token.

What I do not understand yet is why the buffer size does not increase up to max log size before the ejection occurs. If it did, then the ejected token would be the entire log (even though it is not explicitly terminated).

@djaglowski djaglowski reopened this Jan 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs triage New item requiring triage pkg/stanza/fileconsumer receiver/filelog
Projects
None yet
Development

No branches or pull requests

3 participants