Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to send log message can block the process #18

Open
danteu opened this issue Jul 22, 2024 · 0 comments · May be fixed by #19
Open

Failure to send log message can block the process #18

danteu opened this issue Jul 22, 2024 · 0 comments · May be fixed by #19

Comments

@danteu
Copy link

danteu commented Jul 22, 2024

Hi,

I've encountered the following problem: When using slog-loki, if the Loki server is not reachable when sending the first message, the process may halt. This only seems to occur if there is a significant (>= 1 second?) delay between the first two messages. Consider the following code:

package main

import (
	"time"

	"github.com/grafana/loki-client-go/loki"
	slogloki "github.com/samber/slog-loki/v3"

	"log/slog"
)

func main() {
	config, _ := loki.NewDefaultConfig("http://localhost:3100/loki/api/v1/push")
	config.TenantID = "xyz"
	client, _ := loki.New(config)

	lokiLogHandler := slogloki.Option{Level: slog.LevelDebug, Client: client}.NewLokiHandler()
	defer func() {
		client.Stop()
	}()
	lokiLogger := slog.New(lokiLogHandler)

	lokiLogger.Error("Message 1")
	slog.Info("Message 1 logged")

	time.Sleep(1 * time.Second)

	lokiLogger.Error("Message 2")
	slog.Info("Message 2 logged")
	lokiLogger.Error("Message 3")
	slog.Info("Message 3 logged")

	time.Sleep(3 * time.Second)
	slog.Info("Post sleep")
}

When running this code without an available Loki instance, I'd expect the application to output the three "Message x logged" info logs to stdout, as well as a lot of "error sending batch, will retry" messages to stderr. However, the application halts for about seven minutes and then continues to run as expected:

$ go run main.go             
2024/07/22 22:22:28 INFO Message 1 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]
level=error component=client host=localhost:3100 msg="final error sending batch" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:33 INFO Message 2 logged
2024/07/22 22:29:33 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:29:36 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]

This error only occurs if there is a significant delay between the first two messages. When I remove the first sleep, the expected behaviour occurs.

[...]
	lokiLogger.Error("Message 1")
	slog.Info("Message 1 logged")
	lokiLogger.Error("Message 2")
	slog.Info("Message 2 logged")
	lokiLogger.Error("Message 3")
	slog.Info("Message 3 logged")
[...]

Output:

$ go run main.go
2024/07/22 22:33:45 INFO Message 1 logged
2024/07/22 22:33:45 INFO Message 2 logged
2024/07/22 22:33:45 INFO Message 3 logged
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
2024/07/22 22:33:48 INFO Post sleep
level=warn component=client host=localhost:3100 msg="error sending batch, will retry" status=-1 error="Post \"http://localhost:3100/loki/api/v1/push\": dial tcp 127.0.0.1:3100: connect: connection refused"
[...]

I haven't yet had time to investigate further edge cases or to look into how grafana/loki-client-go handles message batching, which I think could be related to this issue.

Any ideas on what the cause of this issue may be and how to solve it?

Best regards,
Daniel

danteu added a commit to danteu/slog-loki that referenced this issue Jul 27, 2024
- When calling Client.Handle, the log entry is written to an unbuffered
  channel. This write blocks when the Loki client is trying to send a
  log batch to the Loki server. When the Loki server is unreachable,
  this can block for multiple minutes.
- This change fixes the issue by running Client.Handle in a separate
  goroutine.

Fixes samber#18
@danteu danteu changed the title Failure to send first log message can block the process Failure to send log message can block the process Jul 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant