-
Notifications
You must be signed in to change notification settings - Fork 395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Migrate from OpenTracing to OpenTelemetry #524
Conversation
Migrate from OpenTracing to OpenTelemetry as OpenTracing has been deprecated, this also adds the start of metrics and the ability to use any GRPC Exporter as well as Prometheus.
This is really cool. I'll try to review it in a few days! |
Hey @hf, any update on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @HarryET,
Great work here. I'm leaving some comments on the code. We would really appreciate it if you could update the README
or write some other doc on how to turn on and have an end-to-end visualization of the traces.
Internally at this time we don't collect metrics or traces from GoTrue instances, and we've got no way to test this in a production setting, so that is something we need more focus on to accept the PR.
Let me know if you need any help doing this.
|
||
switch tc.Exporter { | ||
case OtlpGrpcExporter: | ||
ctx, cancel := context.WithTimeout(context.Background(), time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this have a timeout context? While below for the HTTP one there is only a background context? Does this handle SIGTERM / SIGINT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Http is a push model, while GRPC is a 2 way sustained connection so if there isn't a timeout it could block startup when it should just be throwing the error earlier
api/api.go
Outdated
// Open Tracing GRPC requires some extra deferrals! | ||
if a.config.Tracing.OtlpMetricExporter != nil { | ||
defer a.config.Tracing.ContextCancel() // Defer GRPC Context Cancel | ||
d := a.config.Tracing.OtlpMetricExporter | ||
baseContext := context.Background() | ||
defer func() { | ||
ctx, cancel := context.WithTimeout(baseContext, time.Second) | ||
defer cancel() | ||
if err := d.Shutdown(ctx); err != nil { | ||
otel.Handle(err) | ||
} | ||
}() | ||
pusher := otelglobal.MeterProvider().(*otelcontroller.Controller) | ||
if err := pusher.Start(baseContext); err != nil { | ||
log.Fatalf("could not start metric controller: %v", err) | ||
} | ||
defer func() { | ||
ctx, cancel := context.WithTimeout(baseContext, time.Second) | ||
defer cancel() | ||
// pushes any last exports to the receiver | ||
if err := pusher.Stop(ctx); err != nil { | ||
otel.Handle(err) | ||
} | ||
}() | ||
} | ||
if a.config.Tracing.TracingShutdown != nil { | ||
defer func() { | ||
if err := a.config.Tracing.TracingShutdown(context.Background()); err != nil { | ||
log.Fatal("failed to shutdown TracerProvider: %w", err) | ||
} | ||
}() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this whole section can be extracted into one function that calls multiple functions, as the readability of all the defers and contexts is quite low.
otlpMetrics, | ||
), | ||
controller.WithExporter(otlpMetrics), | ||
controller.WithCollectPeriod(2*time.Second), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be made configurable?
customAttributes = append(customAttributes, semconv.ServiceNameKey.String(tc.ServiceName)) | ||
resourceAttributes := resource.WithAttributes(customAttributes...) | ||
|
||
switch tc.Exporter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you split this giant switch
into one that calls into different functions that do the setup?
👋🏼 As requested in Slack here is a link to how we were using it in production with AWS, a similar approach can be used for testing; Code. We plan to add it back in the future and then you could use the entire terraform config to spin up a working text example you will want to switch the gotrue and gotrue proxy containers out though. or just use this as a basis for a docker-compose file |
Hey @HarryET we've decided to try and focus on this this week. I'll be taking over your changes and making them production ready if that's all right. Check back in a few days! |
Hey @hf that sounds awesome, might be worth a few messages on slack with some known issues I have? I'll reach out today and thanks for working on this; excited to see it in production! |
What kind of change does this PR introduce?
This PR migrates from Open Tracing to Open Telemetry. This retains DataDog support through Open Telemetry GRPC and HTTP collectors while also adding support directly for Prometheus and indirectly for anything that support the Open Telemetry Collectors e.g. AWS CloudWatch and Jager
What is the current behavior?
Uses the deprecated Open Tracing and only supports Datadog
What is the new behavior?
Uses Open Telemetry and supports a variety of tracing and metrics providers. Also gives the opportunity of adding metrics to GoTrue.
Additional context