-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gateway: add opentelemetry #6110
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## dev #6110 +/- ##
============================================
+ Coverage 26.54% 26.56% +0.01%
Complexity 2123 2123
============================================
Files 931 931
Lines 123067 123263 +196
Branches 2678 2678
============================================
+ Hits 32674 32740 +66
- Misses 88806 88936 +130
Partials 1587 1587
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
570f599
to
0eb6599
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's awesome that you tackled this 🚀
2 general remarks at this point:
- why do we need some specific Jaeger/Datadog crates? Isn't the goal of OpenTelemetry to have a standard and not be vendor dependent? I used to connect to my OpenTelemetry collectors with only opentelemetry_otlp.
- I see that all the spans and attributes are done manually with the rather verbose API of
opentelemetry
crates. On top of that, that means that theContext
has to be forwarded into every function call, which doesn't make it easy. Maybe we could leveragetracing
and its ecosystem to simplify all of that. Withtracing
, the code would look a bit like the following.
#[tracing::instrument(
skip_all,
fields(
proxy.upstream_uri = self.proxy.rebase_uri(Some("ws"), unprocessed_path, query).to_string(),
proxy.type = "ws"
)
)]
async fn handle_websocket(
&self,
req: &HttpRequest,
stream: web::Payload,
unprocessed_path: &str,
query: &str,
) -> Result<HttpResponse, actix_web::Error> {
// keep the original body without modification
}
This means no more Context
(I believe tracing
use the thread local to keep all the needed information). And all the information related to instrumentation of the code is not intertwined anymore with the body of the function. By default, a new span is created for each #[tracing::instrument]
with the function's name used for the span's name (can be customized if needed).
I might be able to help for both remarks, so reach out if you need to. I believe the second (or even both?) remarks could be addressed later (which is why I'm fine to merge as-is since it already brings a very nice improvement on current situation).
0eb6599
to
688f0c8
Compare
Removed the jaeger backend, implemented the otlp one. Kept the Datadog backend, since adding using otlp with datadog requires adding a translation service.
Tracing relies on thread-local variables to attribute spans in an async context, which require careful context switching on each future poll. It can either be done manually or via a macro but either way failure to add context switching instrumentation causes span misattribution. The current method is less likely to be misused and does not involve any macro. |
688f0c8
to
09f958c
Compare
09f958c
to
e88e722
Compare
The advised way to do that is actually to use the Opentelemetry Collector with one of the contributed exporter for Datadog (see here). I see one some pros and one con:
Got it! Let's not overdo it, for now! |
EDIT: breaking news: the datadog agent is needed anyway, so keeping the datadog exporter over the OLTP one is pointless. @ElysaSrc is making the change |
e88e722
to
b6674d1
Compare
This PR adds a basic opentelemetry implementation on OSRD stack.