-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should aws-sdk instrumentation emit CLIENT spans and suppress downstream http client library instrumentation? #440
Comments
For a bit more context, CLIENT / SERVER spans are important for modeling service dependencies, e.g. in a dependency graph like in Zipkin. By modeling HTTP as the CLIENT span, something like |
@trask do you mean about a solution specific to aws-sdk? Or a generic one as proposed in open-telemetry/opentelemetry-specification#530? |
First, I'm just wondering if we want to do this:
(I'm leaning towards yes, but would like others' opinions) I haven't really thought about how we would implement this yet (a generic mechanism supported by the OpenTelemetry API seems ideal). |
In some sense there is a general problem here both for client and server spans. Both for monitoring incoming and outgoing requests you usually have some higher level library inside and low-level middleware outside. Like this:
We certainly need to have generic instrumentation for lower level components in Otel, like Servlets and Netty. But in the particular case above users probably care only about Spring and AWS SDK. So if consider "suppressing" Netty in this case, we probably should consider "suppressing" Servlets as well. On the other hand, there is an opinion that Otel auto-instrumentation should just report what has happened. Without distorting any truth or doing any magic. Objective observer, not smart interpreter. In this case one can argue that such filtering/suppressing should happen in observability backend or in Collector. But not earlier. |
If people agree that the [btw, in the |
Strong +1 to suppressing the Netty spans, but open to it being configurable if there seem to be users that wouldn't need that. We could take a step back and consider this for any RPC not just AWS SDK - the service information is generally going to be in the RPC instrumentation, not its HTTP instrumentation. In every trace system I've looked at, marking the HTTP spans as client/server and the RPC spans as internal breaks service / dependency graphs because they rely on the leaf / root between services to be the RPC. Instead of
it's
because HTTP POST is the only client span in the trace. Trying to normalize this in collectors might be possible, but at the least it'd require some complicated buffering. What I would love is if HTTP instrumentation could augment a parent RPC span instead of replace it. The ideal client span for me would have
Hope this context helps, maybe the concept of multiple instrumentation working on the same span can be an interesting one to explore. |
Can you explain your preference here? Am I correct that you want to suppress Spring span on the assumption that |
Then again, suppressing HTTP spans will probably require transferring their attributes, like |
I think a couple of advantages instrumentation has over collector in this case are
Anything can be solved with some code :) But it feels to me like it is natural when handling these hierarchies in the instrumentation and becomes heuristics (magic?) when leaving the app. |
Why do you think that instrumentation has the hierarchy of spans? That may be correct in synchronous case, but things become more complicated in async case. If this assumption of yours is correct, then things become clearer indeed. |
Hierarchy may have been unclear since it's not necessarily from the root but a sort of local hierarchy. Even in asynchronous, you must have access to at least your parent or you will not be able to set the parent ID. That being said it does bring to mind a contrived example. RPC span starts, passes job to another thread and ends. The other thread then runs the HTTP connection which gets the client span. It seems wrong at least from a UX perspective if the RPC span ends in microseconds and the transport ends in milliseconds, users want to know how long the RPC took - transport is just an implementation detail. Does this example also help illustrate how instrumentation needs to be aware of this tight coupling between RPC and transport? |
To be precise, you must have access to your parent id, not the whole parent span :) But in practise I think we always pass the whole span around, so fair point. Very good example, it really brings a lot of thoughts... Maybe it should be just 1 span? Which starts when RPC starts and ends when HTTP transport finishes its job. With attributes "merged" from both sources. It seems that we are at the very important crossroad: should instrumentations be independent or should they tightly and explicitly collaborate? |
Async RPC clients typically have some kind of callback that we use to know when the RPC request is fully complete (i.e. when the response has been fully received). So I believe async RPC spans are already correct from a timing perspective (no need to end the span in the underlying transport).
For the specific proposal in the title, I don't think this requires any tight or explicit collaboration. It just requires a mechanism to suppress downstream spans (e.g. open-telemetry/opentelemetry-specification#530). |
@trask Yeah my example was contrived as what "losing your parent" might look like, but currently the instrumentation is fine. Supressing transport spans vs including them is still better for modeling for me, but treating "transport instrumentation" specially, so they can contribute to an RPC span by adding transport attributes, does seem even nicer. I can't think of any other scenarios than transport, so having a special category of instrumentation for them could make sense. |
This makes sense to me. And I think the modeling issue you raise is important, where you want the I could see the Netty instrumentation supporting a "transport" mode, where (based on some Context flag) it sets attributes on the parent span (and maybe add events to the parent span) as opposed to creating it's own span. |
I don't like the idea of bringing in one more special case of inter-instrumentation communication without taking into account a bigger picture. We will end up with one specific case of Spring-Servlet communication, one specific case for RPC-Netty communication, etc etc. I would very much prefer a limited set of common mechanism how such communication could happen and be documented and obvious from reading the code. |
Yup |
This seems reasonable to me, wondering what others think?
See comment by @anuraaga at open-telemetry/opentelemetry-specification#530 (comment).
The text was updated successfully, but these errors were encountered: