Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning up context on reusable carriers before or after injection #1811

Closed
lmolkova opened this issue Jul 9, 2021 · 10 comments
Closed

Cleaning up context on reusable carriers before or after injection #1811

lmolkova opened this issue Jul 9, 2021 · 10 comments
Assignees
Labels
spec:context Related to the specification/context directory

Comments

@lmolkova
Copy link
Contributor

lmolkova commented Jul 9, 2021

What are you trying to achieve?

Context: #1767 #1783 open-telemetry/opentelemetry-java-instrumentation#440 open-telemetry/opentelemetry-java-instrumentation#903

I'd like to create some clarity around context propagation (for HTTP for now). It also helps when multiple multiple instrumentations in the process for an HTTP request. I think it makes more sense (in terms of supportability and least surprise to users) to always pick outer instrumentation, i.e manual user instrumentation always wins.

However, propagator spec mentions that context should be stripped from the request if it can be reused between tries, which means that inner instrumentation always wins.

The predefined propagation fields. If your carrier is reused, you should delete the fields here
before calling inject.

I'd like to get community opinion on if it is really a desired behavior

It was introduced in #147 and there is no reasoning/explanation, so above are my assumptions.

Additional context.
It not very common to reuse HTTP requests between retries, but e.g. golang http client allows that and there are likey other clients/cases where it happens.

@lmolkova lmolkova added the spec:context Related to the specification/context directory label Jul 9, 2021
@iNikem
Copy link
Contributor

iNikem commented Jul 12, 2021

I think it makes more sense (in terms of supportability and least surprise to users) to always pick outer instrumentation, i.e manual user instrumentation always wins.

Totally agreed.

However, propagator spec mentions that context should be stripped from the request if it can be reused between tries, which means that inner instrumentation always wins.

The predefined propagation fields. If your carrier is reused, you should delete the fields here

before calling inject.

If we agree that only outmost instrumentation should inject its CLIENT SpanContext into the carrier, then the requirement above does not actually apply to the nested instrumentations. Because they should not call inject anyway.

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 12, 2021

I think we have to discuss approaches for context injection on the nested spans.
My logic here is that redirects/retries have to have different contexts (because they are sometimes (for redirects) and usually (for retries) part of app logic and would have different contexts in real life. If sometimes we are able to wrap them under CLIENT span (and they share context) and sometimes not (and have different context), brings inconsistent experience to users . So I suggest always have new context on inner spans that actually perform HTTP requests.

I'm writing this under the assumption we're creating specs that can be implemented with auto-instrumentation.

More here: #1783 (comment)

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 12, 2021

Maybe things that don't need new context like circuit-breaking should not even need a span and can be expressed as an event?

And maybe this way can we stay consistent:

  • rpc call - new span - new context injected
  • no rpc call - no span (event on the current span instead) - no context injection.

Logical CLIENT calls in this case don't really need to follow HTTP instrumentation - they wrap multiple underlying HTTP calls instead. And they don't inject context. Also, they are not always feasible to the instrument and in most cases (no retries/redirects) replicate what happens on
HTTP level under them.

@trask
Copy link
Member

trask commented Jul 13, 2021

Additional context.
It not very common to reuse HTTP requests between retries, but e.g. golang http client allows that and there are likey other clients/cases where it happens.

[just as a tangent: I was surprised to find recently in /~https://github.com/open-telemetry/opentelemetry-java-instrumentation/pull/2727 that reusing HTTP requests is supported by most of the 20+ Java http client libraries that we instrument]

I agree that we need a way to identify if we are in a nested CLIENT span, for a couple of reasons:

  • to avoid overwriting traceparent from the outer CLIENT span
  • to conditionally suppress nested CLIENT spans (based on some kind of user verbosity preference)

It sounds like there are at least a couple of options to accomplish this:

  • use the presence of traceparent to know that we are in a nested CLIENT span (this requires clearing traceparent at the end of each request if http request reuse is supported by the instrumented library)
  • use some kind of marker in the Context to know that we are in a nested CLIENT span (this requires setting the marker in the context and propagating it to any downstream libraries that may also be instrumented)

Maybe we can focus initially on

  1. Getting agreement that we really do need a way to identify if we are in a nested CLIENT span
  2. Listing options to accomplish that

@tedsuo
Copy link
Contributor

tedsuo commented Jul 13, 2021

It sounds like unless there is a bug in instrumentation, last write should always win.

If at all possible, we should go with last write wins, and write OpenTelemetry instrumentation with that in mind. I am worried that other solutions add more state and conditional rules, which make it difficult for end users to reason about what their system is doing.

Is there any situation, outside of instrumentation being written or applied in an incoherent manner, where last write wins would not be the correct solution?

@lmolkova
Copy link
Contributor Author

lmolkova commented Jul 13, 2021

It sounds like unless there is a bug in instrumentation, last write should always win.

Please take a look at #1767, open-telemetry/opentelemetry-java-instrumentation#440 - there could be legit cases when it happens and there is double-instrumentation (native library + auto).

I think we all have different ideas of what's HTTP (gRPC, etc) instrumentation and we first need to start with clarifications.

I believe here are some ideas I heard:

  1. HTTP span is actually a logical call that wraps retries/redirects/etc. Retries have the same context and maybe separate spans or events on a logical span.
  2. HTTP span is a low-level RPC call and all retries have individual context

Then there are different ideas of what's more useful to users

  1. higher level logical call
  2. lower level rpc call

And then there are different problems

  1. How to suppress a certain level of instrumentation (logical/rpc)
  2. How to suppress duplicate instrumentation that happen

So I think we need to build clarity around this within semantic convention WG and try to come up with a proposal that is

  • feasible to do for auto and native instrumentation
  • gives consistent tracing experience across usage patterns/http clients/languages
  • allows preventing double-instrumentation (same type of it) issues
  • allows suppressing certain instrumentations (e.g. logical calls or rpc calls) to helps users manage costs and reduce the noise they don't need

I will publish my way of thinking around it based on my experience with native instrumentation of .NET HTTP client in pre-OTel era which is up and running for years.

@tedsuo
Copy link
Contributor

tedsuo commented Jul 13, 2021

I agree that we need to define the conventions for having layered client and server spans. I look forwards to your thoughts. I think there is a lack of clarity around how we should be instrumenting these types of layered systems.

In general, I am concerned about adding complexity. In particular, I'm concerned about control mechanisms that allow instrumentation authors to suppress instrumentation coming from other libraries. I'd rather we extend the semantic conventions to provide a consistent language for configuring instrumentation, and allow the end user control over what information is emitted, rather than give instrumentation authors tools for messing around with instrumentation they did not write.

Ideally, our solution for managing configuration can be implemented in languages which do not use the agent approach. And agents are aware of the configuration mechanism that the end user has access to, so there aren't multiple types of control mechanisms fighting with each other.

Related note: I think I need a better understanding about the cases where "double instrumentation" is unavoidable. It feels like higher level instrumentation should defer to lower level instrumentation rather than re-implement it, and if the lower-level client is pluggable then every type of client which is plugged in should provide instrumentation.

I get that reality has edge cases, but it feels like we should try our best to avoid ladling on any complexity we can avoid.

@iNikem
Copy link
Contributor

iNikem commented Jul 14, 2021

@tedsuo For me the simplest case of "unavoidable double instrumentation" is this:

  • Application developer manually creates a "logical" span with kind=CLIENT around a remote call
  • That remote call is implemented by http client which is integrated with Otel, either natively or by the means of instrumentation agent.

Both parties (developer and library) want to create a CLIENT span and propagate it across the wire. I am not alone in the opinion that in this case library (meaning lower level or nested span) should back off.

@Oberon00
Copy link
Member

The discussion here doesn't really have much to do with the original issue topic/title anymore. Maybe you could copy to #1767 or create a new issue so that others can find this discussion?

@lmolkova
Copy link
Contributor Author

@tedsuo @iNikem
I wrote rather a long read trying to clarify concerns about HTTP instrumentation. We can use it as a starting point for the discussion.
Let's continue the double instrumentation discussion in #1767.

@Oberon00, good point, I'll close this issue and link to corresponding issues (#1747 too), it seems inevitable that we have to clear context before making calls for the majority of cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:context Related to the specification/context directory
Projects
None yet
Development

No branches or pull requests

6 participants