DB, messaging, gRPC (over HTTP): clarify nested client spans #674

Open
lmolkova opened this issue Jan 30, 2024 · 6 comments

@lmolkova
Contributor

lmolkova commented Jan 30, 2024

When working with db, messaging, or other high-level client libraries, applications could create (at least) two distinct layers of spans (when both layers are instrumented):

  • logical operation (such as performing a DB query or publishing an event)
  • network calls that perform the operation (obtaining auth tokens when necessary, requesting or sending information, retrying, etc.)

Some common examples:

  • Elasticsearch, which works over HTTP
  • AWS SQS/SNS, which work over HTTP
  • Azure Cosmos DB, which can work over HTTP or over a TCP-based protocol (which is being instrumented)
  • gRPC, which works on top of HTTP; at least in .NET, this can result in HTTP client spans nested under gRPC client spans.

As a result, logical-level spans describe the corresponding domain (DB, messaging, etc.), while network-level spans describe individual RPC calls.

These two layers contain different information; even server.address:server.port could differ (the logical operation points to a cluster/domain name, while network spans may point to individual nodes in the cluster, regional endpoints, etc.).
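To make the two layers concrete, here is a minimal sketch of one logical DB query that needs an auth call and one retry. It assumes a hypothetical `Span` dataclass (not the OpenTelemetry API), and the host names are made up. Note that both layers end up with CLIENT kind and different server.address values, which is the ambiguity this issue raises.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

# Hypothetical, minimal span model for illustration; NOT the OpenTelemetry API.
_ids = count(1)


@dataclass
class Span:
    name: str
    kind: str  # e.g. "CLIENT", "SERVER", "PRODUCER", "CONSUMER", "INTERNAL"
    attributes: dict
    parent: Optional["Span"] = None
    span_id: int = field(default_factory=lambda: next(_ids))


def search_documents() -> list:
    """Simulate one logical DB query that needs an auth call and one retry."""
    # Logical layer: points at the cluster/domain name the app configured.
    logical = Span("search", "CLIENT", {
        "server.address": "search.example.com",  # hypothetical cluster name
        "server.port": 443,
    })
    spans = [logical]
    # Network layer: one child span per physical call; each may target a
    # different host (token endpoint, first node, node used for the retry).
    for host in ("login.example.com", "node-1.example.com", "node-2.example.com"):
        spans.append(Span("POST", "CLIENT", {
            "server.address": host,
            "server.port": 443,
        }, parent=logical))
    return spans
```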

Problems it creates:

  • semconv don't usually clarify how to model such relationships and which information goes where:
    • should logical operations have network.* attributes?
    • should both be recorded by default or which one is more important?
    • should both have client kind?
  • it's hard to efficiently visualize a flat service map; it becomes noisy and confusing:

[image: flat service map example]

Related:
#652
open-telemetry/oteps#172

@pyohannes
Contributor

> When working with db, messaging, or other high-level client libraries, applications create (at least) two distinct layers of spans:

I wouldn't say that applications create at least two distinct layers of spans. While the problem is real, many instrumentations today only instrument the logical layer, with protocols built on top of HTTP being the most notable exceptions.

> should logical operations have network.* attributes?

I'd say yes, given that in many cases one likely doesn't reliably know whether lower-level instrumentation exists.

> should both be recorded by default or which one is more important?

This depends on the use case. While logical operations might be more important in most cases, information about transport operations can be crucial, especially for troubleshooting errors and performance issues.

> should both have client kind?

For messaging we mostly solved this question, as we'll have producer/consumer kinds for the logical layer and client/server kinds for the transport layer. Producers and consumers on the logical layer will be connected via links, while clients and servers on the transport layer will be connected via parent/child relationships. However, this model will not work well for databases, where both layers will have client/server relationships.
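A minimal sketch of the two relationship types for messaging, assuming a hypothetical `Span` dataclass (not the OpenTelemetry API) and made-up span names: transport spans chain through parent/child, while the logical consumer span refers back to the producer span through a link.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical span model for illustration; NOT the OpenTelemetry API.
@dataclass
class Span:
    name: str
    kind: str
    parent: Optional["Span"] = None
    links: List["Span"] = field(default_factory=list)


def publish_and_process():
    # Logical layer, producer side.
    publish = Span("publish orders", "PRODUCER")
    # Transport layer: the call that sends the message to the broker is a
    # child of the logical publish span.
    send = Span("send", "CLIENT", parent=publish)
    # Transport layer on the broker side: parented to the client call.
    accept = Span("accept", "SERVER", parent=send)
    # Logical layer, consumer side: connected to the producer via a link,
    # not via a parent/child relationship.
    process = Span("process orders", "CONSUMER", links=[publish])
    return publish, send, accept, process
```

For databases there is no PRODUCER/CONSUMER pair to fall back on, so both the logical and the transport spans in such a sketch would be CLIENT spans chained by parent/child.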

@pyohannes
Contributor

> it's hard to efficiently visualize a flat service map; it becomes noisy and confusing:

A service map should either focus on logical operations or otherwise support nesting. Dreaming up a solution, this is how I'd sketch it:

[image: sketch of a service map with nested client spans]
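The two service-map options above can be sketched as follows, assuming a hypothetical exported-span record (`SpanRecord`; its fields are not any real backend's schema). `flat_edges` turns every CLIENT span into its own edge, which produces the noisy flat map; `nested_edges` keeps only the outermost CLIENT span of each chain as a map edge and folds the nested transport calls under it.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class SpanRecord(NamedTuple):
    """Hypothetical exported-span record; the field names are assumptions."""
    span_id: int
    parent_id: Optional[int]
    kind: str
    service: str         # the instrumented application
    server_address: str  # the endpoint a CLIENT span calls ("" otherwise)


Edge = Tuple[str, str]


def flat_edges(spans) -> Set[Edge]:
    """Every CLIENT span becomes its own edge: the noisy, flat map."""
    return {(s.service, s.server_address) for s in spans if s.kind == "CLIENT"}


def logical_root(span, by_id):
    """Walk up through CLIENT parents to the outermost CLIENT span."""
    while True:
        parent = by_id.get(span.parent_id)
        if parent is None or parent.kind != "CLIENT":
            return span
        span = parent


def nested_edges(spans) -> Dict[Edge, Set[str]]:
    """Top-level edges come from logical CLIENT spans; nested calls hang below."""
    by_id = {s.span_id: s for s in spans}
    tree: Dict[Edge, Set[str]] = {}
    for s in spans:
        if s.kind != "CLIENT":
            continue
        root = logical_root(s, by_id)
        nested = tree.setdefault((root.service, root.server_address), set())
        if root is not s:
            nested.add(s.server_address)
    return tree
```

With a logical search span (server.address search.example.com) that has two node-level HTTP children, `flat_edges` returns three edges from the same service, while `nested_edges` returns a single edge with both nodes nested under it.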

@lmolkova
Contributor Author

lmolkova commented Jan 31, 2024

I love this visualization @pyohannes!

> I'd say yes, given that in many cases one likely doesn't reliably know whether lower-level instrumentation exists.

What if I know? E.g., most Azure SDKs work on top of HTTP, and HTTP is instrumented. Or Cosmos DB knows whether the underlying transport protocol calls are instrumented (by the same SDK). The same would happen with messaging and AMQP the moment AMQP gets instrumented.

> For messaging we mostly solved this question as we'll have producer/consumer kinds for the logical layer and client/server kinds for the transport layer.

Not quite: we still have publish/receive spans, for which we nicely avoided documenting the kind; they are logical.
Assuming the transport level (e.g. AMQP) is instrumented, messaging would be in the same boat as DB (and gRPC).

Producer/consumer spans would connect application nodes, but client spans would still be needed to show the broker as a node.
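A small sketch of that point, assuming a hypothetical, simplified span record in which CONSUMER spans carry the service name of the PRODUCER span they link to and CLIENT spans carry server.address. Producer/consumer links alone yield only application nodes; the broker appears only once transport-level client spans are present.

```python
from typing import NamedTuple, Optional


class SpanRecord(NamedTuple):
    """Hypothetical, simplified span record; the fields are assumptions."""
    service: str
    kind: str
    server_address: Optional[str] = None  # set on CLIENT spans
    linked_service: Optional[str] = None  # on CONSUMER spans: service of the linked PRODUCER


def service_map(spans):
    nodes, edges = set(), set()
    for s in spans:
        nodes.add(s.service)
        if s.kind == "CONSUMER" and s.linked_service:
            # A producer/consumer link connects two application nodes.
            edges.add((s.linked_service, s.service))
        elif s.kind == "CLIENT" and s.server_address:
            # A client span is what makes the broker appear as its own node.
            nodes.add(s.server_address)
            edges.add((s.service, s.server_address))
    return nodes, edges
```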

@lmolkova lmolkova changed the title DB, messaging: clarify nested client spans DB, messaging, gRPC (overHTTP): clarify nested client spans Jan 31, 2024
@lmolkova lmolkova changed the title DB, messaging, gRPC (overHTTP): clarify nested client spans DB, messaging, gRPC (over HTTP): clarify nested client spans Jan 31, 2024
@jcocchi
Contributor

jcocchi commented Jan 31, 2024

> What if I know? E.g., most Azure SDKs work on top of HTTP, and HTTP is instrumented. Or Cosmos DB knows whether the underlying transport protocol calls are instrumented (by the same SDK). The same would happen with messaging and AMQP the moment AMQP gets instrumented.

For Cosmos DB, we know that instrumentation exists in the SDK for both logical and network calls, but we don't necessarily know whether customers have subscribed to both in their application. If we could be certain which sources customers are listening to, it might be possible to change the span kind dynamically based on that, but changing the span kind depending on the listener may end up creating more confusion.

@pyohannes
Contributor

> Not quite: we still have publish/receive spans, for which we nicely avoided documenting the kind; they are logical.
> Assuming the transport level (e.g. AMQP) is instrumented, messaging would be in the same boat as DB (and gRPC).

The "Publish" span should be of kind PRODUCER if it's used as the creation context. "Receive" spans should always be of kind CONSUMER. It's important that we always try to link from PRODUCER to CONSUMER spans on the logical level, as this allows us to keep the relationships on the two levels separate.
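The rule in this comment can be expressed as a small kind-selection helper. This is a sketch under stated assumptions: the `SpanKind` enum, the operation strings, and `messaging_span_kind` are hypothetical, and the comment does not say which kind a publish span should get when it is not used as the creation context, so the CLIENT fallback below is an assumption.

```python
from enum import Enum


class SpanKind(Enum):
    # Hypothetical enum mirroring the span kinds discussed in this thread.
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"
    CLIENT = "CLIENT"


def messaging_span_kind(operation: str, is_creation_context: bool = False) -> SpanKind:
    """Pick a kind for a logical messaging span, following the rule above."""
    if operation == "receive":
        # "Receive" spans are always CONSUMER.
        return SpanKind.CONSUMER
    if operation == "publish":
        if is_creation_context:
            # Consumers link back to this span's context: PRODUCER.
            return SpanKind.PRODUCER
        # ASSUMPTION: not specified in the comment; CLIENT is a placeholder.
        return SpanKind.CLIENT
    raise ValueError(f"unknown operation: {operation!r}")
```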

> Producer/consumer spans would connect application nodes, but client spans would still be needed to show the broker as a node.

This is true. I don't know of a service map that would nicely support that.

@lmolkova
Contributor Author

Messaging and database conventions were updated to reflect their logical nature, so this issue is now limited to RPC.

lmolkova added a commit to open-telemetry/opentelemetry-specification that referenced this issue Sep 3, 2024
Fixes #3172

(Built on top of #4088)

## Changes

- Explains kinds without assuming presence of parent/children 
- Adds links as another correlation mechanism
- Normalizes nested client spans even further - database, messaging,
RPC, and LLM semantic conventions require CLIENT kind for logical client
operation.
- Does not touch INTERNAL kind yet -
#4179

* [x] Related issues #3172,
open-telemetry/semantic-conventions#674,
open-telemetry/oteps#172,
open-telemetry/semantic-conventions#1315
* ~~[ ] Related [OTEP(s)](/~https://github.com/open-telemetry/oteps) #~~
* ~~[ ] Links to the prototypes (when adding or changing features)~~
* [x]
[`CHANGELOG.md`](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md)
file updated for non-trivial changes
* ~~[ ]
[`spec-compliance-matrix.md`](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md)
updated if necessary~~

---------

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
Projects
Status: Post-stability

4 participants