DB, messaging, gRPC (over HTTP): clarify nested client spans #674

lmolkova · 2024-01-30T02:45:13Z

When working with db, messaging, or other high-level client libraries, applications could create (at least) two distinct layers of spans (when both layers are instrumented):

- logical operation (such as performing a DB query or publishing an event)
- network calls that perform the operation (obtain auth tokens when necessary, request or send information, retry, etc.)

Some common examples:

- Elasticsearch, which works over HTTP
- AWS SQS/SNS, which work over HTTP
- Azure Cosmos DB, which can work over HTTP or a TCP-based protocol (which is being instrumented)
- gRPC, which works on top of HTTP and (at least in .NET) could result in gRPC client spans over HTTP client spans

As a result, logical-level spans describe the corresponding domain (DB, messaging, etc.) and network-level spans describe individual RPC calls.

These two layers contain different information. Even server.address:server.port could be different: the logical operation points to a cluster/domain name, while network spans may point to individual nodes in the cluster, regional endpoints, etc.
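To make the two layers concrete, here is a minimal sketch using a toy `Span` dataclass rather than the OpenTelemetry API. The span names, host names, and the `search_with_nested_spans` helper are hypothetical; only the attribute keys (`db.system`, `server.address`, `server.port`, `http.request.method`) are taken from the semantic conventions.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Span:
    """Toy stand-in for a span; this is not the OpenTelemetry API."""
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)
    parent: Optional["Span"] = None


def search_with_nested_spans() -> Tuple[Span, Span]:
    # Logical layer: describes the DB operation and the cluster endpoint.
    logical = Span(
        name="search orders",
        kind="CLIENT",
        attributes={
            "db.system": "elasticsearch",
            "server.address": "search.example.com",  # hypothetical cluster name
            "server.port": 9200,
        },
    )
    # Network layer: one HTTP call made while performing the operation.
    network = Span(
        name="POST",
        kind="CLIENT",
        attributes={
            "http.request.method": "POST",
            "server.address": "node-3.search.example.com",  # hypothetical node
            "server.port": 9200,
        },
        parent=logical,
    )
    return logical, network


logical, network = search_with_nested_spans()
print(logical.attributes["server.address"], "->", network.attributes["server.address"])
```

Both spans are of kind CLIENT and the network span is a child of the logical one, yet they report different `server.address` values for the same operation.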

Problems it creates:

- semconv don't usually clarify how to model such relationships and which information goes where:
  - should logical operations have network.* attributes?
  - should both be recorded by default or which one is more important?
  - should both have client kind?
- it's hard to efficiently visualize a flat service map - it becomes noisy and confusing.
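The service map noise can be illustrated with a small sketch. It assumes a toy `Span` type and a hypothetical rule that every CLIENT span contributes an edge from its service to its `server.address`. The `logical_edges` filter, which keeps only the outermost CLIENT span, is one possible way to reduce the noise and is not something the conventions prescribe.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass
class Span:
    """Toy stand-in for a span; this is not the OpenTelemetry API."""
    name: str
    kind: str
    service: str  # service that emitted the span
    peer: str     # value of server.address
    parent: Optional["Span"] = None


def flat_edges(spans: Iterable[Span]) -> Set[Tuple[str, str]]:
    # Flat map: every CLIENT span becomes an edge, regardless of nesting.
    return {(s.service, s.peer) for s in spans if s.kind == "CLIENT"}


def logical_edges(spans: Iterable[Span]) -> Set[Tuple[str, str]]:
    # Skip CLIENT spans nested directly under another CLIENT span.
    return {
        (s.service, s.peer)
        for s in spans
        if s.kind == "CLIENT"
        and not (s.parent is not None and s.parent.kind == "CLIENT")
    }


# Hypothetical trace: one DB operation performed through one HTTP call.
db = Span("search orders", "CLIENT", "checkout", "search.example.com")
http = Span("POST", "CLIENT", "checkout", "node-3.search.example.com", parent=db)
spans = [db, http]

print(len(flat_edges(spans)), len(logical_edges(spans)))  # prints "2 1"
```

In the flat view a single operation produces two edges out of the `checkout` service, one to the cluster and one to an individual node.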

Related:
#652
open-telemetry/oteps#172


pyohannes · 2024-01-31T10:50:16Z

> When working with db, messaging, or other high-level client libraries, applications create (at least) two distinct layers of spans:

I wouldn't say that applications create at least two distinct layers of spans. While the problem is real, many instrumentations today only instrument the logical layer, with protocols built on top of HTTP being the most notable exceptions.

> should logical operations have network.* attributes?

I'd say yes, given that in many cases one likely doesn't reliably know whether lower-level instrumentation exists.

> should both be recorded by default or which one is more important?

This depends on the use case. For most cases logical operations might be more important, but information about transport operations can be crucial, especially for troubleshooting errors and performance issues.

> should both have client kind?

For messaging we mostly solved this question, as we'll have producer/consumer kinds for the logical layer and client/server kinds for the transport layer. Producers and consumers on the logical layer will be connected via links, while clients and servers on the transport layer will be connected with parent/child relationships. However, this model will not work well for databases, where both layers will have client/server relationships.
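The model described here can be sketched with a toy `Span` type (not the OpenTelemetry API). All span names are hypothetical, and the link is modeled as the consumer span holding a reference to the producer span. The helper `layers_distinguishable_by_kind` only exists to show why the same trick does not carry over to databases.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """Toy stand-in for a span; this is not the OpenTelemetry API."""
    name: str
    kind: str
    parent: Optional["Span"] = None
    links: List["Span"] = field(default_factory=list)


# Messaging, logical layer: PRODUCER and CONSUMER correlated by a link.
publish = Span("publish orders", "PRODUCER")
process = Span("process orders", "CONSUMER", links=[publish])

# Messaging, transport layer: CLIENT and SERVER correlated by parent/child.
send_client = Span("POST /topics/orders", "CLIENT", parent=publish)
send_server = Span("POST /topics/orders", "SERVER", parent=send_client)

# Database: the logical and the transport span are both CLIENT.
query = Span("SELECT orders", "CLIENT")
query_http = Span("POST", "CLIENT", parent=query)


def layers_distinguishable_by_kind(logical: Span, transport: Span) -> bool:
    # True when span kind alone tells the logical layer from the transport layer.
    return logical.kind != transport.kind


print(layers_distinguishable_by_kind(publish, send_client))  # prints "True"
print(layers_distinguishable_by_kind(query, query_http))     # prints "False"
```

For messaging the two layers use disjoint kinds and disjoint correlation mechanisms, while for databases the result is nested CLIENT spans.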

pyohannes · 2024-01-31T10:57:35Z

> it's hard to efficiently visualize flat service map - it becomes noisy and confusing:

A service map should either focus on logical operations, or otherwise support nesting. Dreaming up a solution, that's how I'd sketch it:

lmolkova · 2024-01-31T18:04:34Z

I love this visualization @pyohannes!

> I'd say yes, given that in many cases one likely doesn't reliably know whether lower-level instrumentation exists?

What if I know? E.g. most Azure SDKs work on top of HTTP, and HTTP is instrumented. Or Cosmos DB knows whether the underlying transport protocol calls are instrumented (by the same SDK). The same would happen with messaging and AMQP the moment AMQP gets instrumented.

> For messaging we mostly solved this question as we'll have producer/consumer kinds for the logical layer and client/server kinds for the transport layer.

Not quite - we still have publish/~~receive~~ spans for which we nicely avoided documenting the kind - they are logical.
Assuming the transport level (e.g., AMQP) is instrumented, messaging would be in the same boat as DB (and gRPC).

Producer/consumer spans would connect application nodes, but client spans would still be needed to show the broker as a node.

jcocchi · 2024-01-31T23:02:45Z

> What if I know? E.g. most Azure SDKs work on top of HTTP and HTTP is instrumented. Or Cosmos DB knows if underlying transport protocol calls are instrumented (by the same SDK). Same case would happen with messaging and AMQP the moment AMQP gets instrumented.

For Cosmos DB, we know that instrumentation exists in the SDK for both logical and network calls, but we don't necessarily know if customers have subscribed to both in their application. If we could be certain which sources customers are listening to, it might be possible to dynamically change the span kind based on that, but changing the span kind depending on the listener may end up creating more confusion.

pyohannes · 2024-02-01T10:40:58Z

> not quite - we still have publish/receive spans for which we nicely avoided documenting the kind - they are logical.
> Assuming the transport-level (e.g., AMQP) is instrumented, messaging would be in the same boat as DB (and gRPC).

The "Publish" span should be of kind PRODUCER if it's used as creation context. "Receive" spans should always be of kind CONSUMER. It's important that we always try to link from PRODUCER to CONSUMER spans on the logical level, as this allows us to keep the relationships on the two levels separate.
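The kind rule stated above can be written down as a small decision function. This is a sketch, not the wording of the conventions: `messaging_span_kind` and its parameters are hypothetical names, and the CLIENT fallback for a "Publish" span that is not used as creation context is an assumption, since the comment does not say which kind applies in that case.

```python
def messaging_span_kind(operation: str, is_creation_context: bool = False) -> str:
    """Pick a span kind for a logical messaging span (illustrative only)."""
    if operation == "receive":
        # "Receive" spans are always CONSUMER.
        return "CONSUMER"
    if operation == "publish":
        # PRODUCER when the span's context is used as the creation context.
        # ASSUMPTION: fall back to CLIENT otherwise; the thread leaves this open.
        return "PRODUCER" if is_creation_context else "CLIENT"
    raise ValueError(f"unsupported messaging operation: {operation!r}")


print(messaging_span_kind("publish", is_creation_context=True))  # prints "PRODUCER"
print(messaging_span_kind("receive"))                            # prints "CONSUMER"
```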

> producer/consumer spans would connect application nodes, but client spans would still be needed to show the broker as a node.

This is true. I don't know of a service map that would nicely support that.

lmolkova · 2024-04-26T15:40:02Z

Messaging and database conventions were updated to reflect their logical nature, so this issue is now limited to RPC.

Fixes #3172 (Built on top of #4088)

## Changes

- Explains kinds without assuming presence of parent/children
- Adds links as another correlation mechanism
- Normalizes nested client spans even further - database, messaging, RPC, and LLM semantic conventions require CLIENT kind for logical client operation.
- Does not touch INTERNAL kind yet - #4179

* [x] Related issues #3172, open-telemetry/semantic-conventions#674, open-telemetry/oteps#172, open-telemetry/semantic-conventions#1315
* ~~[ ] Related [OTEP(s)](https://github.com/open-telemetry/oteps) #~~
* ~~[ ] Links to the prototypes (when adding or changing features)~~
* [x] [`CHANGELOG.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/CHANGELOG.md) file updated for non-trivial changes
* ~~[ ] [`spec-compliance-matrix.md`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md) updated if necessary~~

---------

Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
Co-authored-by: Yuri Shkuro <yurishkuro@users.noreply.github.com>
Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>


github-actions bot assigned jsuereth Jan 30, 2024

lmolkova changed the title ~~DB, messaging: clarify nested client spans~~ DB, messaging, gRPC (overHTTP): clarify nested client spans Jan 31, 2024

lmolkova changed the title ~~DB, messaging, gRPC (overHTTP): clarify nested client spans~~ DB, messaging, gRPC (over HTTP): clarify nested client spans Jan 31, 2024

pyohannes added this to Spec: Messaging Semantics Feb 1, 2024

github-project-automation bot moved this to V1 - Stable Semantics in Spec: Messaging Semantics Feb 1, 2024

pyohannes moved this from V1 - Stable Semantics to In Triage in Spec: Messaging Semantics Feb 1, 2024

pyohannes moved this from In Triage to Post-stability in Spec: Messaging Semantics Feb 1, 2024

lmolkova mentioned this issue Feb 5, 2024

DB/messaging/rpc(?): should logical operations include network.* attributes #690

Closed

trask added this to Database Client Semantic Conventions Feb 7, 2024

pyohannes mentioned this issue Feb 8, 2024

Turn off spans from a scope while retaining downstream spans and without breaking traces open-telemetry/opentelemetry-specification#3867

Closed

jcocchi mentioned this issue Feb 9, 2024

REQUEST: New membership for @jcocchi open-telemetry/community#1935

Closed


carlosalberto mentioned this issue Feb 15, 2024

Clarify relationship between messaging, faas, and RPC #652

Closed

lmolkova moved this to Done in Database Client Semantic Conventions Apr 26, 2024

This was referenced Aug 6, 2024

Refactor description of span kind open-telemetry/opentelemetry-specification#4178

Merged

Allow INTERNAL GenAI/db spans instead of requiring the kind to be CLIENT #1315

Closed
