Skip to content
This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Stabilize messaging semantic conventions for tracing #192

Closed
wants to merge 39 commits into from

Conversation

pyohannes
Copy link
Contributor

This OTEP aims to describe the necessary changes for bringing the existing semantic conventions for messaging from the current experimental to a stable state.

It is based on OTEP 0173, which defines basic terms and describes messaging scenarios that should be supported by the semantic conventions.

NOTE: This is an early draft document. It captures results of discussions currently going on in the messaging workgroup.

Copy link
Member

@joaopgrassi joaopgrassi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this up! I left a few things.

text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
text/trace/0192-messaging-semantic-conventions-spec.md Outdated Show resolved Hide resolved
Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing just how much (valuable!) discussion this OTEP generates, I would strongly suggest to change the goal here:

Do not aim to overhaul & stabilize the messaging semantic conventions in one go. Overhaul it first, and make stabilizing (which is currently the OTEP title) a separate, next goal that comes after.

@Oberon00 Oberon00 self-requested a review July 27, 2022 16:21
Comment on lines +3 to +6
This document aims to describe the necessary changes for bringing the [existing semantic conventions for messaging](/~https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md)
from the current [experimental](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
to a [stable](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
state.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This document aims to describe the necessary changes for bringing the [existing semantic conventions for messaging](/~https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md)
from the current [experimental](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#experimental)
to a [stable](/~https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/versioning-and-stability.md#stable)
state.
This document aims to overhaul the [existing semantic conventions for messaging](/~https://github.com/open-telemetry/opentelemetry-specification/blob/a1a8676a43dce6a4e447f65518aef8e98784306c/specification/trace/semantic_conventions/messaging.md).

@@ -0,0 +1,659 @@
# Stabilizing messaging semantic conventions for tracing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Stabilizing messaging semantic conventions for tracing
# Overhauling messaging semantic conventions for tracing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stabilizing has a more concrete meaning in the context of the spec. Overhauling is just a change to something else that is not stable and can change again.
If the purpose is getting this done for a fist version, I would be in favor of stabilizing.

[`messaging.destination.kind`](#messagingdestinationkind) | string | No
[`messaging.destination.temporary`](#messagingdestinationtemporary) | string | No
[`messaging.destination.anonymous`](#messagingdestinationanonymous) | string | No
[`messaging.source.name`](#messagingsourcename) | string | For consumer spans
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the OTEP text should document why the decision was made to split the "destination" attributes were split into "destination" and "source" (and maybe reconsider).

[`net.app.protocol.name`](#netappprotocolname) | string | No
[`net.app.protocol.version`](#netappprotocolversion) | string | No
[`net.peer.ip`](#netpeerip) | string | No
[`net.peer.name`](#netpeername) | string | No
Copy link
Member

@Oberon00 Oberon00 Jul 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to suggest a new "concept" here, which is that of the "receipt handle". In AWS SQS, when you call the ReceiveMesssage API, you get, along with the message ID (which is also missing as an attribute here), a unique "Receipt handle" for each message. The receipt handle is what you need to provide to DeleteMessage or DeleteMessageBatch to settle the message (instead of, e.g., the message ID).

The receipt handle is a rather but not terribly long base64 string. I wonder if...

  • ...Other messaging systems have a similar mechanism (if not, all the other questions are irrelevant at this point)
  • ...This concept should be described in the "individual message settlement" section
  • ...It would make sense defining an attribute for it, e.g. to correlate with logs. It is possible that this only makes sense in a future step together with intermediary instrumentation. But it could be interesting to track whether a message is re-processed without being re-delivered/received. Maybe too much of an edge case 🙂

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I have never used AWS SQS and don't know any other messaging system having this concept, can you mention another one please?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know one, that's why I'm asking.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read that wrong ;-) Now it let me think that something like that should be in the Service Bus as well but in general in AMQP 1.0.
Let's take a look at the REST interface for Service Bus via HTTP. Other than a receive and delete message, there is a two-steps way: peek lock and delete. During the Peek Lock the consumer gets the message and a sequencenumber with it. This will be used in the next Delete Message together with the message id to delete the message.
But even from an AMQP 1.0 point of view it makes sense because looking at the specification, the disposition frame (which is the ack from the consumer) defines a delivery-id. This maps with the Azure Service Bus AMQP 1.0 implementation here where you can search for the sequence-number again for deleting a message. I guess that the sequence number maps to the delivery-id at AMQP 1.0 level.
Said that, I would agree that maybe a concept like a receipt handle, sequence number or delivery id could make sense to add for the "settlement" by the consumer.

Comment on lines +413 to +414
particular set of producers and consumer. Often such destinations are unnamed
or have an auto-generated name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "often"? Isn't that the definition of it being anonymous?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree but in the end it's never unnamed but always auto-generated (it's needed to be addressed). This is the reason why, for example, AMQP 1.0 protocol define it as "dynamic" not "anonymous".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It already says "unnamed or have an auto-generated name". So why is that only "often"? If it is named and the name is not auto-generated, can it still be anonymous?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tbh I don't know what it's referring to as "unnamed" destination. Right now "anonymous" sounds to be wrong, unless someone else can raise an example of why it's right.

Comment on lines +146 to +150
For each producer scenario, a "Publish" span needs to be created. This span
measures the duration of the call or operation that provides messages for
sending or publishing to an intermediary. This call or operation (and the
related "Publish" span) can either refer to a single message or a batch of
multiple messages.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For each producer scenario, a "Publish" span needs to be created. This span
measures the duration of the call or operation that provides messages for
sending or publishing to an intermediary. This call or operation (and the
related "Publish" span) can either refer to a single message or a batch of
multiple messages.
For each producer scenario, a "Publish" span needs to be created. This span measures the time for publishing the message which, depending on the messaging system and the producer API, could include:
* just storing the message in an internal producer buffer, and asynchronously sent to the intermediary
* actual synchronous sending to the intermediary
* one of the above but also getting an acknowledgement/settlement back from the intermediary
This call or operation (and the related "Publish" span) can either refer to a single message or a batch of multiple messages.


For each producer scenario, a "Publish" span needs to be created. This span
measures the duration of the call or operation that provides messages for
sending or publishing to an intermediary. This call or operation (and the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a suggestion to list the scenarios.

messages in order to fulfill the [requirements for context propagation](#context-propagation).
While preserving the freedom for instrumentor to choose how to propagate
context, in the future these conventions should list recommended ways of how to
propagate context using popular messaging protocols.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, just leaving a comment as I came across an issue that can be resolved if some sort of standards are in place.

If you refer to the issue above, it is apparent that there is already a divergence of approaches to propagate context with different instrumentors in different languages. I wonder if more attention should be given to establishing a convention before everyone starts doing things differently and becoming out of control.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @melvinkcx
A while ago this OTEP /~https://github.com/open-telemetry/oteps/blob/main/text/trace/0205-messaging-semantic-conventions-context-propagation.md was merged to make the context propagation a bit more explicit.

It doesn't go in exact details on how this should be done (where in the message to put the context, under which "name" etc) because there's no finalized standards for it. The OTEP only makes it clear that the context should be transported together with the message and should be immutable. Since each protocol/messaging system does something different, it's hard to come up with a specification that works for everyone. Our intentions for now is to define the minimum on how propagation will work, and leave to instrumentation to implement it using their appropriate idioms. For ex, this draft document for AMQP tells to add the context in application-properties.

Of course, once stable specifications are available, we will update the recommendations in the OTel conventions, like the AMQP one.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the insights!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since each protocol/messaging system does something different, it's hard to come up with a specification that works for everyone

How about specifying just a guideline / best-practice like: "Call the propagator on the message headers or equivalent"

Copy link
Member

@joaopgrassi joaopgrassi Aug 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a PR to bring the contents of the OTP into the spec. Maybe that already gives the direction we need, without having to be precise on where to add. open-telemetry/opentelemetry-specification#2750. It also mentions when more standards are available, we will be updating the guidelines.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One interesting line on the W3C document for AMQP linked above:
"...AMQP message section "application-properties" is immutable collection of properties"
They recommend to use application-properties because is immutable data that brokers cannot change. This aligns with the immutable Creation Context from above.
The MQTT doc doesn't mention immutability but defines user-properties and the method to use

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brunobat and that's for MQTT v5 you are referring to I guess. Thinking at MQTT 3.1.1 there is no notion of application/user properties, everything is in your payload.

Copy link

@brunobat brunobat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check #207 because the propagation will soon might have:

  • Baggage. For app data
  • Context-scoped attributes. It is proposed that Context-scoped attributes MUST not be propagated.
  • Instrumentation scope attributes. Not sure.

Seems that we need to clarify what tracing data will be transported in the message and how.

@@ -0,0 +1,659 @@
# Stabilizing messaging semantic conventions for tracing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stabilizing has a more concrete meaning in the context of the spec. Overhauling is just a change to something else that is not stable and can change again.
If the purpose is getting this done for a fist version, I would be in favor of stabilizing.

identifiable.

In the strict sense, a _message_ is a payload that is sent to a specific
destination, whereas an _event_ is a signal emitted by a component upon

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the case for a topic, unless the destination is defined as a broker.
Wouldn't it be better to say something like:
_message_ is a payload sent by a Producer to one or more Consumers, either directly or by using Intermediaries
I think it aligns better with what's written bellow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with a topic, the destination is the topic. That's how the current ("old") messaging semantic conventions have it defined: The destination of a message is a queue or a topic.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me @Oberon00

A _creation context_ allows correlating the producer with the consumers of a
message, regardless of intermediary instrumentation. The creation context is
created by the producer and must be propagated to the consumers. It must not be
altered by intermediaries. This context helps to model dependencies between

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It must not be altered by intermediaries.
The only way to ensure this is to sign the propagated attributes, like what's done with a Json Web Token (JTW). Not saying we need to sign stuff now, but in the future we might need, to prevent tampering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this part sorted out in the other PR open-telemetry/opentelemetry-specification#2750 (comment)

Comment on lines +129 to +130
> A producer SHOULD attach a creation context to each message. The creation context
> SHOULD be attached in a way so that it is not possible to be changed by intermediaries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, we are going to push for a message span that goes from producer send to consumer receive. Right?

| Create m2 | . . . . . . . .
+-----------+---------+ . +------------+
| Publish | . . | Receive m2 |
+---------+ +------------+
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to be consistent within pull-based consumer scenarios in what the parent context should be for a receive span. I believe the intent is that the parent should be the application's ambient span?

Although intermediary instrumentation is out of scope, it's something I'm thinking about and there is a decision to make as to how the transport context of the message fits into these traces. If you care about the path the message takes, it's nice to have the receive span as a child of the message's context. But since it is the application's "pull" that caused the span, it's probably more correct for the receive span to be a child of the application's ambient span.

If we agree this is the right approach, it probably makes the most sense for the receive span to link to the message's transport context (although no need to mention here as that would be out of scope). If there is no ambient context, it seems as though it would be nicer to be a child rather than create a brand new trace with no parent at all. On the one hand, this means we have inconsistent structure. On the other hand, does splitting the traces up help with anything?

1. The _creation context layer_ allows correlating the producer with the
consumers of a message, regardless of intermediary instrumentation. The
creation context is created by the producer and must be propagated to the
consumers. It must not be altered by intermediaries.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"It Should not be altered by intermediaries."
Because we are going to discuss intermediaries later.

Comment on lines +616 to +618
One possibility to seamlessly integrate producer/consumer and intermediary
instrumentation in a flexible and extensible way would be the introduction of a
second transport context layer in addition to the creation context layer.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of an "immutable" creation context for the message because it seems simpler and less ambiguous to understand, define and implement.
Transport context seem too broad. It can be used for anything transport related and in each message, it might contain arbitrary pieces of data that were added or modified along the way. Super useful, but makes every message a non obvious subject requiring analysis. Even graphic representation might be convoluted.

instrumentation.
2. An additional _transport context layer_ allows correlating the producer and
the consumer with an intermediary. It also allows to correlate multiple
intermediaries among each other. The transport context can be changed by

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The correlation should be message based. "Correlating intermediaries" sounds like broker framework related work. Do we want to provide tools to make that easier? Which ones?

messages in order to fulfill the [requirements for context propagation](#context-propagation).
While preserving the freedom for instrumentor to choose how to propagate
context, in the future these conventions should list recommended ways of how to
propagate context using popular messaging protocols.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One interesting line on the W3C document for AMQP linked above:
"...AMQP message section "application-properties" is immutable collection of properties"
They recommend to use application-properties because is immutable data that brokers cannot change. This aligns with the immutable Creation Context from above.
The MQTT doc doesn't mention immutability but defines user-properties and the method to use

@Oberon00
Copy link
Member

@brunobat
#192 (review)

Please check #207 because the propagation will soon might have

#207 should not be relevant to this PR at all. It does not change anything regarding propagation.

@pyohannes
Copy link
Contributor Author

The messaging workgroup was capturing findings and results of discussions in this draft PR. As this draft PR got very big and it got hard to keep track of discussions going on in comments, the workgroup decided to close this PR and work on different artifacts:

@pyohannes pyohannes closed this Oct 19, 2022
carlosalberto pushed a commit that referenced this pull request Jun 29, 2023
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP #192, which became too comprehensive.
carlosalberto pushed a commit to carlosalberto/oteps that referenced this pull request Oct 23, 2024
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry#192, which became too comprehensive.
carlosalberto pushed a commit to carlosalberto/oteps that referenced this pull request Oct 23, 2024
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry#192, which became too comprehensive.
carlosalberto pushed a commit to carlosalberto/oteps that referenced this pull request Oct 30, 2024
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry#192, which became too comprehensive.
carlosalberto pushed a commit to carlosalberto/oteps that referenced this pull request Oct 31, 2024
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry#192, which became too comprehensive.
carlosalberto pushed a commit to carlosalberto/oteps that referenced this pull request Nov 1, 2024
This OTEP aims at defining consistent conventions about what spans to create for messaging scenarios, and at defining how those spans relate to each other. Instrumentors should be able to rely on a consistent set of conventions, as opposed to deducing conventions from a set of examples.

This was split from OTEP open-telemetry#192, which became too comprehensive.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.