-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amqp core metrics: step 1 #30583
Amqp core metrics: step 1 #30583
Conversation
Note, ServiceBus |
API change check APIView has identified API level changes in this PR and created following API reviews. |
Good work! A while back I recall you mentioning there was a performance impact to having metrics in the codebase. Are you able to quantify this impact when metrics are enabled and also when they are disabled? Thanks! |
@JonathanGiles, sorry I overlooked your question. Micro-benchmarks show that when metrics are disabled (and any code that collects e.g. duration is guarded), each measurement takes less than a nanosecond ( Reporting a histogram with dynamic (cached in a map) attributes (`basicHistogramWithDynamicAttributes ) is ~60 ns and still allocation free after attributes are created.
I've tried to measure it with EventHubs performance tests, and the problem is that they are quite unstable. They aren't CPU or memory bound. Consequent baseline measurements can differ drastically (10-20% easily). With a lot of long tests and aggregations I get something like this:
I also profiled perf tests with metrics enabled and during profiling metrics stacks took less than 1% of CPU time, and were even smaller with regard to allocations. I did some investigation with metric collection in general with storage tests and it seems metric collection impact is smaller than error interval. |
4660ec5
to
ed45b5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - a few pieces of feedback for your consideration
...re/azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/AmqpMetricsProvider.java
Show resolved
Hide resolved
...azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/ReactorHandlerProvider.java
Show resolved
Hide resolved
...azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/ReactorHandlerProvider.java
Outdated
Show resolved
Hide resolved
...azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/ReactorHandlerProvider.java
Show resolved
Hide resolved
...azure-core-amqp/src/main/java/com/azure/core/amqp/implementation/handler/SessionHandler.java
Show resolved
Hide resolved
// TODO: by GA we need to figure out default naming (when no mapping is defined) | ||
// and follow otel attributes conventions if we can or make sure mapping is defined | ||
// for all attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably tackle this ~now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious what's the urgency?
this mapping exists in beta metrics package with no requirement to GA in the nearest future.
However, we have an ask to GA tracing plugin in Zn, where we'll do exactly the same mapping and we'll figure it out there first. Assuming we'll resource the ask, it should happen in the next few months.
And I'd really love to take it now, but we're working with OTel community on changing these exact attributes, so anything I could write might change very soon.
...ics-opentelemetry/src/main/java/com/azure/core/metrics/opentelemetry/OpenTelemetryUtils.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! THanks for the documentation and links in your description
} | ||
|
||
private TelemetryAttributes getResponseCodeAttributes(AmqpResponseCode code, String operation) { | ||
int ind = code == null ? RESPONSE_CODES_COUNT - 1 : code.ordinal(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why we have the - 1
here and then + 1
when defining RESPONSE_CODES_COUNT. A comment where RESPONSE_CODES_COUNT could help in the future if someone is checking this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the feedback, I added comments in the code!
public static final String STATUS_CODE_KEY = "amqpStatusCode"; | ||
public static final String MANAGEMENT_OPERATION_KEY = "amqpOperation"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these constants have to be public or can they be private
?
// if there was no response, state is null and indicates a network (probably) error. | ||
// we don't have an enum for network issues and metric attributes cannot have arbitrary | ||
// high-cardinality data, so we'll just use vague "error" for it. | ||
int ind = state == null ? DELIVERY_STATES_COUNT - 1 : state.ordinal(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious how this will version when we add more values to DeliveryState.DeliveryStateType
. Will we then need to correlate the library version to the ordinal? DELIVERY_STATES_COUNT - 1
could be the vague "error" state in v1, but when we add a new enum value, that would be a valid enum in v2, v1's values may look like valid response states.
* Amqp core metrics
* Amqp core metrics
Spec: https://gist.github.com/lmolkova/489a2b280b8fa68e4c3780c2afaa3b39
Adds following metrics (as step 1):
OpenTelemetry Metrics Semantic Conventions for Azure Messaging Libraries
Context: https://gist.github.com/lmolkova/b9004307a09be788af04f05ebe22ad3c
Follows general OTel Metrics conventions and depends on OTel Messaging conventions changes
AMQP-level metric instruments
ms
net.peer.name
messaging.destination
(will change)amqp.delivery_state
ms
net.peer.name
messaging.destination
(will change)amqp.status_code
amqp.operation
{connections}
net.peer.name
amqp.error_condition
{errors}
net.peer.name
messaging.destination
amqp.error_condition
{errors}
net.peer.name
messaging.destination
messaging.az.entity_path
(when available)amqp.error_condition
{errors}
net.peer.name
messaging.destination
messaging.az.entity_path
(when available)amqp.error_condition
seconds
net.peer.name
messaging.destination
messaging.az.entity_path
{credits}
net.peer.name
messaging.destination
messaging.az.entity_path
Scenarios
messaging.az.amqp.*.duration
messaging.az.amqp.client.connections.closed
messaging.az.amqp.client.*.errors
messaging.az.amqp.consumer.lag
messaging.az.amqp.consumer.credits.requested
all metrics: