From 1997dd1011b1b02de4f9b59f761e0c69bddbf2e2 Mon Sep 17 00:00:00 2001
From: Joshua MacDonald
Date: Mon, 8 May 2023 10:10:59 -0700
Subject: [PATCH] Specify MeterProvider configurable cardinality limits (#2960)

Fixes #1891.

**EDIT: Updated to specify cardinality limits at the View/Instrument level with a Reader-level default. Updated to use a hard limit**

## Changes

Adds optional support for a maximum cardinality limit. The recommended default is 2000, based on this comment by @jack-berg: https://github.com/open-telemetry/opentelemetry-specification/issues/1891#issuecomment-967672167

~The Prometheus-WG SIG discussed this on Nov 9, 2022 and reached this recommended solution to the problem outlined in #1891. The consequence of exceeding these limits is in line with the current Prometheus server behavior, which drops targets that misbehave. The discussion was summarized here: https://github.com/open-telemetry/opentelemetry-specification/issues/1891#issuecomment-1309089908~
---
 CHANGELOG.md                 |  3 ++
 specification/metrics/sdk.md | 66 ++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1b5f1f7b9d5..ac26e0fac4f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,9 @@ release.
 - Add experimental histogram advice API.
   ([#3216](https://github.com/open-telemetry/opentelemetry-specification/pull/3216))
+- Recommended cardinality limits to protect metrics pipelines against
+  excessive data production from a single instrument.
+  ([#2960](https://github.com/open-telemetry/opentelemetry-specification/pull/2960))
 - Specify second unit (`s`) and advice bucket boundaries of `[]` for
   `process.runtime.jvm.gc.duration`.
   ([#3458](https://github.com/open-telemetry/opentelemetry-specification/pull/3458))
diff --git a/specification/metrics/sdk.md b/specification/metrics/sdk.md
index 3600e1afada..f77c64db612 100644
--- a/specification/metrics/sdk.md
+++ b/specification/metrics/sdk.md
@@ -30,6 +30,9 @@ linkTitle: SDK
   * [Use the maximum scale for single measurements](#use-the-maximum-scale-for-single-measurements)
   * [Maintain the ideal scale](#maintain-the-ideal-scale)
   * [Observations inside asynchronous callbacks](#observations-inside-asynchronous-callbacks)
+  * [Cardinality limits](#cardinality-limits)
+    + [Synchronous instrument cardinality limits](#synchronous-instrument-cardinality-limits)
+    + [Asynchronous instrument cardinality limits](#asynchronous-instrument-cardinality-limits)
 - [Meter](#meter)
   * [Duplicate instrument registration](#duplicate-instrument-registration)
   * [Instrument name](#instrument-name)
@@ -235,6 +238,12 @@ are the inputs:
     `exemplar_reservoir` (optional) to use for storing exemplars. This should be
     a factory or callback similar to aggregation which allows different
     reservoirs to be chosen by the aggregation.
+  * **Status**: [Experimental](../document-status.md) - the
+    `aggregation_cardinality_limit` (optional) associated with the view. This
+    should be a positive integer to be taken as a hard limit on the
+    number of data points that will be emitted during a single
+    collection by a single instrument. See [cardinality limits](#cardinality-limits),
+    below.
 
 In order to avoid conflicts, views which specify a name SHOULD have an
 instrument selector that selects at most one instrument. For the registration
@@ -582,6 +591,62 @@ execution.
 The implementation MUST complete the execution of all callbacks for a
 given instrument before starting a subsequent round of collection.
 
+### Cardinality limits
+
+**Status**: [Experimental](../document-status.md)
+
+Views SHOULD support being configured with a cardinality limit to be
+applied to all aggregators not configured by a specific view, specified
+via `MetricReader` configuration.
+
+View configuration SHOULD support applying per-aggregation cardinality limits.
+
+The cardinality limit is taken as an exact, hard limit on the number
+of data points that can be written per collection, per aggregation.
+Each aggregation configured view MUST NOT output more than the
+configured `aggregation_cardinality_limit` number of data points per
+period.
+
+The RECOMMENDED default aggregation cardinality limit is 2000.
+
+An overflow attribute set is defined, containing a single attribute
+`otel.metric.overflow` having (boolean) value `true`, which is used to
+report a synthetic aggregation of the metric events that could not be
+independently aggregated because of the limit.
+
+The SDK MUST create an Aggregator with the overflow attribute set
+prior to reaching the cardinality limit and use it to aggregate events
+for which the correct Aggregator could not be created. The maximum
+number of distinct, non-overflow attributes is one less than the
+limit, as a result.
+
+#### Synchronous instrument cardinality limits
+
+Views of synchronous instruments with cumulative aggregation
+temporality MUST continue to export all attribute sets that were
+observed prior to the beginning of overflow. Metric events
+corresponding with attribute sets that were not observed prior to the
+overflow will be reflected in a single data point described by (only)
+the overflow attribute.
+
+Views of synchronous instruments with delta aggregation temporality
+MAY choose an arbitrary subset of attribute sets to output to maintain
+the stated cardinality limit.
+
+Regardless of aggregation temporality, the SDK MUST ensure that every
+metric event is reflected in exactly one Aggregator, which is either
+an Aggregator associated with the correct attribute set or an
+aggregator associated with the overflow attribute set.
+
+Events MUST NOT be double-counted or dropped during an
+overflow.
+
+#### Asynchronous instrument cardinality limits
+
+Views of asynchronous instruments SHOULD prefer the first-observed
+attributes in the callback when limiting cardinality, regardless of
+aggregation temporality.
+
 ## Meter
 
 Distinct meters MUST be treated as separate namespaces for the purposes of detecting
@@ -862,6 +927,7 @@ SHOULD provide at least the following:
 * The `exporter` to use, which is a `MetricExporter` instance.
 * The default output `aggregation` (optional), a function of instrument kind. If not configured, the [default aggregation](#default-aggregation) SHOULD be used.
 * The default output `temporality` (optional), a function of instrument kind. If not configured, the Cumulative temporality SHOULD be used.
+* The default aggregation cardinality limit to use, a function of instrument kind. If not configured, a default value of 2000 SHOULD be used.
 
 The [MetricReader.Collect](#collect) method allows general-purpose
 `MetricExporter` instances to explicitly initiate collection, commonly
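
As a reading aid for the overflow rule this patch adds for synchronous instruments with cumulative temporality, the following is a minimal Python sketch. It is not taken from any OpenTelemetry SDK: `LimitedSum`, `record`, `collect`, and the `OVERFLOW` constant are hypothetical names, the aggregation is assumed to be a simple sum, and attribute values are assumed to be hashable. Only the attribute name `otel.metric.overflow`, the default of 2000, and the "one less than the limit" rule come from the patch text.

```python
from typing import Dict, FrozenSet, Tuple

AttributeSet = FrozenSet[Tuple[str, object]]

# Overflow attribute set as defined in the patch text.
OVERFLOW: AttributeSet = frozenset({("otel.metric.overflow", True)})


class LimitedSum:
    """Hypothetical cumulative sum with a hard cardinality limit."""

    def __init__(self, cardinality_limit: int = 2000) -> None:
        if cardinality_limit < 1:
            raise ValueError("cardinality limit must be a positive integer")
        self._limit = cardinality_limit
        self._points: Dict[AttributeSet, float] = {}

    def record(self, value: float, attributes: Dict[str, object]) -> None:
        key = frozenset(attributes.items())
        # One slot is reserved for the overflow point, so at most
        # (limit - 1) distinct non-overflow attribute sets are tracked.
        if key not in self._points and len(self._points) >= self._limit - 1:
            key = OVERFLOW
        # Every event lands in exactly one point: nothing is dropped
        # and nothing is counted twice.
        self._points[key] = self._points.get(key, 0.0) + value

    def collect(self) -> Dict[AttributeSet, float]:
        # Cumulative temporality: previously seen attribute sets keep
        # being exported on every collection.
        return dict(self._points)


agg = LimitedSum(cardinality_limit=3)
for name in ["a", "b", "c", "d", "a"]:
    agg.record(1.0, {"k": name})
points = agg.collect()
```

With a limit of 3, the attribute sets for `a` and `b` keep their own data points, while the events for `c` and `d` are folded into the single overflow point, so the total across all points still equals the number of recorded events.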