diff --git a/CHANGELOG.md b/CHANGELOG.md
index 1f8230c7184..f7a683483cf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -27,6 +27,9 @@ release.
([#2154](/~https://github.com/open-telemetry/opentelemetry-specification/pull/2154))
- Mark In-memory, OTLP and Stdout exporter specs as Stable.
([#2175](/~https://github.com/open-telemetry/opentelemetry-specification/pull/2175))
+- Add text about implementing attribute-removal Views for asynchronous
+ instruments to the supplemental guidelines for metric SDK authors.
+ ([#2208](/~https://github.com/open-telemetry/opentelemetry-specification/pull/2208))
- Clarify integer count instrument units.
([#2210](/~https://github.com/open-telemetry/opentelemetry-specification/pull/2210))
- Use UCUM units in Metrics Semantic Conventions.
diff --git a/specification/metrics/supplementary-guidelines.md b/specification/metrics/supplementary-guidelines.md
index 3f6475a0e81..c41178846cc 100644
--- a/specification/metrics/supplementary-guidelines.md
+++ b/specification/metrics/supplementary-guidelines.md
@@ -16,6 +16,13 @@ requirements to the existing specifications.
* [Semantic convention](#semantic-convention)
- [Guidelines for SDK authors](#guidelines-for-sdk-authors)
* [Aggregation temporality](#aggregation-temporality)
+ + [Synchronous example](#synchronous-example)
+ - [Synchronous example: Delta aggregation temporality](#synchronous-example-delta-aggregation-temporality)
+ - [Synchronous example: Cumulative aggregation temporality](#synchronous-example-cumulative-aggregation-temporality)
+ + [Asynchronous example](#asynchronous-example)
+ - [Asynchronous example: Cumulative temporality](#asynchronous-example-cumulative-temporality)
+ - [Asynchronous example: Delta temporality](#asynchronous-example-delta-temporality)
+ - [Asynchronous example: attribute removal in a view](#asynchronous-example-attribute-removal-in-a-view)
* [Memory management](#memory-management)
@@ -155,6 +162,8 @@ Conventions`, rather than inventing your own semantics.
### Aggregation temporality
+#### Synchronous example
+
The OpenTelemetry Metrics [Data Model](./datamodel.md) and [SDK](./sdk.md) are
designed to support both Cumulative and Delta
[Temporality](./datamodel.md#temporality). It is important to understand that
@@ -177,6 +186,13 @@ following HTTP requests example:
* verb = `GET`, status = `200`, duration = `30 (ms)`
* verb = `GET`, status = `200`, duration = `50 (ms)`
+Note that in the following examples, Delta aggregation temporality is
+discussed before Cumulative aggregation temporality because
+synchronous Counter and UpDownCounter measurements are input to the
+API with specified Delta aggregation temporality.
+
+##### Synchronous example: Delta aggregation temporality
+
Let's imagine we export the metrics as [Histogram](./datamodel.md#histogram),
and to simplify the story we will only have one histogram bucket `(-Inf, +Inf)`:
@@ -204,6 +220,8 @@ latest collection/export cycle**. For example, when the SDK started to process
measurements in (T1, T2], it can completely forget about
what has happened during (T0, T1].
+##### Synchronous example: Cumulative aggregation temporality
+
If we export the metrics using **Cumulative Temporality**:
* (T0, T1]
@@ -259,6 +277,8 @@ So here are some suggestions that we encourage SDK implementers to consider:
stream hasn't received any updates for a long period of time, would it be okay
to reset the start time?
+#### Asynchronous example
+
In the above case, we have Measurements reported by a [Histogram
Instrument](./api.md#histogram). What if we collect measurements from an
[Asynchronous Counter](./api.md#asynchronous-counter)?
@@ -283,6 +303,13 @@ thread ever started:
* thread 1 died, thread 3 started
* pid = `1001`, tid = `2`, #PF = `53`
* pid = `1001`, tid = `3`, #PF = `5`
+
+Note that in the following examples, Cumulative aggregation
+temporality is discussed before Delta aggregation temporality because
+asynchronous Counter and UpDownCounter measurements are input to the
+API with specified Cumulative aggregation temporality.
+
+##### Asynchronous example: Cumulative temporality
If we export the metrics using **Cumulative Temporality**:
@@ -302,11 +329,45 @@ If we export the metrics using **Cumulative Temporality**:
* attributes: {pid = `1001`, tid = `2`}, sum: `53`
* attributes: {pid = `1001`, tid = `3`}, sum: `5`
-It is quite straightforward - we just take the data being reported from the
-asynchronous instruments and send them. We might want to consider if [Resets and
-Gaps](./datamodel.md#resets-and-gaps) should be used to denote the end of a
-metric stream - e.g. thread 1 died, the thread ID might be reused by the
-operating system, and we probably don't want to confuse the metrics backend.
+The behavior in the first four periods is quite straightforward - we
+just take the data being reported from the asynchronous instruments
+and send them.
+
+The data model prescribes several valid behaviors at T5 in
+this case, where one stream dies and another starts. The [Resets and
+Gaps](./datamodel.md#resets-and-gaps) section describes how start
+timestamps and staleness markers can be used to increase the
+receiver's understanding of these events.
+
+Consider whether the SDK maintains a start timestamp for each
+individual stream, or just one per process. In this example, where one
+thread can die and another can start counting page faults from zero,
+the valid behaviors at T5 are:
+
+1. If all streams in the process share a start time, and the SDK is
+ not required to remember all past streams: the thread restarts with
+ zero sum. Receivers with reset detection are able to calculate a
+ correct rate (except for frequent restarts relative to the
+ collection interval); however, the precise time of a reset will be
+ unknown.
+2. If the SDK maintains per-stream start times, it signals to the
+ receiver precisely when a stream started, making the first
+ observation in a stream more useful for diagnostics. Receivers can
+ perform overlap detection or duplicate suppression and do not
+ require reset detection in this case.
+3. Independent of the above treatments, the SDK can add a staleness marker
+ to indicate the start of a gap in the stream when one thread dies
+ by remembering which streams have previously reported but are not
+ currently reporting. If per-stream start timestamps are used,
+ staleness markers can be issued to precisely start a gap in the
+ stream and permit forgetting streams that have stopped reporting.
+
+It's OK to ignore the options to use per-stream start timestamps and
+staleness markers. The first course of action above requires no
+additional memory or code to achieve and is correct in terms of the
+data model.
+
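For SDKs that do opt into staleness markers, the bookkeeping described in the third option can be sketched as a set difference between two collection cycles. This is a minimal illustration, not SDK code: the function name `find_stale_streams` is hypothetical, streams are keyed by `(pid, tid)`, and the T4 values are assumed for illustration (only the T5 values appear in the listing above).

```python
def find_stale_streams(previous, current):
    """Return stream keys that reported in the previous cycle but not in this one."""
    return set(previous) - set(current)


# Streams are keyed by (pid, tid); values are cumulative page-fault counts.
at_t4 = {(1001, 1): 60, (1001, 2): 47}  # assumed T4 values, for illustration only
at_t5 = {(1001, 2): 53, (1001, 3): 5}   # T5 values from the example above

# Thread 1 stopped reporting, so its stream is a candidate for a staleness marker.
stale = find_stale_streams(at_t4, at_t5)

# Thread 3 is new, so an SDK with per-stream start times would stamp it at T5.
started = set(at_t5) - set(at_t4)
```

Once a staleness marker has been issued for a stream, the SDK no longer needs to keep that key in `previous`, which is what permits forgetting streams that have stopped reporting.
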
+##### Asynchronous example: Delta temporality
If we export the metrics using **Delta Temporality**:
@@ -351,6 +412,71 @@ So here are some suggestions that we encourage SDK implementers to consider:
rather than drop the data on the floor, you might want to convert them to
something useful - e.g. [Gauge](./datamodel.md#gauge).
+##### Asynchronous example: attribute removal in a view
+
+Suppose the metrics in the asynchronous example above are exported
+through a view configured to remove the `tid` attribute, leaving a
+single-dimensional count of page faults by `pid`. For each metric
+stream, two measurements are produced covering the same interval of
+time, which the SDK is expected to aggregate before producing the
+output.
+
+The data model specifies using the "natural merge" function, which in
+this case means adding the current point values together because they
+are `Sum` data points. The expected output, still in **Cumulative
+Temporality**, is:
+
+* (T0, T1]
+ * attributes: {pid = `1001`}, sum: `80`
+* (T0, T2]
+ * attributes: {pid = `1001`}, sum: `91`
+* (T0, T3]
+ * attributes: {pid = `1001`}, sum: `98`
+* (T0, T4]
+ * attributes: {pid = `1001`}, sum: `107`
+* (T0, T5]
+ * attributes: {pid = `1001`}, sum: `58`
+
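The merge step behind the T5 value can be sketched as follows. This is an illustrative sketch only: `apply_attribute_removal` is a hypothetical helper, not an SDK API, and the point representation (a dict of attributes plus an integer sum) is an assumption made for brevity.

```python
from collections import defaultdict


def apply_attribute_removal(points, keep):
    """Merge Sum points whose attributes become identical after filtering.

    points: iterable of (attributes_dict, value) pairs from one collection.
    keep:   set of attribute keys the view retains.
    """
    merged = defaultdict(int)
    for attributes, value in points:
        # Build a hashable key from the retained attributes only.
        key = tuple(sorted((k, v) for k, v in attributes.items() if k in keep))
        # "Natural merge" for Sum points: add the values together.
        merged[key] += value
    return dict(merged)


# The two points reported at T5 in the asynchronous example.
t5_points = [
    ({"pid": 1001, "tid": 2}, 53),
    ({"pid": 1001, "tid": 3}, 5),
]
t5_view = apply_attribute_removal(t5_points, keep={"pid"})
```

With `tid` removed, both points collapse onto the single key for `pid = 1001` and their values add up to 58, matching the last entry in the list above.
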
+As discussed in the asynchronous cumulative temporality example above,
+there are various treatments available for detecting resets. Even if
+the first course is taken, which means doing nothing, a receiver that
+follows the data model's rules for [unknown start
+time](datamodel.md#cumulative-streams-handling-unknown-start-time) and
+[inserting true start
+times](datamodel.md#cumulative-streams-inserting-true-reset-points)
+will calculate a correct rate in this case. The "58" received at
+T5 resets the stream - the change from "107" to "58" will
+register as a gap and rate calculations will resume correctly at
+T6. The rules for reset handling are provided so that the
+unknown portion of "58" that was already reflected in the "107" at
+T4 is not double-counted at T5 in the reset.
+
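A receiver-side view of this can be sketched as below. The sketch assumes a simplified receiver that treats any decrease in a cumulative sum as a reset and records a gap for that interval; the function name is hypothetical, and real receivers following the data model also take start timestamps into account.

```python
def deltas_with_reset_detection(cumulative_sums):
    """Convert successive cumulative sums into per-interval deltas.

    A decrease is treated as a reset: that interval is reported as a gap
    (None) instead of a negative or double-counted delta.
    """
    deltas = []
    for previous, current in zip(cumulative_sums, cumulative_sums[1:]):
        if current < previous:
            deltas.append(None)  # reset detected, no reliable delta
        else:
            deltas.append(current - previous)
    return deltas


# Sums exported by the view at T1 through T5.
view_sums = [80, 91, 98, 107, 58]
deltas = deltas_with_reset_detection(view_sums)  # [11, 7, 9, None]
```

The `None` at the end corresponds to the gap registered at T5. The next observation would be compared against 58, so rate calculation resumes from there.
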
+If the option to use per-stream start timestamps is taken above, it
+lightens the duties of the receiver, making it possible to monitor
+gaps precisely and detect overlapping streams. When per-stream state
+is available, the SDK has several approaches available for calculating
+Views in the presence of attributes that stop reporting and then
+reset some time later:
+
+1. By remembering the cumulative value for all streams across the
+ lifetime of the process, the cumulative sum will be correct despite
+ `attributes` that come and go. The SDK has to detect per-stream resets
+ itself in this case, otherwise the View will be calculated incorrectly.
+2. When the cost of remembering all streams' `attributes` becomes too
+ high, reset the View and all its state, give it a new start
+ timestamp, and let the caller see a gap in the stream.
+
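The first approach can be sketched roughly as follows. This is one possible reading, not a prescribed design: the class `CumulativeViewState` and its methods are hypothetical, streams are keyed by `tid` within a single `pid`, the T4 values are assumed for illustration, and a per-stream reset is inferred from a decreasing value.

```python
class CumulativeViewState:
    """Remembers every stream so the view's sum survives streams that come and go."""

    def __init__(self):
        self._last = {}    # last observed cumulative value per stream
        self._offset = {}  # value accumulated before each stream's resets

    def observe(self, observations):
        """Record one collection cycle: a dict of stream key -> cumulative value."""
        for key, value in observations.items():
            last = self._last.get(key)
            if last is not None and value < last:
                # Per-stream reset: carry the old total forward as an offset.
                self._offset[key] = self._offset.get(key, 0) + last
            self._last[key] = value

    def total(self):
        """Cumulative sum across all streams ever seen, including silent ones."""
        return sum(self._last.values()) + sum(self._offset.values())


state = CumulativeViewState()
state.observe({1: 60, 2: 47})  # assumed T4 values, for illustration only
total_t4 = state.total()       # 107
state.observe({2: 53, 3: 5})   # T5: thread 1 is gone, thread 3 is new
total_t5 = state.total()       # 60 + 53 + 5 = 118
```

Under this approach the exported sum never decreases, at the cost of memory that grows with the number of distinct streams. That cost is what motivates the second approach.
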
+When considering this matter, note also that the metrics API has a
+recommendation for each asynchronous instrument: [User code is
+recommended not to provide more than one `Measurement` with the same
+`attributes` in a single callback.](api.md#instrument). Consider
+whether user error in this regard will affect the
+correctness of the view. When maintaining per-stream state for the
+purpose of View correctness, SDK authors may want to consider
+detecting when the user makes duplicate measurements. Without
+checking for duplicate measurements, Views may be calculated
+incorrectly.
+
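Duplicate detection of this kind can be sketched as a single pass over the measurements returned by one callback. The helper name `find_duplicate_attributes` and the `(attributes, value)` representation are assumptions for illustration, and the third measurement below is a hypothetical user error.

```python
def find_duplicate_attributes(measurements):
    """Return the attribute sets that appear more than once in one callback."""
    seen = set()
    duplicates = set()
    for attributes, _value in measurements:
        key = frozenset(attributes.items())  # order-independent, hashable
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


callback_output = [
    ({"pid": 1001, "tid": 2}, 53),
    ({"pid": 1001, "tid": 3}, 5),
    ({"pid": 1001, "tid": 2}, 54),  # hypothetical duplicate for tid 2
]
duplicates = find_duplicate_attributes(callback_output)
```

What to do with a detected duplicate (drop it, keep the last value, or log a warning) is left open here. The point is only that the check is cheap relative to the per-stream state the SDK already keeps.
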
### Memory management
Memory management is a wide topic, here we will only cover some of the most