Status | |
---|---|
Stability | beta: metrics |
Distributions | core, contrib |
Issues | |
Code Owners | @Aneurysm9, @rapphil, @dashpole, @ArthurSens |
Prometheus Remote Write Exporter sends OpenTelemetry metrics to Prometheus remote write compatible backends such as Cortex, Mimir, and Thanos. By default, this exporter requires TLS and offers queued retry capabilities.
A design doc describes in detail how this exporter works.
The following settings are required:
- endpoint (no default): the remote write URL to which samples are sent.
By default, TLS is enabled and must be configured under tls:
- insecure (default = false): set to true to disable client transport security for the exporter's connection.
As a result, the following parameters are also required under tls:
- cert_file (no default): path to the TLS cert to use for TLS required connections. Should only be used if insecure is set to false.
- key_file (no default): path to the TLS key to use for TLS required connections. Should only be used if insecure is set to false.
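As a minimal sketch of the required settings, the configuration below sets endpoint and a client certificate pair under tls. The endpoint URL is reused from the examples in this document; the certificate paths are hypothetical placeholders and should be replaced with your own.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    tls:
      insecure: false # default; TLS stays enabled
      cert_file: /etc/otelcol/certs/client.crt # hypothetical path
      key_file: /etc/otelcol/certs/client.key # hypothetical path
```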
The following settings can be optionally configured:
- external_labels: map of label names and values to be attached to each metric data point.
- headers: additional headers attached to each HTTP request.
  - Note the following headers cannot be changed: Content-Encoding, Content-Type, X-Prometheus-Remote-Write-Version, and User-Agent.
- namespace: prefix attached to each exported metric name.
- add_metric_suffixes: if set to false, type and unit suffixes will not be added to metrics. Default: true.
- send_metadata: if set to true, Prometheus metadata will be generated and sent. Default: false.
- remote_write_queue: fine tuning for queueing and sending of the outgoing remote writes.
  - enabled: enable the sending queue (default: true)
  - queue_size: number of OTLP metrics that can be queued. Ignored if enabled is false (default: 10000)
  - num_consumers: minimum number of workers to use to fan out the outgoing requests (default: 5, or default: 1 if EnableMultipleWorkersFeatureGate is enabled).
- resource_to_telemetry_conversion
  - enabled (default = false): if enabled is true, all the resource attributes will be converted to metric labels by default.
- target_info: customize the target_info metric
  - enabled (default = true): if enabled is true, a target_info metric will be generated for each resource metric (see open-telemetry/opentelemetry-specification#2381).
- export_created_metric:
  - WARNING: Deprecated and planned for removal in v0.116.0. See related issue for more information.
  - enabled (default = false): if enabled is true, a _created metric is exported for Summary, Histogram, and Monotonic Sum metric points if StartTimeUnixNano is set.
- max_batch_size_bytes (default = 3000000 -> ~2.861 MiB): maximum size of a batch of samples to be sent to the remote write endpoint. If the batch size is larger than this value, it will be split into multiple batches.
- max_batch_request_parallelism (default = 5): maximum parallelism allowed for a single request bigger than max_batch_size_bytes.
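The optional settings above can be combined in one exporter block. The following is a sketch only: the values marked as default repeat the defaults listed above, while the namespace value and the custom header name and value are hypothetical and chosen purely for illustration.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    namespace: myapp # hypothetical prefix for exported metric names
    headers:
      X-Scope-OrgID: tenant-1 # hypothetical custom header and value
    add_metric_suffixes: true # default
    send_metadata: false # default
    remote_write_queue:
      enabled: true # default
      queue_size: 10000 # default
      num_consumers: 5 # default without the multiple-workers feature gate
    target_info:
      enabled: true # default
    max_batch_size_bytes: 3000000 # default
    max_batch_request_parallelism: 5 # default
```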
Example:
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    wal: # Enabling the Write-Ahead-Log for the exporter.
      directory: ./prom_rw # The directory to store the WAL in
      buffer_size: 100 # Optional count of elements to be read from the WAL before truncating; default of 300
      truncate_frequency: 45s # Optional frequency for how often the WAL should be truncated. It is a time.ParseDuration; default of 1m
    resource_to_telemetry_conversion:
      enabled: true # Convert resource attributes to metric labels
Example:
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    external_labels:
      label_name1: label_value1
      label_name2: label_value2
Several helper files are leveraged to provide additional capabilities automatically:
- HTTP settings
- TLS and mTLS settings
- Retry and timeout settings. Note that the exporter doesn't support sending_queue but provides remote_write_queue.
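As a rough sketch of how the shared helper settings fit together with this exporter, the block below assumes the standard collector keys timeout and retry_on_failure (with enabled, initial_interval, max_interval, and max_elapsed_time). All durations are illustrative values, not recommendations.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    timeout: 10s # HTTP client timeout; illustrative value
    retry_on_failure:
      enabled: true
      initial_interval: 5s # illustrative value
      max_interval: 30s # illustrative value
      max_elapsed_time: 5m # illustrative value
    remote_write_queue: # used in place of sending_queue
      enabled: true
```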
This exporter has feature gate: exporter.prometheusremotewritexporter.RetryOn429.
When this feature gate is enabled, the Prometheus remote write exporter will retry on the 429 HTTP status code using the provided retry configuration.
It currently doesn't respect the Retry-After HTTP header, if provided, because the retry library used doesn't support this feature.
To enable it, run the collector with one additional parameter: --feature-gates=exporter.prometheusremotewritexporter.RetryOn429.
This exporter has feature gate: exporter.prometheusremotewritexporter.EnableMultipleWorkers.
When this feature gate is enabled, num_consumers will be used as the worker count for handling batches from the queue, and max_batch_request_parallelism will be used for parallelism on a single batch bigger than max_batch_size_bytes.
Enabling this feature gate with num_consumers higher than 1 requires the target destination to support ingestion of OutOfOrder samples. See Multiple Consumers and OutOfOrder for more info.
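To illustrate how the two settings interact once the feature gate is enabled, here is a hedged sketch. The num_consumers value is illustrative, and the configuration assumes the remote write destination accepts out-of-order samples.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    remote_write_queue:
      num_consumers: 3 # worker count for batches from the queue; illustrative value
    max_batch_size_bytes: 3000000 # batches above this size are split
    max_batch_request_parallelism: 5 # parallelism for one oversized batch
```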
OpenTelemetry metric names and attributes are normalized to be compliant with Prometheus naming rules. Details on this normalization process are described in the Prometheus translator module.
By default, resource attributes are added to a special metric called target_info. To select or group metrics by resource attributes, you need to join on target_info. For example, to select metrics with the k8s_namespace_name attribute equal to my-namespace:
app_ads_ad_requests_total * on (job, instance) group_left target_info{k8s_namespace_name="my-namespace"}
Or to group by a particular attribute (e.g. k8s_namespace_name):
sum by (k8s_namespace_name) (app_ads_ad_requests_total * on (job, instance) group_left(k8s_namespace_name) target_info)
This is not a common pattern, and we recommend copying the most common resource attributes into metric labels. You can do this through the transform processor:
processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["namespace"], resource.attributes["k8s.namespace.name"])
          - set(attributes["container"], resource.attributes["k8s.container.name"])
          - set(attributes["pod"], resource.attributes["k8s.pod.name"])
After this, grouping or selecting becomes as simple as:
app_ads_ad_requests_total{namespace="my-namespace"}
sum by (namespace) (app_ads_ad_requests_total)
DISCLAIMER: This snippet applies only to Prometheus; other remote write destinations using the Prometheus protocol (e.g. Thanos, Grafana Mimir, VictoriaMetrics) may have different settings.
By default, Prometheus expects samples to be ingested sequentially, in temporal order.
When multiple consumers are enabled, the temporal ordering of the samples written to the target destination is not deterministic and can no longer be guaranteed. For example, one worker may push a sample for t+30s, and a second worker may then push an additional sample for t+15s.
Vanilla Prometheus configurations will reject these unordered samples and you'll receive "out of order" errors.
Out-of-order support in Prometheus must be enabled for multiple consumers.
This can be done by using the tsdb.out_of_order_time_window setting, for example with a value of 10m. Choose a time window large enough to cover the worst-case queue build-up on the sender side.
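As a sketch, and assuming the setting is placed under the storage section of the configuration file of the receiving Prometheus server (not in the collector configuration), enabling a 10 minute out-of-order window could look like this:

```yaml
# prometheus.yml on the receiving Prometheus server
storage:
  tsdb:
    out_of_order_time_window: 10m # accept samples up to 10 minutes out of order
```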