Prometheus Remote Write Exporter

Status
Stability: beta (metrics)
Distributions: core, contrib
Code Owners: @Aneurysm9, @rapphil, @dashpole, @ArthurSens

Prometheus Remote Write Exporter sends OpenTelemetry metrics to Prometheus remote write compatible backends such as Cortex, Mimir, and Thanos. By default, this exporter requires TLS and offers queued retry capabilities.

⚠️ Non-cumulative monotonic, histogram, and summary OTLP metrics are dropped by this exporter.

A design doc is available that describes in detail how this exporter works.

Getting Started

The following settings are required:

  • endpoint (no default): the remote write URL to which samples are sent.

By default, TLS is enabled and must be configured under tls:

  • insecure (default = false): when set to true, disables client transport security for the exporter's connection.

As a result, the following parameters are also required under tls:

  • cert_file (no default): path to the TLS certificate to use for TLS-required connections. Should only be used if insecure is set to false.
  • key_file (no default): path to the TLS key to use for TLS-required connections. Should only be used if insecure is set to false.
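A minimal configuration covering the required settings above might look like the following sketch. The endpoint is reused from the examples in this document; the certificate and key paths are hypothetical placeholders, not defaults.

```yaml
exporters:
  prometheusremotewrite:
    # Required: the remote write URL (same endpoint as the examples in this README).
    endpoint: "https://my-cortex:7900/api/v1/push"
    tls:
      # Keep client transport security enabled (this is the default).
      insecure: false
      # Hypothetical paths; point these at your own certificate and key.
      cert_file: /etc/otelcol/certs/client.crt
      key_file: /etc/otelcol/certs/client.key
```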

The following settings can be optionally configured:

  • external_labels: map of label names and values to be attached to each metric data point.
  • headers: additional headers attached to each HTTP request.
    • Note: the following headers cannot be changed: Content-Encoding, Content-Type, X-Prometheus-Remote-Write-Version, and User-Agent.
  • namespace: prefix attached to each exported metric name.
  • add_metric_suffixes: If set to false, type and unit suffixes will not be added to metrics. Default: true.
  • send_metadata: If set to true, Prometheus metadata will be generated and sent. Default: false.
  • remote_write_queue: fine tuning for queueing and sending of the outgoing remote writes.
    • enabled: enable the sending queue (default: true)
    • queue_size: number of OTLP metrics that can be queued. Ignored if enabled is false (default: 10000)
    • num_consumers: minimum number of workers to use to fan out the outgoing requests (default: 5, or 1 if the EnableMultipleWorkers feature gate is enabled).
  • resource_to_telemetry_conversion
    • enabled (default = false): If enabled is true, all the resource attributes will be converted to metric labels by default.
  • target_info: customize the target_info metric.
  • export_created_metric: WARNING: Deprecated and planned for removal in v0.116.0. See the related issue for more information.
    • enabled (default = false): If enabled is true, a _created metric is exported for Summary, Histogram, and Monotonic Sum metric points if StartTimeUnixNano is set.
  • max_batch_size_bytes (default = 3000000, ~2.861 MiB): maximum size of a batch of samples sent to the remote write endpoint. A batch larger than this value is split into multiple batches.
  • max_batch_request_parallelism (default = 5): Maximum parallelism allowed for a single request bigger than max_batch_size_bytes.
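As a sketch of how several of the optional settings fit together, the fragment below sets a header, a metric name prefix, metadata sending, and the queue and batch limits. The header name X-Scope-OrgID (the tenant header used by Cortex and Mimir), its value, and the namespace value are illustrative assumptions; the remaining numbers are the defaults listed above.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    headers:
      X-Scope-OrgID: my-tenant        # illustrative tenant header and value
    namespace: myapp                  # illustrative prefix for every exported metric name
    add_metric_suffixes: true         # default: keep type and unit suffixes
    send_metadata: true               # default is false
    remote_write_queue:
      enabled: true                   # default
      queue_size: 10000               # default
      num_consumers: 5                # default without the EnableMultipleWorkers gate
    max_batch_size_bytes: 3000000     # default (~2.861 MiB)
    max_batch_request_parallelism: 5  # default
```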

Example:

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    wal: # Enabling the Write-Ahead-Log for the exporter.
      directory: ./prom_rw # The directory to store the WAL in
      buffer_size: 100 # Optional count of elements to be read from the WAL before truncating; default of 300
      truncate_frequency: 45s # Optional frequency for how often the WAL should be truncated, parsed with time.ParseDuration; default of 1m
    resource_to_telemetry_conversion:
      enabled: true # Convert resource attributes to metric labels

Example:

exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    external_labels:
      label_name1: label_value1
      label_name2: label_value2

Advanced Configuration

Several helper files are leveraged to provide additional capabilities automatically:

Feature gates

RetryOn429

This exporter has the feature gate exporter.prometheusremotewritexporter.RetryOn429. When this feature gate is enabled, the Prometheus remote write exporter retries on HTTP 429 status codes using the provided retry configuration. It does not currently respect the Retry-After HTTP header, because the underlying retry library does not support it.

To enable it, run the collector with the additional command-line argument --feature-gates=exporter.prometheusremotewritexporter.RetryOn429.

EnableMultipleWorkersFeatureGate

This exporter has the feature gate exporter.prometheusremotewritexporter.EnableMultipleWorkers, which can be enabled with --feature-gates=+exporter.prometheusremotewritexporter.EnableMultipleWorkers.

When this feature gate is enabled, num_consumers is used as the number of workers handling batches from the queue, and max_batch_request_parallelism sets the parallelism for a single batch larger than max_batch_size_bytes. Enabling this feature gate with num_consumers greater than 1 requires the target destination to support ingestion of out-of-order samples. See Multiple Consumers and OutOfOrder for more info.
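With the EnableMultipleWorkers gate enabled, num_consumers and max_batch_request_parallelism control concurrency. The sketch below is illustrative only: the value 4 for num_consumers is an arbitrary assumption, and because it is greater than 1 the destination must accept out-of-order samples.

```yaml
exporters:
  prometheusremotewrite:
    endpoint: "https://my-cortex:7900/api/v1/push"
    remote_write_queue:
      # With the gate enabled, this is the number of workers draining the queue.
      # Illustrative value; anything above 1 requires out-of-order ingestion.
      num_consumers: 4
    # A single batch larger than this many bytes is split into multiple requests...
    max_batch_size_bytes: 3000000
    # ...and up to this many of those requests are sent in parallel.
    max_batch_request_parallelism: 5
```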

Metric names and labels normalization

OpenTelemetry metric names and attributes are normalized to be compliant with Prometheus naming rules. Details on this normalization process are described in the Prometheus translator module.

Setting resource attributes as metric labels

By default, resource attributes are added to a special metric called target_info. To select or group metrics by resource attributes, you need to join on target_info. For example, to select metrics whose k8s_namespace_name attribute equals my-namespace:

app_ads_ad_requests_total * on (job, instance) group_left target_info{k8s_namespace_name="my-namespace"}

Or, to group by a particular attribute (for example, k8s_namespace_name):

sum by (k8s_namespace_name) (app_ads_ad_requests_total * on (job, instance) group_left(k8s_namespace_name) target_info)

This is not a common pattern, and we recommend copying the most common resource attributes into metric labels. You can do this through the transform processor:

processors:
  transform:
    metric_statements:
      - context: datapoint
        statements:
        - set(attributes["namespace"], resource.attributes["k8s.namespace.name"])
        - set(attributes["container"], resource.attributes["k8s.container.name"])
        - set(attributes["pod"], resource.attributes["k8s.pod.name"])
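The transform processor only takes effect once it is referenced in a metrics pipeline. A minimal sketch of that wiring follows; it assumes an otlp receiver is defined elsewhere in the same collector configuration.

```yaml
service:
  pipelines:
    metrics:
      receivers: [otlp]                   # assumed to be configured elsewhere
      processors: [transform]             # copies resource attributes onto data points
      exporters: [prometheusremotewrite]
```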

After this, grouping or selecting becomes as simple as:

app_ads_ad_requests_total{namespace="my-namespace"}

sum by (namespace) (app_ads_ad_requests_total)

Multiple Consumers and OutOfOrder

DISCLAIMER: This section applies only to Prometheus. Other remote write destinations that use the Prometheus protocol (for example Thanos, Grafana Mimir, or VictoriaMetrics) may have different settings.

By default, Prometheus expects samples to be ingested sequentially, in temporal order.

When multiple consumers are enabled, the order in which samples reach the target destination is no longer deterministic, so temporal ordering cannot be guaranteed. For example, one worker may push a sample for t+30s, and a second worker may then push a sample for t+15s.

Vanilla Prometheus configurations will reject these unordered samples and you'll receive "out of order" errors.

Out-of-order support in Prometheus must be enabled for multiple consumers. This can be done with the storage.tsdb.out_of_order_time_window setting (for example, out_of_order_time_window: 10m). Choose a time window large enough to cover the worst-case "queue" build-up on the sender side.
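In the Prometheus server's configuration file, the out-of-order window lives under the storage.tsdb block. The 10m value below is only an example, not a recommendation; size it for your own worst-case queue build-up.

```yaml
# prometheus.yml (Prometheus server configuration, not the collector's)
storage:
  tsdb:
    # Accept out-of-order samples up to 10 minutes older than the newest sample in the TSDB.
    out_of_order_time_window: 10m
```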

See for more info: