Prometheus monitoring

Why

When using the NPBackup client, you may want to expose metrics to Prometheus.
NPBackup has two ways of creating metrics:

metrics file: Used on servers with node_exporter installed
push gateway: Used on clients without node_exporter

Metrics file

In the configuration, add a file path to the destination field in the global_prometheus section. Example:

destination: /var/lib/node_exporter/textfile_collector/npbackup.prom

On every NPBackup run, the above file will be created with Prometheus metrics.
node_exporter picks up this file if its textfile collector is configured with the argument --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
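To check that a run produced a usable metrics file, the file can be read back and parsed. The following is a minimal sketch, not part of NPBackup: the function name parse_prom_text is hypothetical, and the parser only handles simple "name{labels} value" lines (no trailing timestamps).

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# One label pair inside the braces: key="value" (escaped characters allowed).
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prom_text(text):
    """Return a list of (name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


# Example content, shaped like the metrics described on this page.
example = 'npbackup_exec_state{action="backup",repo_name="default"} 0\n'
samples = parse_prom_text(example)
```

In practice, the text would come from reading the path set in the destination field, for example with pathlib.Path(...).read_text().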

Push gateway

In the configuration, add a URI to your Prometheus Push Gateway, in the following form:

https://push.mydomain.tld/metrics/job/${BACKUP_JOB}

The variable ${BACKUP_JOB} is populated from the prometheus section of a repo or group, and defaults to the ${MACHINE_ID} variable which comes from the identity section. Of course, you can override any of those variables with whatever you want.
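The fallback from ${BACKUP_JOB} to ${MACHINE_ID} can be illustrated with a short sketch. This is an assumption about the behavior described above, not NPBackup's actual implementation; the function name resolve_push_url and its parameters are hypothetical.

```python
from string import Template
from urllib.parse import quote


def resolve_push_url(destination, machine_id, backup_job=None):
    """Fill ${BACKUP_JOB} and ${MACHINE_ID} in a push gateway URI."""
    # backup_job falls back to machine_id when it is not set explicitly.
    job = backup_job if backup_job else machine_id
    return Template(destination).safe_substitute(
        # Percent-encode the values so they stay single URL path segments.
        BACKUP_JOB=quote(job, safe=""),
        MACHINE_ID=quote(machine_id, safe=""),
    )


default_url = resolve_push_url(
    "https://push.mydomain.tld/metrics/job/${BACKUP_JOB}", "somehost"
)
custom_url = resolve_push_url(
    "https://push.mydomain.tld/metrics/job/${BACKUP_JOB}", "somehost", "nightly"
)
```

With no explicit job name, default_url ends in /job/somehost; with backup_job set to "nightly", custom_url ends in /job/nightly.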

Note: Setting up HTTPS is outside the scope of this wiki. It is usually done with an HTTPS proxy such as HAProxy.

Produced metrics

NPBackup parses restic output to create the following metrics when using the backup function:

restic_files{instance="",backup_job="",state="",action="backup"}
- States: new, changed, unmodified, and total
restic_dirs{instance="",backup_job="",state="",action="backup"}
- States: new, changed, unmodified
restic_snapshot_size_bytes{instance="",backup_job="",action="backup",type="processed"}
restic_total_duration_seconds{instance="",backup_job="",action="backup"}

Additionally, NPBackup creates the following metrics itself for every run action:

npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="",backup_job="",action="",repo_name="",timestamp=""}
- Metric value is the execution state
  - 0: Ok
  - 1: Warnings
  - 2: Errors
  - 3: Critical error
npbackup_exec_time{action="",repo_name="",timestamp=""}
- Metric value is the execution time in seconds

Valid actions are init, backup, has_recent_snapshot, snapshots, stats, ls, find, restore, dump, check, recover, list, unlock, repair, forget, housekeeping, prune, raw
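When consuming these metrics in a script, the numeric state and the action label can be mapped back to the meanings listed above. A minimal sketch, with hypothetical helper names that are not part of NPBackup:

```python
# Execution states as documented for npbackup_exec_state.
EXEC_STATES = {0: "Ok", 1: "Warnings", 2: "Errors", 3: "Critical error"}

# Valid values of the action label, as listed above.
VALID_ACTIONS = {
    "init", "backup", "has_recent_snapshot", "snapshots", "stats", "ls",
    "find", "restore", "dump", "check", "recover", "list", "unlock",
    "repair", "forget", "housekeeping", "prune", "raw",
}


def describe_exec_state(value):
    """Translate a npbackup_exec_state sample value into its meaning."""
    return EXEC_STATES.get(int(value), "Unknown")


def needs_attention(value):
    """Treat anything above 'Warnings' as a failed run (a policy choice)."""
    return int(value) >= 2
```

The threshold in needs_attention is only one possible policy; some setups may also want to alert on warnings.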

Additional labels

The configuration allows adding arbitrary labels to Prometheus metrics.
The following example:

global_prometheus:
  metrics: true
  additional_labels:
    host_type: hypervisor
    backup_type: baremetal

Will lead to the creation of metrics that look like:

npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="somehost",host_type="hypervisor",backup_type="baremetal",action="upgrade",repo_name="default",timestamp="1736882285"} 0
npbackup_exec_time{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="somehost",host_type="hypervisor",backup_type="baremetal",action="snapshots",repo_name="default",timestamp="1736882285"} 0.0
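The way additional labels end up next to the built-in ones can be sketched as a plain dictionary merge followed by rendering in the Prometheus text format. This is an illustration only, not NPBackup's code; render_sample and escape_label_value are hypothetical names, and the npversion and timestamp labels are left out for brevity.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline in a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name, labels, value):
    """Render one sample line: name{key="value",...} value."""
    body = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in labels.items()
    )
    return f"{name}{{{body}}} {value}"


identity = {"instance": "somehost", "backup_job": "somehost"}
additional = {"host_type": "hypervisor", "backup_type": "baremetal"}
run = {"action": "snapshots", "repo_name": "default"}

# Later dictionaries win on key collisions; insertion order is preserved.
line = render_sample("npbackup_exec_time", {**identity, **additional, **run}, 0.0)
```

Here line contains the host_type and backup_type labels between the identity labels and the per-run labels, matching the shape of the output shown above.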

Grafana Dashboard

There is an example Grafana dashboard in the examples directory that has been tested with Grafana v10+.
