Prometheus monitoring
When using the NPBackup client, you may want to export metrics to Prometheus.
NPBackup has two ways of creating metrics:
- metrics file: Used on servers with node_exporter installed
- push gateway: Used on clients without node_exporter
For the metrics file method, add a file path to the destination field in the global_prometheus section of the configuration.
Example:
destination: /var/lib/node_exporter/textfile_collector/npbackup.prom
On every NPBackup run, the above file will be created with Prometheus metrics.
These files can be picked up by node_exporter if it has the textfile collector configured via the argument --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
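To check that the file is actually being written, its content can be inspected by hand. The following is a minimal sketch and not part of NPBackup: the function name `read_prom_samples` is hypothetical, and it only handles simple `name{labels} value` lines like the ones shown on this page, not the full Prometheus text format.

```python
from pathlib import Path


def read_prom_samples(text: str) -> list[tuple[str, float]]:
    """Return (series, value) pairs from Prometheus text-format content.

    Simplified on purpose: comments and blank lines are skipped, and each
    remaining line is split on its last space. Optional trailing
    timestamps are not supported.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        samples.append((series, float(value)))
    return samples


if __name__ == "__main__":
    # Path taken from the example above; adjust it to your own destination.
    prom_file = Path("/var/lib/node_exporter/textfile_collector/npbackup.prom")
    if prom_file.exists():
        for series, value in read_prom_samples(prom_file.read_text()):
            print(series, value)
```

If the file exists but node_exporter does not expose the metrics, the textfile directory argument is the first thing to check.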
For the push gateway method, add a URI to your Prometheus Push Gateway in the configuration, in the following form:
https://push.mydomain.tld/metrics/job/${BACKUP_JOB}
The variable ${BACKUP_JOB} is populated from the prometheus section of a repo or group, and defaults to the ${MACHINE_ID} variable, which comes from the identity section. Of course, you can override any of those variables with whatever you want.
Note: Setting up HTTPS is outside the scope of this wiki. Usually, this is done with an HTTPS proxy such as HAProxy.
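The defaulting rule for ${BACKUP_JOB} can be illustrated with a short sketch. This is not NPBackup's own substitution code: the function `build_push_url` and its parameters are hypothetical, and Python's `string.Template` is used only because it happens to share the `${NAME}` syntax.

```python
from string import Template
from typing import Optional


def build_push_url(template: str, machine_id: str,
                   backup_job: Optional[str] = None) -> str:
    """Fill ${BACKUP_JOB} and ${MACHINE_ID} in a push gateway URI.

    When no backup job is given, the machine id is used instead,
    mirroring the default described above.
    """
    job = backup_job or machine_id
    # safe_substitute leaves unknown ${...} placeholders untouched.
    return Template(template).safe_substitute(
        BACKUP_JOB=job,
        MACHINE_ID=machine_id,
    )


if __name__ == "__main__":
    uri = "https://push.mydomain.tld/metrics/job/${BACKUP_JOB}"
    print(build_push_url(uri, machine_id="somehost"))
    print(build_push_url(uri, machine_id="somehost", backup_job="nightly"))
```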
NPBackup parses restic output to create the following metrics when using the backup function:
- restic_files{instance="",backup_job="",state="",action="backup"}
  - States: new, changed, unmodified, and total
- restic_dirs{instance="",backup_job="",state="",action="backup"}
  - States: new, changed, unmodified
- restic_snasphot_size_bytes{instance="",backup_job="",action="backup",type="processed"}
- restic_total_duration_seconds{instance="",backup_job="",action="backup"}
Additionally, NPBackup creates the following metrics itself for every run action:
- npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="",backup_job="",action="",repo_name="",timestamp=""}
  - Metric value is the execution state:
    - 0: Ok
    - 1: Warnings
    - 2: Errors
    - 3: Critical error
- npbackup_exec_time{action="",repo_name="",timestamp=""}
  - Metric value is the execution time in seconds
Valid actions are init, backup, has_recent_snapshot, snapshots, stats, ls, find, restore, dump, check, recover, list, unlock, repair, forget, housekeeping, prune, and raw.
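When consuming npbackup_exec_state in a script, the numeric values listed above can be mapped to readable names. The following is a small sketch; the `ExecState` enum and the helper functions are hypothetical names and are not provided by NPBackup.

```python
from enum import IntEnum


class ExecState(IntEnum):
    """Execution states as documented for npbackup_exec_state."""
    OK = 0
    WARNINGS = 1
    ERRORS = 2
    CRITICAL = 3


def describe_exec_state(value: float) -> str:
    """Translate a metric value into the name of its execution state."""
    return ExecState(int(value)).name


def needs_attention(value: float) -> bool:
    """Treat anything other than OK as worth a look."""
    return ExecState(int(value)) is not ExecState.OK


if __name__ == "__main__":
    for raw_value in (0.0, 1.0, 2.0, 3.0):
        print(raw_value, describe_exec_state(raw_value), needs_attention(raw_value))
```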
The configuration allows adding custom labels to Prometheus metrics.
The following example:
global_prometheus:
  metrics: true
  additional_labels:
    host_type: hypervisor
    backup_type: baremetal
will produce metrics that look like:
npbackup_exec_state{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="somehost",host_type="hypervisor",backup_type="baremetal",action="upgrade",repo_name="default",timestamp="1736882285"} 0
npbackup_exec_time{npversion="npbackup3.0.0-rc13-pub",instance="somehost",backup_job="somehost",host_type="hypervisor",backup_type="baremetal",action="snapshots",repo_name="default",timestamp="1736882285"} 0.0
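How additional labels end up inside a sample line can be sketched as follows. This is an illustration of the Prometheus text format only, not NPBackup's implementation; `format_metric` is a hypothetical helper, and the label order simply follows the order in which the dictionaries are merged.

```python
def format_metric(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample line in the Prometheus text format."""

    def escape(label_value: str) -> str:
        # The text format requires escaping backslash, double quote and newline.
        return (
            label_value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    rendered = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} {value}"


if __name__ == "__main__":
    base_labels = {"instance": "somehost", "backup_job": "somehost"}
    # Corresponds to additional_labels in the configuration example above.
    additional_labels = {"host_type": "hypervisor", "backup_type": "baremetal"}
    run_labels = {"action": "snapshots", "repo_name": "default"}
    merged = {**base_labels, **additional_labels, **run_labels}
    print(format_metric("npbackup_exec_time", merged, 0.0))
```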
There is an example Grafana dashboard in the examples directory, which has been tested with Grafana v10+.