Add build_info metric and use it in generated queries #35

Merged
merged 26 commits into from
May 11, 2023
Changes from 19 commits
Commits
93abf6b
Add blurb to readme about identifying commits
brettimus May 10, 2023
49f6d90
Remove "coming soon" from readme item on adding links to live Prom ch…
brettimus May 10, 2023
41d7649
Add WIP section to the readme on using build info (version/commit)
brettimus May 10, 2023
cd38170
Add constants
brettimus May 10, 2023
3d5cc9c
Initialize Prometheus Gauge for build_info
brettimus May 10, 2023
3cee169
Add updown counter for build info to otel tracker
brettimus May 10, 2023
936ed06
Implement set_build_info for OTEL and Prom, and call when we set the …
brettimus May 10, 2023
44ac3c5
Update README
brettimus May 10, 2023
2d21708
Move set_build_info call into create_tracker
May 10, 2023
0d7db6e
Update prometheus queries
May 10, 2023
02ad205
Update prometheus URL tests
May 10, 2023
9a3b55a
Add test for build_info gauge for prometheus tracker (skipped test fo…
May 10, 2023
cd74486
Checkpoint (saving work for later reference)
May 10, 2023
f20e973
Update otel tracker and tracker tests after finding otel prometheus bug
May 10, 2023
c362620
Update env vars we mention in README (still needs more thought)
May 10, 2023
034406a
Ensure set_build_info is only called once
May 10, 2023
b1eef93
Fix "call once" logic of set_build_info
brettimus May 10, 2023
a071249
Remove commented out test
brettimus May 10, 2023
5bcc0ba
Update changelog
brettimus May 10, 2023
894c224
Remove unnecessary commented-out print statements
brettimus May 10, 2023
b58510f
Add set_build_info to the TrackMetrics Protocol
brettimus May 10, 2023
7163175
Fix build_info query based off of autometrics-dev/autometrics-shared#8
brettimus May 10, 2023
c44a621
Rename create_tracker to init_tracker
May 11, 2023
c916061
Use types instead of ifs
May 11, 2023
b754103
Update pyright
May 11, 2023
4cb6cd4
Update README to mention OpenTelemetry tracker does not work with bui…
May 11, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

### Added

- Support for build_info metrics in Prometheus (#35)
- OpenTelemetry Support (#28)
- Fly.io example (#26)
- Django example (#22)
15 changes: 14 additions & 1 deletion README.md
@@ -17,7 +17,8 @@ See [Why Autometrics?](https://github.com/autometrics-dev#why-autometrics) for m
most useful metrics
- 💡 Writes Prometheus queries so you can understand the data generated without
knowing PromQL
-- 🔗 Create links to live Prometheus charts directly into each functions docstrings (with tooltips coming soon!)
+- 🔗 Create links to live Prometheus charts directly into each function's docstring
- [🔍 Identify commits](#identifying-commits-that-introduced-problems) that introduced errors or increased latency
- [🚨 Define alerts](#alerts--slos) using SLO best practices directly in your source code
- [📊 Grafana dashboards](#dashboards) work out of the box to visualize the performance of instrumented functions & SLOs
- [⚙️ Configurable](#metrics-libraries) metric collection library (`opentelemetry`, `prometheus`, or `metrics`)
@@ -114,6 +115,16 @@ Configure the crate that autometrics will use to produce metrics by using one of
- `opentelemetry` - (enabled by default, can also be explicitly set using the AUTOMETRICS_TRACKER="OPEN_TELEMETRY" env var)
- `prometheus` - (using the AUTOMETRICS_TRACKER env var set to "PROMETHEUS")

## Identifying commits that introduced problems

> This follows the method outlined in [Exposing the software version to Prometheus](https://www.robustperception.io/exposing-the-software-version-to-prometheus/).

Autometrics makes it easy to identify if a specific version or commit introduced errors or increased latencies.

It uses a separate metric (`build_info`) to track the version and, optionally, git commit of your service. It then writes queries that group metrics by the `version` and `commit` labels so you can spot correlations between those and potential issues.

The `version` is read from the `AUTOMETRICS_VERSION` environment variable, and the `commit` value uses the environment variable `AUTOMETRICS_COMMIT`.
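
As a rough illustration of what this produces, the sketch below reads the two environment variables and formats the sample line that a Prometheus scrape would show for `build_info`. The helper name `build_info_sample` is hypothetical and not part of autometrics; the empty-string fallback mirrors the `os.getenv(...) or ""` calls added to `tracker.py` in this PR.

```python
import os
from typing import Mapping


def build_info_sample(environ: Mapping[str, str] = os.environ) -> str:
    """Format the exposition line for the build_info gauge (illustrative only)."""
    # Unset or empty variables fall back to an empty label value.
    commit = environ.get("AUTOMETRICS_COMMIT") or ""
    version = environ.get("AUTOMETRICS_VERSION") or ""
    # An info-style metric always has the value 1; the data lives in the labels.
    return f'build_info{{commit="{commit}",version="{version}"}} 1.0'


print(build_info_sample({"AUTOMETRICS_COMMIT": "d6abce3", "AUTOMETRICS_VERSION": "1.0.1"}))
```

The formatted line has the same shape as the string the Prometheus tracker test in this PR searches for in the output of `generate_latest()`.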

## Development of the package

This package uses [poetry](https://python-poetry.org) as a package manager, with all dependencies separated into three groups:
@@ -149,4 +160,6 @@ poetry run black .
poetry run pyright
# Run the tests using pytest
poetry run pytest
# Run a single test, and clear the cache
poetry run pytest --cache-clear -k test_tracker
```
7 changes: 7 additions & 0 deletions src/autometrics/constants.py
@@ -2,17 +2,24 @@

COUNTER_NAME = "function.calls.count"
HISTOGRAM_NAME = "function.calls.duration"
# NOTE - The Rust implementation does not use `build.info`, instead opts for just `build_info`
BUILD_INFO_NAME = "build_info"

COUNTER_NAME_PROMETHEUS = COUNTER_NAME.replace(".", "_")
HISTOGRAM_NAME_PROMETHEUS = HISTOGRAM_NAME.replace(".", "_")

COUNTER_DESCRIPTION = "Autometrics counter for tracking function calls"
HISTOGRAM_DESCRIPTION = "Autometrics histogram for tracking function call duration"
BUILD_INFO_DESCRIPTION = (
    "Autometrics info metric for tracking software version and build details"
)

# The following constants are used to create the labels
OBJECTIVE_NAME = "objective.name"
OBJECTIVE_PERCENTILE = "objective.percentile"
OBJECTIVE_LATENCY_THRESHOLD = "objective.latency_threshold"
VERSION_KEY = "version"
COMMIT_KEY = "commit"

# The values are updated to use underscores instead of periods to avoid issues with prometheus.
# A similar thing is done in the rust library, which supports multiple exporters
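
A small sketch of the naming convention used by these constants: OpenTelemetry-style names use dots, and the Prometheus variants are derived by replacing dots with underscores. The helper `to_prometheus_name` is hypothetical; the diff itself calls `.replace(".", "_")` inline.

```python
def to_prometheus_name(name: str) -> str:
    """Convert a dotted metric or label name into a Prometheus-friendly one."""
    return name.replace(".", "_")


# Dotted names are rewritten; build_info has no dots, so it is unchanged.
for name in ("function.calls.count", "objective.latency_threshold", "build_info"):
    print(name, "->", to_prometheus_name(name))
```

This is why `BUILD_INFO_NAME` needs no separate `_PROMETHEUS` constant: it is already valid in both worlds.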
8 changes: 5 additions & 3 deletions src/autometrics/prometheus_url.py
@@ -3,6 +3,8 @@
from typing import Optional
from dotenv import load_dotenv

ADD_BUILD_INFO_LABELS = "* on (instance, job) group_left(version, commit) build_info"
Member
The grafana dashboards use a slightly different query:

* on(instance, job) group_left(version, commit) (
    last_over_time(
      build_info[1s]
    ) or on (instance, job) up
  )

Perhaps we should do this here as well? Here's the PR with the related change on the autometrics-shared repo: autometrics-dev/autometrics-shared#8

Collaborator Author
ah good catch! i updated the queries + tests
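
For reference, a sketch of how the join fragment might look once the reviewer's suggestion is applied. The exact whitespace of the string in the merged code is not shown in this view, so the constant below is an assumption, and `request_rate_query` is a hypothetical standalone function rather than the generator method from the diff. The `or on (instance, job) up` branch appears intended to keep the join from dropping series for targets that do not export `build_info`.

```python
# Assumed single-line form of the query suggested in the review comment.
ADD_BUILD_INFO_LABELS = (
    "* on (instance, job) group_left(version, commit) "
    "(last_over_time(build_info[1s]) or on (instance, job) up)"
)


def request_rate_query(function_name: str, module_name: str) -> str:
    """Build a request-rate query grouped by commit and version (illustrative)."""
    selector = (
        f'function_calls_count_total{{function="{function_name}",module="{module_name}"}}'
    )
    return (
        "sum by (function, module, commit, version) "
        f"(rate({selector}[5m]) {ADD_BUILD_INFO_LABELS})"
    )


print(request_rate_query("myFunction", "myModule"))
```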



def cleanup_url(url: str) -> str:
"""Remove the trailing slash if there is one."""
@@ -26,9 +28,9 @@ def __init__(

def create_urls(self):
"""Create the prometheus query urls for the function and module."""
- request_rate_query = f'sum by (function, module) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}"}}[5m]))'
- latency_query = f'sum by (le, function, module) (rate(function_calls_duration_bucket{{function="{self.function_name}",module="{self.module_name}"}}[5m]))'
- error_ratio_query = f'sum by (function, module) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}", result="error"}}[5m])) / {request_rate_query}'
+ request_rate_query = f'sum by (function, module, commit, version) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}"}}[5m]) {ADD_BUILD_INFO_LABELS})'
+ latency_query = f'sum by (le, function, module, commit, version) (rate(function_calls_duration_bucket{{function="{self.function_name}",module="{self.module_name}"}}[5m]) {ADD_BUILD_INFO_LABELS})'
+ error_ratio_query = f'sum by (function, module, commit, version) (rate (function_calls_count_total{{function="{self.function_name}",module="{self.module_name}", result="error"}}[5m]) {ADD_BUILD_INFO_LABELS}) / {request_rate_query}'

queries = {
"Request rate URL": request_rate_query,
8 changes: 5 additions & 3 deletions src/autometrics/test_prometheus_url.py
@@ -25,10 +25,12 @@ def test_create_urls_with_default_url(default_url_generator: Generator):
urls = default_url_generator.create_urls()

# print(urls.keys())
# print(urls.values())

result = {
- "Request rate URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
- "Latency URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28le%2C%20function%2C%20module%29%20%28rate%28function_calls_duration_bucket%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
- "Error Ratio URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%2C%20result%3D%22error%22%7D%5B5m%5D%29%29%20/%20sum%20by%20%28function%2C%20module%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%29&g0.tab=0",
+ "Request rate URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20build_info%29&g0.tab=0",
+ "Latency URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28le%2C%20function%2C%20module%2C%20commit%2C%20version%29%20%28rate%28function_calls_duration_bucket%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20build_info%29&g0.tab=0",
+ "Error Ratio URL": "http://localhost:9090/graph?g0.expr=sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%2C%20result%3D%22error%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20build_info%29%20/%20sum%20by%20%28function%2C%20module%2C%20commit%2C%20version%29%20%28rate%20%28function_calls_count_total%7Bfunction%3D%22myFunction%22%2Cmodule%3D%22myModule%22%7D%5B5m%5D%29%20%2A%20on%20%28instance%2C%20job%29%20group_left%28version%2C%20commit%29%20build_info%29&g0.tab=0",
}
assert result == urls
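
The expected URLs in this test are percent-encoded PromQL. A minimal sketch of how such a link can be reproduced with the standard library, assuming `urllib.parse.quote` with its default safe characters; the URL-building code in `prometheus_url.py` is not shown in this diff, so `graph_url` and the standalone `request_rate_query` below are hypothetical helpers.

```python
from urllib.parse import quote

# Join fragment as it appears in this diff (before the last_over_time follow-up).
ADD_BUILD_INFO_LABELS = "* on (instance, job) group_left(version, commit) build_info"


def request_rate_query(function_name: str, module_name: str) -> str:
    """Build the request-rate PromQL query grouped by commit and version."""
    return (
        "sum by (function, module, commit, version) "
        f'(rate (function_calls_count_total{{function="{function_name}",module="{module_name}"}}[5m]) '
        f"{ADD_BUILD_INFO_LABELS})"
    )


def graph_url(base_url: str, query: str) -> str:
    """Percent-encode the query and embed it in a Prometheus graph link."""
    return f"{base_url}/graph?g0.expr={quote(query)}&g0.tab=0"


print(graph_url("http://localhost:9090", request_rate_query("myFunction", "myModule")))
```

Because `quote` leaves `/` unescaped by default, the division in the error-ratio query would show up as a literal `/` in the link, which is consistent with the expected strings in the test.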

21 changes: 21 additions & 0 deletions src/autometrics/tracker/opentelemetry.py
@@ -4,6 +4,7 @@
Meter,
Counter,
Histogram,
UpDownCounter,
set_meter_provider,
)

@@ -21,6 +22,8 @@
COUNTER_NAME,
HISTOGRAM_DESCRIPTION,
HISTOGRAM_NAME,
BUILD_INFO_NAME,
BUILD_INFO_DESCRIPTION,
OBJECTIVE_NAME,
OBJECTIVE_PERCENTILE,
OBJECTIVE_LATENCY_THRESHOLD,
@@ -39,6 +42,7 @@ class OpenTelemetryTracker:

__counter_instance: Counter
__histogram_instance: Histogram
__up_down_counter_instance: UpDownCounter

def __init__(self):
exporter = PrometheusMetricReader("")
@@ -60,6 +64,11 @@ def __init__(self):
name=HISTOGRAM_NAME,
description=HISTOGRAM_DESCRIPTION,
)
        self.__up_down_counter_instance = meter.create_up_down_counter(
            name=BUILD_INFO_NAME,
            description=BUILD_INFO_DESCRIPTION,
        )
        self._has_set_build_info = False

def __count(
self,
@@ -116,6 +125,18 @@ def __histogram(
},
)

    def set_build_info(self, commit: str, version: str):
        """Observe the build info."""
        if not self._has_set_build_info:
            self._has_set_build_info = True
            self.__up_down_counter_instance.add(
                1.0,
                attributes={
                    "commit": commit,
                    "version": version,
                },
            )

def finish(
self,
start_time: float,
18 changes: 17 additions & 1 deletion src/autometrics/tracker/prometheus.py
@@ -1,16 +1,20 @@
import time
from typing import Optional
- from prometheus_client import Counter, Histogram
+ from prometheus_client import Counter, Histogram, Gauge
from .tracker import Result

from ..constants import (
COUNTER_NAME_PROMETHEUS,
HISTOGRAM_NAME_PROMETHEUS,
BUILD_INFO_NAME,
COUNTER_DESCRIPTION,
HISTOGRAM_DESCRIPTION,
BUILD_INFO_DESCRIPTION,
OBJECTIVE_NAME_PROMETHEUS,
OBJECTIVE_PERCENTILE_PROMETHEUS,
OBJECTIVE_LATENCY_THRESHOLD_PROMETHEUS,
COMMIT_KEY,
VERSION_KEY,
)
from ..objectives import Objective

@@ -41,6 +45,12 @@ class PrometheusTracker:
OBJECTIVE_LATENCY_THRESHOLD_PROMETHEUS,
],
)
    prom_gauge = Gauge(
        BUILD_INFO_NAME, BUILD_INFO_DESCRIPTION, [COMMIT_KEY, VERSION_KEY]
    )

    def __init__(self) -> None:
        self._has_set_build_info = False

def _count(
self,
@@ -93,6 +103,12 @@ def _histogram(
threshold,
).observe(duration)

    def set_build_info(self, commit: str, version: str):
        """Observe the build info. Should only be called once per tracker instance."""
        if not self._has_set_build_info:
            self._has_set_build_info = True
            self.prom_gauge.labels(commit, version).set(1)
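
The "call once" guard can be exercised without a real metrics backend. The sketch below uses a hypothetical `BuildInfoRecorder` class that stores samples in a dict instead of a Prometheus `Gauge`; it is only meant to show the effect of the `_has_set_build_info` flag, not the library's actual tracker.

```python
from typing import Dict, Tuple


class BuildInfoRecorder:
    """Stand-in for a tracker: records build info in memory (illustrative only)."""

    def __init__(self) -> None:
        self._has_set_build_info = False
        self.samples: Dict[Tuple[str, str], float] = {}

    def set_build_info(self, commit: str, version: str) -> None:
        """Record the build info once; later calls are ignored."""
        if self._has_set_build_info:
            return
        self._has_set_build_info = True
        # Info-style metric: the value is always 1, the labels carry the data.
        self.samples[(commit, version)] = 1.0


recorder = BuildInfoRecorder()
recorder.set_build_info("d6abce3", "1.0.1")
recorder.set_build_info("a29a178", "0.0.1")  # ignored by the guard
print(recorder.samples)
```

Note that the flag lives on the instance, so the guard is per tracker instance, whereas `prom_gauge` in the diff is a class attribute shared by all instances.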

# def start(self, function: str = None, module: str = None):
# """Start tracking metrics for a function call."""
# pass
57 changes: 56 additions & 1 deletion src/autometrics/tracker/test_tracker.py
@@ -1,7 +1,10 @@
from prometheus_client.exposition import generate_latest
import pytest

from .opentelemetry import OpenTelemetryTracker
from .prometheus import PrometheusTracker

- from .tracker import default_tracker
+ from .tracker import default_tracker, create_tracker, set_tracker, TrackerType


def test_default_tracker(monkeypatch):
@@ -22,3 +25,55 @@ def test_default_tracker(monkeypatch):
monkeypatch.setenv("AUTOMETRICS_TRACKER", "something_else")
tracker = default_tracker()
assert isinstance(tracker, OpenTelemetryTracker)
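
The fallback behaviour this test checks (an unrecognised `AUTOMETRICS_TRACKER` value yields the OpenTelemetry tracker) can be sketched as follows. The body of `get_tracker_type` is not visible in this diff, so `resolve_tracker_type`, the enum values, and the case-insensitive comparison are all assumptions made for illustration.

```python
import os
from enum import Enum
from typing import Mapping


class TrackerType(Enum):
    # Member names match the diff; the string values are assumed.
    OPENTELEMETRY = "opentelemetry"
    PROMETHEUS = "prometheus"


def resolve_tracker_type(environ: Mapping[str, str] = os.environ) -> TrackerType:
    """Pick a tracker type from AUTOMETRICS_TRACKER, defaulting to OpenTelemetry."""
    value = (environ.get("AUTOMETRICS_TRACKER") or "").lower()
    if value == "prometheus":
        return TrackerType.PROMETHEUS
    # Anything else, including an unset variable, falls back to the default.
    return TrackerType.OPENTELEMETRY


print(resolve_tracker_type({"AUTOMETRICS_TRACKER": "something_else"}))
```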


def test_create_prometheus_tracker_set_build_info(monkeypatch):
    """Test that create_tracker (for a Prometheus tracker) calls set_build_info using env vars."""

    commit = "d6abce3"
    version = "1.0.1"

    monkeypatch.setenv("AUTOMETRICS_COMMIT", commit)
    monkeypatch.setenv("AUTOMETRICS_VERSION", version)

    prom_tracker = create_tracker(TrackerType.PROMETHEUS)
    assert isinstance(prom_tracker, PrometheusTracker)

    blob = generate_latest()
    assert blob is not None
    data = blob.decode("utf-8")

    prom_build_info = f"""build_info{{commit="{commit}",version="{version}"}} 1.0"""
    assert prom_build_info in data

    monkeypatch.delenv("AUTOMETRICS_VERSION", raising=False)
    monkeypatch.delenv("AUTOMETRICS_COMMIT", raising=False)


def test_create_otel_tracker_set_build_info(monkeypatch):
    """
    Test that create_tracker (for an OTEL tracker) calls set_build_info using env vars.
    Note that the OTEL collector translates metrics to Prometheus.
    """
    pytest.skip(
        "Skipping test because OTEL collector does not create a gauge when it translates UpDownCounter to Prometheus"
    )

    commit = "a29a178"
    version = "0.0.1"

    monkeypatch.setenv("AUTOMETRICS_COMMIT", commit)
    monkeypatch.setenv("AUTOMETRICS_VERSION", version)

    otel_tracker = create_tracker(TrackerType.OPENTELEMETRY)
    assert isinstance(otel_tracker, OpenTelemetryTracker)

    blob = generate_latest()
    assert blob is not None
    data = blob.decode("utf-8")

    prom_build_info = f"""build_info{{commit="{commit}",version="{version}"}} 1.0"""
    assert prom_build_info in data

    monkeypatch.delenv("AUTOMETRICS_VERSION", raising=False)
    monkeypatch.delenv("AUTOMETRICS_COMMIT", raising=False)
14 changes: 12 additions & 2 deletions src/autometrics/tracker/tracker.py
@@ -37,16 +37,26 @@ class TrackerType(Enum):

def create_tracker(tracker_type: TrackerType) -> TrackMetrics:
"""Create a tracker"""
tracker_instance = None
if tracker_type == TrackerType.OPENTELEMETRY:
# pylint: disable=import-outside-toplevel
from .opentelemetry import OpenTelemetryTracker

- return OpenTelemetryTracker()
+ tracker_instance = OpenTelemetryTracker()
elif tracker_type == TrackerType.PROMETHEUS:
# pylint: disable=import-outside-toplevel
from .prometheus import PrometheusTracker

- return PrometheusTracker()
+ tracker_instance = PrometheusTracker()

# NOTE - Only set the build info when the tracker is initialized
if tracker_instance:
tracker_instance.set_build_info(
commit=os.getenv("AUTOMETRICS_COMMIT") or "",
version=os.getenv("AUTOMETRICS_VERSION") or "",
)

return tracker_instance
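
One way to avoid the `None` path and the `if tracker_instance:` check is to type the tracker structurally and always construct one before setting the build info. The sketch below is an assumption about how that could look, not the code from the later "Use types instead of ifs" commit; `SupportsBuildInfo`, `init_with_build_info`, and `FakeTracker` are hypothetical names.

```python
import os
from typing import Callable, List, Mapping, Protocol, Tuple


class SupportsBuildInfo(Protocol):
    """Structural type: anything with a matching set_build_info method."""

    def set_build_info(self, commit: str, version: str) -> None: ...


def init_with_build_info(
    factory: Callable[[], SupportsBuildInfo],
    environ: Mapping[str, str] = os.environ,
) -> SupportsBuildInfo:
    """Create a tracker and set its build info once, at initialization."""
    tracker = factory()
    tracker.set_build_info(
        commit=environ.get("AUTOMETRICS_COMMIT") or "",
        version=environ.get("AUTOMETRICS_VERSION") or "",
    )
    return tracker


class FakeTracker:
    """In-memory tracker used only to demonstrate the call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def set_build_info(self, commit: str, version: str) -> None:
        self.calls.append((commit, version))


tracker = init_with_build_info(
    FakeTracker, {"AUTOMETRICS_COMMIT": "d6abce3", "AUTOMETRICS_VERSION": "1.0.1"}
)
print(tracker.calls)
```

Because the factory always returns an instance, the return type never needs to include `None`, which keeps a type checker such as pyright satisfied.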


def get_tracker_type() -> TrackerType: