Skip to content

Commit

Permalink
fix(checkrules): allow threshold annotations with&without dash0 prefix
Browse files Browse the repository at this point in the history
Dash0 renamed the threshold annotations as follows:
* threshold-degraded is now dash0-threshold-degraded
* threshold-critical is now dash0-threshold-critical

Since we validate these annotations in the operator, the validation
code needs to updated accordingly to allow the new annotation name. For
now, we allow both the legacy names (without the "dash0-" prefix) and
also the new names. At some point later, we can make this check stricter
and only allow the new names with the prefix.
  • Loading branch information
basti1302 committed Jan 14, 2025
1 parent b0f3f22 commit 449877f
Show file tree
Hide file tree
Showing 5 changed files with 119 additions and 52 deletions.
30 changes: 15 additions & 15 deletions helm-chart/dash0-operator/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -835,28 +835,28 @@ Prometheus rules will be mapped to Dash0 check rules as follows:
* Some annotations and labels are interpreted by Dash0, these are described in the conversion table below.
For example, to set the summary of the Dash0 check rule, add an annotation `summary` to the `rules` item in the
Prometheus rule resource.
* If `expr` contains the token `$__threshold`, and neither annotation `threshold-degraded` nor `threshold-critical` is
present, the rule will be considered invalid and will not be synchronized to Dash0.
* If `expr` contains the token `$__threshold`, and neither annotation `dash0-threshold-degraded` nor
`dash0-threshold-critical` is present, the rule will be considered invalid and will not be synchronized to Dash0.
* The group attribute `limit` is not supported by Dash0 and will be ignored.
* The group attribute `partial_response_strategy` is not supported by Dash0 and will be ignored.
* All labels (except for the ones explicitly mentioned in the conversion table below) will be listed under
"Additional labels".
* All annotations (except for the ones explicitly mentioned in the conversion table below) will be listed under
"Annotations".

| Prometheus alert rule attribute | Dash0 Check rule field | Notes |
|----------------------------------|--------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `alert` | Name of check rule, prefixed by the group name | must be a non-empty string |
| `expr` | "Expression" | must be a non-empty string |
| `interval` (from group) | "Evaluate every" | |
| `for` | "Grace periods"/"For" | default: "0s" |
| `keep_firing_for` | "Grace periods"/"Keep firing for" | default: "0s" |
| `annotations/summary` | "Summary" | |
| `annotations/description` | "Description" | |
| `annotations/threshold-degraded` | Will be used in place of the token `$__threshold` in the expression, to determine whether the check is _degraded_ | If present, needs to a be string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| `annotations/threshold-critical` | Will be used in place of the token `$__threshold` in the expression, to determine whether the check is _critical_ | If present, needs to a be string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| `annotations/*` | "Annotations" |
| `labels/*` | "Additional labels" |
| Prometheus alert rule attribute | Dash0 Check rule field | Notes |
|----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `alert` | Name of check rule, prefixed by the group name | must be a non-empty string |
| `expr` | "Expression" | must be a non-empty string |
| `interval` (from group) | "Evaluate every" | |
| `for` | "Grace periods"/"For" | default: "0s" |
| `keep_firing_for` | "Grace periods"/"Keep firing for" | default: "0s" |
| `annotations/summary` | "Summary" | |
| `annotations/description` | "Description" | |
| `annotations/dash0-threshold-degraded` | Will be used in place of the token `$__threshold` in the expression, to determine whether the check is _degraded_ | If present, needs to a be string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| `annotations/dash0-threshold-critical` | Will be used in place of the token `$__threshold` in the expression, to determine whether the check is _critical_ | If present, needs to a be string that can be parsed to a float value according to the syntax described in https://go.dev/ref/spec#Floating-point_literals. |
| `annotations/*` | "Annotations" |
| `labels/*` | "Additional labels" |

When a Prometheus ruls resource has been synchronized to Dash0, the operator will write a summary of that
synchronization operation to the status of the Dash0 monitoring resource in the same namespace. This summary will also
Expand Down
27 changes: 19 additions & 8 deletions internal/controller/prometheus_rules_controller.go
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,10 @@ type CheckRule struct {

const (
thresholdReference = "$__threshold"
thresholdDegradedAnnotation = "threshold-degraded"
thresholdCriticalAnnotation = "threshold-critical"
thresholdDegradedAnnotation = "dash0-threshold-degraded"
thresholdDegradedAnnotationLegacy = "threshold-degraded"
thresholdCriticalAnnotation = "dash0-threshold-critical"
thresholdCriticalAnnotationLegacy = "threshold-critical"
thresholdAnnotationsMissingMessagePattern = "the rule uses the token %s in its expression, but has neither the %s nor the %s annotation."
thresholdAnnotationsNonNumericalMessagePattern = "the rule uses the token %s in its expression, but its threshold-%s annotation is not numerical: %s."
)
Expand Down Expand Up @@ -675,10 +677,19 @@ func validateThreshold(
annotations map[string]string,
) []string {
hasThresholdInExpression := strings.Contains(expression, thresholdReference)
degradedThresholdValue, hasThresholdDegradedInAnnotation := annotations[thresholdDegradedAnnotation]
criticalThresholdValue, hasThresholdCriticalInAnnotation := annotations[thresholdCriticalAnnotation]
degradedThresholdValue, hasThresholdDegradedAnnotation := annotations[thresholdDegradedAnnotation]
if !hasThresholdDegradedAnnotation {
// In January 2025, the annotation names were changed to start with "dash0-", that is, threshold-degraded became
// dash0-threshold-degraded. For a grace period, we allow both names. At some point we can remove the check for
// the legacy name.
degradedThresholdValue, hasThresholdDegradedAnnotation = annotations[thresholdDegradedAnnotationLegacy]
}
criticalThresholdValue, hasThresholdCriticalAnnotation := annotations[thresholdCriticalAnnotation]
if !hasThresholdCriticalAnnotation {
criticalThresholdValue, hasThresholdCriticalAnnotation = annotations[thresholdCriticalAnnotationLegacy]
}

if hasThresholdInExpression && !hasThresholdDegradedInAnnotation && !hasThresholdCriticalInAnnotation {
if hasThresholdInExpression && !hasThresholdDegradedAnnotation && !hasThresholdCriticalAnnotation {
return append(validationIssues, fmt.Sprintf(
thresholdAnnotationsMissingMessagePattern,
thresholdReference,
Expand All @@ -687,11 +698,11 @@ func validateThreshold(
))
}

if !hasThresholdDegradedInAnnotation && !hasThresholdCriticalInAnnotation {
if !hasThresholdDegradedAnnotation && !hasThresholdCriticalAnnotation {
return validationIssues
}

if hasThresholdDegradedInAnnotation {
if hasThresholdDegradedAnnotation {
_, err := strconv.ParseFloat(degradedThresholdValue, 32)
if err != nil {
validationIssues = append(
Expand All @@ -705,7 +716,7 @@ func validateThreshold(
)
}
}
if hasThresholdCriticalInAnnotation {
if hasThresholdCriticalAnnotation {
_, err := strconv.ParseFloat(criticalThresholdValue, 32)
if err != nil {
validationIssues = append(
Expand Down
51 changes: 46 additions & 5 deletions internal/controller/prometheus_rules_controller_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -743,36 +743,57 @@ var _ = Describe("The Prometheus rule controller", Ordered, func() {
Expect(rule).To(BeNil())
}
}, []TableEntry{
// Note: running a focussed test in Idea, which uses --focus under the hood does not work when the
// test label contains a "$" character. Thus, we refer to $__threshold as threshold in the test labels.
// That is:
// go run github.com/onsi/ginkgo/v2/ginkgo -v "--focus=foo bar baz $__threshold whatever"
// will run no tests at all.
Entry(
"expression with $__threshold, no annotations -> invalid",
"expression with threshold, no annotations -> invalid",
thresholdValidationTestConfig{
annotations: nil,
expectedValidationIssues: []string{thresholdAnnotationsMissingMessage()},
}),
Entry(
"expression with $__threshold, no threshold annotation -> invalid",
"expression with threshold, no threshold annotation -> invalid",
thresholdValidationTestConfig{
annotations: map[string]string{"unrelated": "annotation"},
expectedValidationIssues: []string{thresholdAnnotationsMissingMessage()},
}),
Entry(
"expression with $__threshold, degraded annotation -> valid",
"expression with threshold, degraded annotation -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
thresholdDegradedAnnotation: "10",
},
expectedValidationIssues: nil,
}),
Entry(
"expression with $__threshold, criticial annotation -> valid",
"expression with threshold, legacy degraded annotation -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
thresholdDegradedAnnotationLegacy: "10",
},
expectedValidationIssues: nil,
}),
Entry(
"expression with threshold, criticial annotation -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
thresholdCriticalAnnotation: "10",
},
expectedValidationIssues: nil,
}),
Entry(
"expression with $__threshold, both annotations -> valid",
"expression with threshold, legacy criticial annotation -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
thresholdCriticalAnnotationLegacy: "10",
},
expectedValidationIssues: nil,
}),
Entry(
"expression with threshold, both annotations -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
"unrelated": "annotation",
Expand All @@ -781,6 +802,26 @@ var _ = Describe("The Prometheus rule controller", Ordered, func() {
},
expectedValidationIssues: nil,
}),
Entry(
"expression with threshold, both legacy annotations -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
"unrelated": "annotation",
thresholdDegradedAnnotationLegacy: "10",
thresholdCriticalAnnotationLegacy: "5",
},
expectedValidationIssues: nil,
}),
Entry(
"expression with threshold, mixed current and legacy annotations -> valid",
thresholdValidationTestConfig{
annotations: map[string]string{
"unrelated": "annotation",
thresholdDegradedAnnotation: "10",
thresholdCriticalAnnotationLegacy: "5",
},
expectedValidationIssues: nil,
}),
Entry(
"degraded annotation is not numerical -> invalid",
thresholdValidationTestConfig{
Expand Down
55 changes: 35 additions & 20 deletions internal/controller/third_party_crd_common.go
Original file line number Diff line number Diff line change
Expand Up @@ -554,22 +554,36 @@ func synchronizeViaApi(
synchronizationErrors = make(map[string]string)
}
// In theory, the following map merge could overwrite synchronization errors from the MapResourceToHttpRequests
// stage with errors occurring in executeAllHttpRequests, but items that have an error in MapResourceToHttpRequests
// are never converted to requests, so the two maps are disjoint.
// stage with errors occurring in executeAllHttpRequests, but items that have an error in
// MapResourceToHttpRequests are never converted to requests, so the two maps are disjoint.
maps.Copy(synchronizationErrors, httpErrors)
}
logger.Info(
fmt.Sprintf("%s %s %s/%s: %d %s(s), %d successfully synchronized, validation issues: %v, synchronization errors: %v",
actionLabel,
resourceReconciler.KindDisplayName(),
thirdPartyResource.GetNamespace(),
thirdPartyResource.GetName(),
itemsTotal,
resourceReconciler.ShortName(),
len(successfullySynchronized),
validationIssues,
synchronizationErrors,
))
if len(validationIssues) == 0 && len(synchronizationErrors) == 0 {
logger.Info(
fmt.Sprintf("%s %s %s/%s: %d %s(s), %d successfully synchronized",
actionLabel,
resourceReconciler.KindDisplayName(),
thirdPartyResource.GetNamespace(),
thirdPartyResource.GetName(),
itemsTotal,
resourceReconciler.ShortName(),
len(successfullySynchronized),
))
} else {
logger.Error(
fmt.Errorf("validation issues and/or synchronization issues occurred"),
fmt.Sprintf("%s %s %s/%s: %d %s(s), %d successfully synchronized, validation issues: %v, synchronization errors: %v",
actionLabel,
resourceReconciler.KindDisplayName(),
thirdPartyResource.GetNamespace(),
thirdPartyResource.GetName(),
itemsTotal,
resourceReconciler.ShortName(),
len(successfullySynchronized),
validationIssues,
synchronizationErrors,
))
}
writeSynchronizationResult(
ctx,
resourceReconciler,
Expand Down Expand Up @@ -894,11 +908,11 @@ func writeSynchronizationResult(
Name: monitoringResource.GetName(),
},
monitoringResource); err != nil {
logger.Error(
err,
logger.Info(
fmt.Sprintf("failed attempt (might be retried) to fetch the Dash0 monitoring resource %s/%s "+
"before updating it with the synchronization results for %s \"%s\": items total %d, "+
"successfully synchronized: %v, validation issues: %v, synchronization errors: %v",
"successfully synchronized: %v, validation issues: %v, synchronization errors: %v; "+
"fetch error: %v",
monitoringResource.GetNamespace(),
monitoringResource.GetName(),
resourceReconciler.ShortName(),
Expand All @@ -907,6 +921,7 @@ func writeSynchronizationResult(
succesfullySynchronized,
validationIssuesPerItem,
synchronizationErrorsPerItem,
err,
))
return err
}
Expand All @@ -920,15 +935,15 @@ func writeSynchronizationResult(
validationIssuesPerItem,
)
if err := resourceReconciler.K8sClient().Status().Update(ctx, monitoringResource); err != nil {
logger.Error(
err,
logger.Info(
fmt.Sprintf("failed attempt (might be retried) to update the Dash0 monitoring resource "+
"%s/%s with the synchronization results for %s \"%s\": %v",
"%s/%s with the synchronization results for %s \"%s\": %v; update error: %v",
monitoringResource.GetNamespace(),
monitoringResource.GetName(),
resourceReconciler.ShortName(),
qualifiedName,
resultForThisResource,
err,
))
return err
}
Expand Down
Loading

0 comments on commit 449877f

Please sign in to comment.