Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution] Negative lookback is not handled properly during upgrade #204714

Closed
Tracked by #201502
xcrzx opened this issue Dec 18, 2024 · 11 comments · Fixed by #204317
Closed
Tracked by #201502

[Security Solution] Negative lookback is not handled properly during upgrade #204714

xcrzx opened this issue Dec 18, 2024 · 11 comments · Fixed by #204317
Assignees
Labels
8.18 candidate bug Fixes for quality problems that affect the customer experience Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.18.0

Comments

@xcrzx
Copy link
Contributor

xcrzx commented Dec 18, 2024

Related to: #203476

Summary

Rules are incorrectly marked as customized immediately after an upgrade due to improper handling of the rule interval field.

Steps to Reproduce

  1. Enable rule customization
  2. Install the prebuilt rules package 8.14.2
  3. Install the Multiple Okta Sessions Detected for a Single User rule from the package:
# POST /internal/detection_engine/prebuilt_rules/installation/_perform
{
    "mode": "SPECIFIC_RULES",
    "rules": [{
        "rule_id": "621e92b6-7e54-11ee-bdc0-f661ea17fbcd",
        "version": 1
    }]
}
  1. Upgrade the prebuilt rules package to 8.17.1
  2. Upgrade the previously installed rule:
# POST /internal/detection_engine/prebuilt_rules/upgrade/_perform
{
    "mode": "SPECIFIC_RULES",
    "rules": [{
        "rule_id": "621e92b6-7e54-11ee-bdc0-f661ea17fbcd",
        "revision": 0,
        "version": 206
    }]
}

Expected Result

The rule is upgraded successfully and is not marked as customized.

Actual Result

The rule is immediately marked as customized in the upgrade response:

{
    "summary": {
        "total": 1,
        "skipped": 0,
        "succeeded": 1,
        "failed": 0
    },
    "results": {
        "updated": [
            {
                "rule_source": {
                    "type": "external",
                    "is_customized": true
                },
// ... rest of the fields
            }
        ],
        "skipped": []
    },
    "errors": []
}

Initial Analysis

Preliminary analysis suggests that the rule interval field is not handled correctly during the upgrade. The negative interval value provided in the package (-1800s) is replaced with 0s, which likely breaks the rule execution schedule and results in the rule being marked as customized.

Image

@xcrzx xcrzx added 8.18 candidate bug Fixes for quality problems that affect the customer experience Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team triage_needed labels Dec 18, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

@banderror banderror self-assigned this Dec 18, 2024
@maximpn
Copy link
Contributor

maximpn commented Dec 18, 2024

Conversion problem interval/from/to <-> interval/lookback

There is a difference between rule schedule value in a saved object and value represented to users

  • Saved object (and rule management API) has interval, from and to fields representing rule schedule. interval shows how often a rule runs in task runner. from and to stored in date math format like now-10m represent a date time range used to fetch source events. Task manager strives to run rules exactly every interval but it's not always possible due to multiple reasons like system load and various delays. To avoid any gaps to appear from point in time usually stands earlier than current time minus interval, for example interval is 10 minutes and from is now-12m meaning rule will analyze events starting from 12 minutes old. to represents the latest point in time source events will be analyzed.
  • Diffable rule and UI represent rule schedule as interval and lookback. Where interval is the same as above and lookback and a time duration before current time minus interval. For example interval is 10 minutes and lookback is 2 minutes it means a rule will analyzing events starting with 12 minutes old until the current moment in time.

Literally interval, from and to mean a rule runs every interval and analyzes events starting from from until to. Technically from and to may not have any correlation with interval, for example a rule may analyze one year old events. While it's reasonable for manual rule runs and gap remediation the same approach doesn't work well for usual rule schedule. Transformation between interval/from/to and interval/lookback works only when to is equal the current moment in time i.e. now.

Rule management APIs allow to set any from and to values resulting in inconsistent rule schedule. Transformed interval/lookback value won't represent real time interval used to fetch source events for analysis. On top of that negative lookback value may puzzle users on the meaning of the negative sign.

Possible cases including negative look-back and rule execution gaps

Considering rule's SO fields interval, from and to we can consider the following cases

  • Normal flow (ideal case) rule executes without delays, to - from > interval, to equals now. Additional look-back can be calculated. Transformation between rule SO's fields interval/from/to and diffable rule's interval/lookback is possible.

Image

  • Rule execution with delays, to - from > interval, to equals now. Additional look-back can be calculated. Transformation between rule SO's fields interval/from/to and diffable rule's interval/lookback is possible.

Image

  • Rule execution with gaps, to - from < interval, to equals now. Additional look-back can be calculated but it represents negative value. Transformation between rule SO's fields interval/from/to and diffable rule's interval/lookback is possible though requires handling negative durations.

Image

  • Rule execution with gaps from/to range intersects with interval, to is less than now. Additional look-back can't be calculated. Transformation between rule SO's fields interval/from/to and diffable rule's interval/lookback isn't possible.

Image

  • Rule execution with gaps from/to range does not intersect with interval, to is less than now. Additional look-back can't be calculated. Transformation between rule SO's fields interval/from/to and diffable rule's interval/lookback isn't possible.

Image

Looks at the above it's not hard to note that while the last 3 cases possible it inevitably leads to rule execution gaps appearing. Look-back interpretation is wrong when to isn't equal now.

Rule forms don't handle negative look-back

Our UI doesn't handle negative lookback values. It simply discards a negative sign and substitutes the rest for editing. For example for values interval = 60m, from = now-30m and to = now additional look-back will be to - from - interval = now - now-30m - 60m = -30m. It means that 30 minutes will be suggested for editing in rule editing form. Saving the form will result in changing from to now-90m

image

Possible solution

Since precise conversion between interval/from/to and interval/lookback isn't possible we should expose interval/from/to fields in diffable rule. We could convert to look-back at UI and falback to original values when it possible.

On the other hand it's important to understand there is no impact on gap remediation. And should UI support negative look-back. How should users interpret it?

#204317 partially mitigates the problem. It handles a case when to isn't equal now. But there is an existing case in prebuilt rules where to - from < interval, to equals now.


Hey @yctercero, could you shed some light on rule execution gap remediation functionality in the scope of negative look-back? Would it be detected as a gap?

FYI @approksiu

@nkhristinin
Copy link
Contributor

Thanks for the diagram and for raising this issue!

We don't handle negative lookback and gap remediation properly at that moment.

The main idea of gap remediation - it's to check if there is a gap between from and previousStartedAt. If there is a gap, we try to increase what rule is queried up to 4 intervals. The remaining time is reported as gap.

Positive lookback

Rule interval 10 minutes, lookback 1 minute
Previously started at 12:00. Was disabled after that and enabled at 13:00

from: 12:49
to: 13:00

So, we see the gap between 12:00 and 12:49 - we then try to remediate that up to 4 intervals (40 minutes), meaning that the rule will also query data from 12:09-12:49.

Data from 12:00-12:09 will be reported as a gap.

Negative lookback

Let's say the rule had 10 minutes interval and -9 minutes lookback time.

Let's say we have a previous run at 12:00 and then rule run at 12:10

from:12:09
to:12:10

So there is a gap (9 minutes) between the last run at 12:00 and 12:09, which we try to remediate.

Initially, I assumed that we would try to remediate up to 4 intervals, meaning we would query from 12:00-12:10, and in that case, we would neglect negative lookback.

But in reality, other things will happen (x-pack/plugins/security_solution/server/lib/detection_engine/rule_types/utils/utils.ts).

So first, we need to figure out how many catchup intervals we will need
getNumCatchupIntervals will return 1, which makes sense (as the gap is 9 minutes and the interval 10 minutes)

Then we try to get the exact intervals we need to query, and here is the problem.

getCatchupTuples - will calculate those intervals, using from and to, and subtract interval duration from that.

Meaning we need to catch up 1 time, and it will return from: 12:09-9m = 12:00 to: 12:10-9m = 12:01

So, it means that our rule will query data from [12:00-12:01, 12:09:12:10]

At that moment, you would think that we are reporting the gap as 12:01-12:09? No, we don't

Next, we try to calculate remainingGapMilliseconds, and we don't use previous tuples for that, but we use interval (10m) and several catchup tuples - 1, meaning 10m > 9m, and we think that we don't have any gap, and not report that.

Summary for negative lookback time :

  • With negative lookback time, we always will find a gap and try to remediate that
  • Because of a bug in remediation logic, we query what the user wants and a some of the user don't want.
  • And we don't report any gaps in the end. We only report gap if there difference between from and previous startedAt more than 4 intervals

Here is my question to a product at the moment:

  • Should we support the negative lookback time?
  • If we support negative lookback time, what is our strategy for gap detection and remediation? From my perspective negative lookback time is by design supposed to have gaps, which we probably shouldn't detect

@xcrzx
Copy link
Contributor Author

xcrzx commented Dec 19, 2024

@maximpn @nkhristinin

Let's keep things simple for Rule Customization Milestone 3:

  1. There are already rules with negative lookback values in the system, particularly in prebuilt rules. Additionally, we have never restricted this at the API level, so there might be custom rules as well. So we need to support negative lookbacks.

  2. Changing a rule's schedule on upgrade to a default value is not an option because:

    • It would cause rules to be marked as customized (as described in this ticket).
    • It would break the schedule intended by rule authors. For example, in the case of the My First Rule, which is designed to run every 24 hours and query only the past 30 minutes per run. Gaps in execution in this case are both intended and expected, no remediation is needed IMO.
  3. Since negative lookback values are expected, we need to support them in the UI. A validation warning for users, explaining that a negative lookback will result in execution gaps, seems like a reasonable trade-off. This approach minimizes the scope of changes required and will not adversely affect the Milestone 3 timeline. We can revisit this after the initial release if necessary.

@approksiu
Copy link

I agree with the points in the last comment by @xcrzx . We need to keep the flexibility by supporting negative lookback, and not make unexpected changes. We should keep the default behavior if the values are not set.
The only prebuilt rule atm with the intended negative lookback is "My first rule", since the only way to set negative lookback is with API/importing rule, point 3 above has low priority.

@maximpn
Copy link
Contributor

maximpn commented Dec 19, 2024

Me @Mikaayenson and DE team discussed negative look-back issue at DE tech time meeting. There are the following outcomes

  • TRADE team is gonna fix negative lookbacks in prebuilt rules. There is PR for that. Marshal also supported an idea that rules shouldn't have negative look-back since it's an obvious gap.
  • Marshall mentioned that some users set rule's field to to less than now values like now-5m to take into account ingestion delay.
  • Gap remediation feature doesn't properly detect gaps for rules with negative look-back.

@maximpn
Copy link
Contributor

maximpn commented Dec 19, 2024

@approksiu @xcrzx

Taking into account prebuilt rules won't have negative look-back in newer package versions and UI currently doesn't support negative look-back is there a reason to to support negative lookbacks?

The only way users may set negative look-back is Rule Management CRUD API.

@approksiu
Copy link

In this case since we do not intend to have OOTB rules with negative lookback, this use-case becomes a rarity.
I think it makes sense to show the rule negative lookback setting as is to the user (in case they modified the rule via API), but keep the current UI behavior allowing only positive lookback values.
Showing a UI warning for potential issues is a good idea.
@maximpn @xcrzx

In case there is additional user feedback we can adjust this later.

@banderror
Copy link
Contributor

Hey @maximpn @xcrzx @approksiu,

I think every point of @xcrzx from #204714 (comment) is important to take into account.

1‍⃣ The rule upgrade workflow doesn't and shouldn't have a goal to suggest "better" rule parameters to users, such as a schedule that would not be causing gaps. Gap remediation is out of the scope of this specific ticket and Milestone 3 in general.

2‍⃣ The rule upgrade workflow, both at the API level and in the UI, should not reset or change any values coming from neither current nor target rule versions, to any "better" or "default" values. If a prebuilt rule is valid and can be installed, the workflow should respect and retain its parameters. Same for custom rules - if it's possible to create a rule via API with to: "now-2m" and from: "now-12m", the workflow should respect and retain these parameters unchanged.

The fact that a few prebuilt rules were adjusted in elastic/detection-rules#4324 doesn't make a big difference. The case of "My First Rule" was pretty valid: we needed a rule that would be executed infrequently but not querying a large time interval. So this was by design, not sure why we changed it @approksiu.

3‍⃣ In my opinion, the above means that in the upgrade workflow we should allow not only negative lookbacks, but also to values that are not equal to now. In other words, all cases described in #204714 (comment) should be supported w/o resetting interval, from and to parameters to any default values.

Perhaps there's a relatively inexpensive way to do this:

  1. Make interval, to, and from part of DiffableRule - instead of interval and lookback - as Maxim proposed above:

    Since precise conversion between interval/from/to and interval/lookback isn't possible we should expose interval/from/to fields in diffable rule. We could convert to look-back at UI and falback to original values when it possible.

  2. If to == "now" and lookback can be parsed to a positive value, we show our existing regular UI with the interval and lookback form inputs. Otherwise, we show a fallback UI that allows to edit the interval, from and to fields directly.
  3. This would allow to keep the existing Creation and Editing pages unchanged at this stage.

maximpn added a commit to maximpn/kibana that referenced this issue Jan 21, 2025
…lastic#204317)

**Fixes: elastic#202715
**Fixes: elastic#204714

## Summary

This PR makes inconsistent/wrong rule's look-back duration prominent for
a user. It falls back to a default 1 minute value in rule upgrade
workflow.

## Details

### Negative/wrong `lookback` problem

There is a difference between rule schedule value in a saved object and
value represented to users

- Saved object (and rule management API) has `interval`, `from` and `to`
fields representing rule schedule. `interval` shows how often a rule
runs in task runner. `from` and `to` stored in date math format like
`now-10m` represent a date time range used to fetch source events. Task
manager strives to run rules exactly every `interval` but it's not
always possible due to multiple reasons like system load and various
delays. To avoid any gaps to appear `from` point in time usually stands
earlier than current time minus `interval`, for example `interval` is
`10 minutes` and `from` is `now-12m` meaning rule will analyze events
starting from 12 minutes old. `to` represents the latest point in time
source events will be analyzed.
- Diffable rule and UI represent rule schedule as `interval` and
`lookback`. Where `interval` is the same as above and `lookback` and a
time duration before current time minus `interval`. For example
`interval` is `10 minutes` and lookback is `2 minutes` it means a rule
will analyzing events starting with 12 minutes old until the current
moment in time.

Literally `interval`, `from` and `to` mean a rule runs every `interval`
and analyzes events starting from `from` until `to`. Technically `from`
and `to` may not have any correlation with `interval`, for example a
rule may analyze one year old events. While it's reasonable for manual
rule runs and gap remediation the same approach doesn't work well for
usual rule schedule. Transformation between `interval`/`from`/`to` and
`interval`/`lookback` works only when `to` is equal the current moment
in time i.e. `now`.

Rule management APIs allow to set any `from` and `to` values resulting
in inconsistent rule schedule. Transformed `interval`/`lookback` value
won't represent real time interval used to fetch source events for
analysis. On top of that negative `lookback` value may puzzle users on
the meaning of the negative sign.

### Prebuilt rules with `interval`/`from`/`to` resulting in negative
`lookback`

Some prebuilt rules have such `interval`, `from` and `to` field values
thatnegative `lookback` is expected, for example `Multiple Okta Sessions
Detected for a Single User`. It runs every `60 minutes` but has `from`
field set to `now-30m` and `to` equals `now`. In the end we have
`lookback` equals `to` - `from` - `interval` = `30 minutes` - `60
minutes` = `-30 minutes`.

Our UI doesn't handle negative `lookback` values. It simply discards a
negative sign and substitutes the rest for editing. In the case above
`30 minutes` will be suggested for editing. Saving the form will result
in changing `from` to `now-90m`

<img width="1712" alt="image"
src="/~https://github.com/user-attachments/assets/05519743-9562-4874-8a73-5596eeccacf2"
/>

### Changes in this PR

This PR mitigates rule schedule inconsistencies caused by `to` fields
not using the current point in time i.e. `now`. The following was done

- `DiffableRule`'s `rule_schedule` was changed to have `interval`,
`from` and `to` fields instead of `interval` and `lookback`
- `_perform` rule upgrade API endpoint was adapted to the new
`DIffableRule`'s `rule_schedule`
- Rule upgrade flyout calculates and shows `interval` and `lookback` in
Diff View, readonly view and field form when `lookback` is non-negative
and `to` equals `now`
- Rule upgrade flyout shows `interval`, `from` and `to` in Diff View,
readonly view and field form when `to` isn't equal `now` or calculated
`lookback` is negative
- Rule upgrade flyout shows a warning when `to` isn't equal `now` or
calculated `lookback` is negative
- Rule upgrade flyout's JSON Diff shows `interval` and `lookback` when
`lookback` is non-negative and `to` equals `now` and shows `interval`,
`from` and `to` in any other case
- Rule details page shows `interval`, `from` and `to` in Diff View,
readonly view and field form when `to` isn't equal `now` or calculated
`lookback` is negative
- `maxValue` was added to `ScheduleItemField` to have an ability to
restrict input at reasonable values

## Screenshots

- Rule upgrade workflow (negative look-back)

<img width="2558" alt="Screenshot 2025-01-02 at 13 16 59"
src="/~https://github.com/user-attachments/assets/b8bf727f-11ca-424f-892b-b024ba7f847a"
/>

<img width="2553" alt="Screenshot 2025-01-02 at 13 17 20"
src="/~https://github.com/user-attachments/assets/9f751ea4-0ce0-4a23-a3b7-0a16494d957e"
/>

<img width="2558" alt="Screenshot 2025-01-02 at 13 18 24"
src="/~https://github.com/user-attachments/assets/6908ab02-4011-4a6e-85ce-e60d5eac7993"
/>

- Rule upgrade workflow (positive look-back)

<img width="2555" alt="Screenshot 2025-01-02 at 13 19 12"
src="/~https://github.com/user-attachments/assets/06208210-c6cd-4842-8aef-6ade5d13bd36"
/>

<img width="2558" alt="Screenshot 2025-01-02 at 13 25 31"
src="/~https://github.com/user-attachments/assets/aed38bb0-ccfb-479a-bb3b-e5442c518e63"
/>

- JSON view

<img width="2559" alt="Screenshot 2025-01-02 at 13 31 37"
src="/~https://github.com/user-attachments/assets/07575a81-676f-418e-8b98-48eefe11ab00"
/>

- Rule details page

<img width="2555" alt="Screenshot 2025-01-02 at 13 13 16"
src="/~https://github.com/user-attachments/assets/e977b752-9d50-4049-917a-af2e8e3f0dfe"
/>

<img width="2558" alt="Screenshot 2025-01-02 at 13 14 10"
src="/~https://github.com/user-attachments/assets/06d6f477-5730-48ca-a240-b5e7592bf173"
/>

## How to test?

- Ensure the `prebuiltRulesCustomizationEnabled` feature flag is enabled
- Allow internal APIs via adding `server.restrictInternalApis: false` to
`kibana.dev.yaml`
- Clear Elasticsearch data
- Run Elasticsearch and Kibana locally (do not open Kibana in a web
browser)
- Install an outdated version of the `security_detection_engine` Fleet
package
```bash
curl -X POST --user elastic:changeme  -H 'Content-Type: application/json' -H 'kbn-xsrf: 123' -H "elastic-api-version: 2023-10-31" -d '{"force":true}' http://localhost:5601/kbn/api/fleet/epm/packages/security_detection_engine/8.14.1
```

- Install prebuilt rules
```bash
curl -X POST --user elastic:changeme  -H 'Content-Type: application/json' -H 'kbn-xsrf: 123' -H "elastic-api-version: 1" -d '{"mode":"ALL_RULES"}' http://localhost:5601/kbn/internal/detection_engine/prebuilt_rules/installation/_perform
```

- Set "inconsistent" rule schedule for `Suspicious File Creation via
Kworker` rule by running a query below
```bash
curl -X PATCH --user elastic:changeme -H "Content-Type: application/json" -H "elastic-api-version: 2023-10-31" -H "kbn-xsrf: 123" -d '{"rule_id":"ae343298-97bc-47bc-9ea2-5f2ad831c16e","interval":"10m","from":"now-5m","to":"now-2m"}' http://localhost:5601/kbn/api/detection_engine/rules
```

- Open rule upgrade flyout for `Suspicious File Creation via Kworker`
rule

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
(cherry picked from commit 30bb71a)

# Conflicts:
#	.github/CODEOWNERS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.18 candidate bug Fixes for quality problems that affect the customer experience Feature:Prebuilt Detection Rules Security Solution Prebuilt Detection Rules area impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.18.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants