Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Package v8.16.2 contains new rule versions without updates #4276

Open
Tracked by #201502
banderror opened this issue Dec 2, 2024 · 13 comments
Open
Tracked by #201502

[Bug] Package v8.16.2 contains new rule versions without updates #4276

banderror opened this issue Dec 2, 2024 · 13 comments
Assignees
Labels
bug Something isn't working Team: TRADE

Comments

@banderror
Copy link

banderror commented Dec 2, 2024

Related to: elastic/kibana#201825 (review)

Summary

In Kibana, I upgraded the package with prebuilt rules from v8.16.2-beta.1 to v8.16.2 and got 64 detection rules that can be upgraded via the Security Solution UI. Out of those 64 rules, there were 26 rules that had only the version field bumped from something like 2 or 3 to something like 100+, and there were no other changes to rule fields in these newer versions of rules.

Here's an example of how such rules are displayed in the rule upgrade UI:

Image
Image

To Reproduce

On way to reproduce this when testing it with the local Kibana: elastic/kibana#201825 (review)

Expected Behavior

In the package with prebuilt rules, we shouldn't release a new rule version if it doesn't have any changes to any other rule parameters, compared to the previous version of the rule.

Problematic rules

Here's the full list of rules with updated version and no updates to any other fields:

  • First Occurrence of Personal Access Token (PAT) Use For a GitHub User
  • High Number of Cloned GitHub Repos From PAT
  • First Occurrence of IP Address For GitHub Personal Access Token (PAT)
  • GitHub App Deleted
  • GitHub PAT Access Revoked
  • Multiple Okta User Authentication Events with Client Address
  • Multiple Okta User Authentication Events with Same Device Token Hash
  • GitHub Owner Role Granted To User
  • Okta User Sessions Started from Different Geolocations
  • First Occurrence of IP Address For GitHub User
  • GitHub User Blocked From Organization
  • First Occurrence of User-Agent For a GitHub User
  • Multiple Device Token Hashes for Single Okta Session
  • First Occurrence GitHub Event for a Personal Access Token (PAT)
  • First Occurrence of GitHub User Interaction with Private Repo
  • First Occurrence of GitHub Repo Interaction From a New IP
  • GitHub Protected Branch Settings Changed
  • Member Removed From GitHub Organization
  • First Occurrence of User Agent For a GitHub Personal Access Token (PAT)
  • New GitHub App Installed
  • First Occurrence of Private Repo Event from Specific GitHub Personal Access Token (PAT)
  • High Number of Okta Device Token Cookies Generated for Authentication
  • New GitHub Owner Added
  • New User Added To GitHub Organization
  • GitHub Repo Created
  • GitHub Repository Deleted
@shashank-elastic
Copy link
Contributor

These are min_stack version updates 🙂 Part of the PR
#4273

With the changing requirements of the integration versions, we identify the kibana version to minstack for rules. Where no query contents are changed in this case.

Will be presenting this into todays Simplified Protections Management Theme Working Group meeting

@shashank-elastic
Copy link
Contributor

Notable Debugging and Root Cause Identification

cc @Mikaayenson @approksiu @banderror

@shashank-elastic
Copy link
Contributor

Spot Check on My Test Stack

  • Works as Expected Rules --> Meaning the Version Diff shows related integration Field, Kibana Asset has the related integration filed populated
Rule Name Kibana Asset with related integrations
First Occurrence of Personal Access Token (PAT) Use For a GitHub User Asset
High Number of Cloned GitHub Repos From PAT Asset
First Occurrence of IP Address For GitHub Personal Access Token (PAT) Asset
GitHub App Deleted Asset
GitHub PAT Access Revoked Asset
GitHub Owner Role Granted To User Asset
First Occurrence of IP Address For GitHub User Asset
GitHub User Blocked From Organization Asset
First Occurrence of User-Agent For a GitHub User Asset
First Occurrence GitHub Event for a Personal Access Token (PAT) Asset
First Occurrence of GitHub User Interaction with Private Repo Asset
First Occurrence of GitHub Repo Interaction From a New IP Asset
GitHub Protected Branch Settings Changed Asset
Member Removed From GitHub Organization Asset
First Occurrence of User Agent For a GitHub Personal Access Token (PAT) Asset
New GitHub App Installed Asset
First Occurrence of Private Repo Event from Specific GitHub Personal Access Token (PAT) Asset
New GitHub Owner Added Asset
New User Added To GitHub Organization Asset
GitHub Repo Created Asset
  • Bug --> As described above we dont have the related integration field populated.
Rule Name Kibana Asset without related integrations
Multiple Okta User Authentication Events with Client Address Asset
Multiple Okta User Authentication Events with Same Device Token Hash Asset
Okta User Sessions Started from Different Geolocations Asset
Multiple Device Token Hashes for Single Okta Session Asset
High Number of Okta Device Token Cookies Generated for Authentication Asset
GitHub Repository Deleted Asset

Also out of the 26 rules reported by @banderror , I was able to reproduce only 6 of them :), the rest of them seem to render the right set of information for me
cc @Mikaayenson

@approksiu
Copy link

I see the same behavior as @banderror. @shashank-elastic I will share the instance where you can check. I see both issues - missing integration, and no changes in any rule fields. The rendering depends on the rule version you currently have, since you skipped some updates - there are more changes accumulated that you can view in diff.

@Mikaayenson
Copy link
Contributor

@banderror @approksiu There are two issues.

  1. Some rules do not have event.dataset when they should like GitHub Repository Deleted. This can be easily fixed.
  2. Some rules are ES|QL. Since we do not have a parser for this language the build time fields are not populated. This includes related_integrations and required_fields.

@approksiu
Copy link

@Mikaayenson thanks! Are there plans to fix ES|QL rules build time field issues?

@shashank-elastic
Copy link
Contributor

@Mikaayenson Yes thats true I was able to drill dow to the source of the First Version of the File that was added in release https://github.com/elastic/ia-trade-team/issues/388

The very first version of the rule, https://github.com/elastic/integrations/pull/10239/files#diff-0208dd40dfe6b85cb7f8bdba6bcd7305d84f9303bbf86514c8a0f9624157a8de does not have the related integrations_field.

To confirm the same here is another ESQL rule sample AWS STS Role Chaining that does not populate related integrations in Kibana

Image
Image

@shashank-elastic
Copy link
Contributor

Have created a PR to fix even.dataset -#4278
cc @Mikaayenson @approksiu

@shashank-elastic
Copy link
Contributor

shashank-elastic commented Dec 3, 2024

@approksiu From your test stack I was able to reproduce the issue on the UI.

But from the asset perspective all the necessary required fields for related_integrations are rightly populated in non ES|QL supported rules.

Looking at the custom package that we created to test smart limits the only major difference we see is the json format

https://github.com/elastic/integrations/pull/11866/files#diff-49be033ff70d5dd333c4de4affe46c089228d1a26ec28fb57b4d8e0473f27724.

But all of the required information is rightly populated.

If we move to the latest version 8.16.2-beta.2 the file format is rectified https://github.com/elastic/integrations/pull/11900/files#diff-c0d42a1b673f8da2e46f126338934d5466bd7c9eb1f740f02870e4d3b2c5f238

But its unclear weather this is causing the issue, and since am not upgrading from a smart limit package, I may not be seeing this issue as in your stack or @banderror stack

From @approksiu Test Stack

Image

From @shashank-elastic Test Stack

Image

Also please not am moving from version 1--> 103 and the other stack is moving from 2 --> 103. One possible way to look at this is we are trying to diff from v8.16.2-beta.1 which was a custom package only to test smart limits, released ad-hoc.

So a conclusive debugging would be

  • ES|QL Rule types don't have related_integrations filed populated, which makes the update diff only show version bumps and not actual changes
  • Rule without event.dataset is fixed in PR Add event dataset for missing rule in Github integration #4278
  • For other EQL rule types having empty diff based on the version bump movement we are seeing, it is likely that the previous version of the rule was from a custom built package to test only smart limits and we should be ideally looking at 8.16.1 --> 8.16.2 rule update diffs ( This is going by the description of the bug that we see this when we upgrade v8.16.2-beta.1 to v8.16.2)

Having said that, v8.16.2-beta.1 was temporary only a beta package to test smart limits and we have override that with the latest GA package so customers seeing this would be highly unlikely!

cc @Mikaayenson @banderror

@Mikaayenson
Copy link
Contributor

Mikaayenson commented Dec 4, 2024

@Mikaayenson thanks! Are there plans to fix ES|QL rules build time field issues?

@approksiu We do not have plans to fix ES|QL build time fields. Going back to Mar 2023 we were recommended by the ESQL team to build a python parser in-house. We do not have the capacity to do this, and any ad-hoc approach will result in bugs/tech debt.

We still do need this parser if we want to ship the build time fields (related integrations, required fields, etc) in a scalable and reliable way.

One alternative would be to parse ESQL rules in kibana since it's already available and extract the build time fields. In a previous POC I explored in detail the LOE to develop and maintain a parser and it quickly became evident that we didn't have to the capacity to develop/maintain this. We need help / resources for this gap.

One stop gap for the related integrations field would be:

  • Adding the related integrations content directly to the toml file. This is not scalable, very much error prone, and would require DEs to know the specific version to add, and will not be automatically updated when min stacks occur. Even though we can do this out of the box with no code changes, I do not recommend this for the reasons just mentioned. Image
  • With a line or two of code changes, we could attempt to repurpose the integrations field within the toml file. This information is duplicative since the event dataset may also be in the query. This approach would dynamically calculate the version. However it's abusing the available integration field and extended the definition of how it should be used (which potentially creates opaque issues and tech debt), making it harder to maintain long term. Image

Important

For any chances to our detection engineering workflows, we ( @elastic/threat-research-and-detection-engineering ) will need to discuss internally to minimize impact. Also, minor note, these two ideas will still not address the required fields issue.

@banderror
Copy link
Author

banderror commented Dec 11, 2024

Thanks for looking into it @Mikaayenson @shashank-elastic @approksiu! A few comments on the above:

Also please not am moving from version 1--> 103 and the other stack is moving from 2 --> 103. One possible way to look at this is we are trying to diff from v8.16.2-beta.1 which was a custom package only to test smart limits, released ad-hoc.

I think it would be more correct to say that we diff rule versions between each other, not packages. Let's break this example down.

If a new 2 version was added in the prerelease v8.16.2-beta.1 package, my expectation was that versions 2 and 103 would also appear in the subsequent v8.16.2 and following release packages. This wouldn't affect any real users because no user would be able to install version 2 from v8.16.2-beta.1 => no user would be able to upgrade from 2 to 103.

However, the fact that these versions 2 and 103 were added to the package by the release process, and the diff between them is empty (except the version field itself), looks suspicious as if there is/are bugs in the release process.

If we abstract away from prerelease and release package, I think there would be a few rules of thumb:

  • In Kibana, for simplicity we accept that any user can be upgrading a rule from any X version to any Y version, where X < Y.
  • In Kibana, we don't control what X and Y can be.
  • No X to Y upgrade should yield an empty diff, where Y is the next version after X. In other words, any bump of the version field should be accompanied by some changes to other rule fields (parameters).

For other EQL rule types having empty diff based on the version bump movement we are seeing, it is likely that the previous version of the rule was from a custom built package to test only smart limits

This also sounds a bit suspicious. Are you saying that we added some fake rule versions to the v8.16.2-beta.1 package? My expectation was that v8.16.2-beta.1 would be equal to v8.16.2 if we wanted to include historical versions to v8.16.2. Which means that the issue is not with the v8.16.2-beta.1 package itself, but with the package build and release process that potentially contains some bugs.

ES|QL Rule types don't have related_integrations filed populated, which makes the update diff only show version bumps and not actual changes
We do not have plans to fix ES|QL build time fields
One alternative would be to parse ESQL rules in kibana...

The concern here is not about having or not having related_integrations and required_fields specified for prebuilt ES|QL rules. We're concerned about bumping their versions when their fields are not getting any updates in the detection-rules repo.

I think the latter should be fixed on the detection-rules repo side. As far as I understand, when we bump an integration's version, we update the corresponding prebuilt rules that indirectly depend on this integration via rule query, and increment their versions. So, if we can't parse ES|QL queries yet, we shouldn't be incrementing ES|QL rules versions as part of this process.

Rule without event.dataset is fixed in PR #4278

Great, thanks!


I'm going to re-test the upgrade workflow with the new v8.17.1 package and some older packages. Will post here when I have an update. Thanks!

@shashank-elastic
Copy link
Contributor

shashank-elastic commented Dec 11, 2024

So this sounds a bit suspicious. Are you saying that we added some fake rule versions to the v8.16.2-beta.1 package? My expectation was that v8.16.2-beta.1 would be equal to v8.16.2 if we wanted to include historical versions to v8.16.2. Which means that the issue is not with the v8.16.2-beta.1 package itself, but with the package build and release process that potentially contains some bugs.

@banderror we never introduced any fake rule versions, we followed the beta release package process like we do for our rule set and add historical versions of the rule, like mentioned in #4150 (comment).
We did not run any new lock versions, but built from the latest rule versions a custom package v8.16.2-beta.1 to facilitate testing at that moment. I quickly wanted to clarify that there were no fake versions manually introduced. But I will also wait for the test results with package v8.17.1 which was released yesterday!

@banderror
Copy link
Author

@shashank-elastic Thank you, got it. I also just updated my comment above because I noticed a few mistakes from my side. Corrected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Team: TRADE
Projects
None yet
Development

No branches or pull requests

5 participants