Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datajson v3.1 #12319

Open
wants to merge 26 commits into
base: master
Choose a base branch
from
Open

Datajson v3.1 #12319

wants to merge 26 commits into from

Conversation

regit
Copy link
Contributor

@regit regit commented Dec 22, 2024

Update of #12289.

This is a major functional upgrade and it also has code organization update as most datajson code is now in its own files to limit the size of file like datasets.c that was becoming really complicated to manage.

On the functional side, there is a new feature that changes the scope of the original proposal.

There is a lot of tools were the IOCs are published in a format like the following:

{
  "info": {
    "threat": [
      {
        "context": "gold old test",
        "year": 2005,
        "host": {
          "fqdn": "www.testmyids.com",
          "domain": "testmyids.com"
        }
      },
      {
        "context": "not old test",
        "year": 2015,
        "host": {
          "fqdn": "www.testmyids.org",
          "domain": "testmyids.org"
        }
      },
      {
        "context": "old test",
        "year": 2023,
        "host": {
          "domain": "testmyids.com"
        }
      }
    ]
  }
}

There is a object containing an array and in this array, we have one object per IOC and some of the key contains the IOC we want to inject in the Suricata.

The feature update consists in giving the capabilities to directly handle this type of data in Suricata. To do so, 2 options have been added:

  • json_key to known which key contains the value to inject in dataset
  • array_key to indicate where the array is

In the previous example, we can use it with the following match:

http.host; datajson:isset,nk,type string,load hosts-nested-key.json,key nk, json_key host.fqdn, array_key info.threat;

A signature with this match will then produce something like:

 "alert": {
     "extra": {
         "nk":  {
            "context": "gold old test",
            "year": 2005,
            "host": {
              "fqdn": "www.testmyids.com",
               "domain": "testmyids.com"
            }
        }
   }
}

The previous data format has been kept and this new one is thus an additional choice to the user. The advantage is that it does not require post processing to inject the data in Suricata which should be convenient for most users.

Contribution style:

Our Contribution agreements:

Changes (if applicable):

Link to ticket: https://redmine.openinfosecfoundation.org/issues/

Describe changes:

  • add additional data format
  • document new data format
  • move code to dedicated files
  • rebase the code

SV_BRANCH=OISF/suricata-verify#2205

@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24045

@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24046

@jasonish
Copy link
Member

I modified the PR description to fix the SV branch and restarted the checks.

Copy link

codecov bot commented Dec 22, 2024

Codecov Report

Attention: Patch coverage is 54.94206% with 661 lines in your changes missing coverage. Please review.

Project coverage is 83.03%. Comparing base (6f937c7) to head (fc19da9).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12319      +/-   ##
==========================================
- Coverage   83.26%   83.03%   -0.23%     
==========================================
  Files         912      921       +9     
  Lines      257643   259075    +1432     
==========================================
+ Hits       214521   215127     +606     
- Misses      43122    43948     +826     
Flag Coverage Δ
fuzzcorpus 60.60% <2.99%> (-0.54%) ⬇️
livemode 19.23% <1.90%> (-0.17%) ⬇️
pcap 44.02% <2.04%> (-0.40%) ⬇️
suricata-verify 62.78% <54.53%> (-0.08%) ⬇️
unittests 58.87% <2.31%> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@regit
Copy link
Contributor Author

regit commented Dec 23, 2024

I modified the PR description to fix the SV branch and restarted the checks.

Sorry I missed that again.

@inashivb
Copy link
Member

@regit there are some build errors in the CI checks as well. Have you looked at them?

@regit
Copy link
Contributor Author

regit commented Dec 23, 2024

@regit there are some build errors in the CI checks as well. Have you looked at them?

Indeed going to fix that

@regit regit marked this pull request as draft December 23, 2024 08:38
@regit
Copy link
Contributor Author

regit commented Dec 23, 2024

Setting to draft as it needs some more love but I would be happy to get feedback on the concept.

@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24048

@github-advanced-security
Copy link

This pull request sets up GitHub code scanning for this repository. Once the scans have completed and the checks have passed, the analysis results for this pull request branch will appear on this overview. Once you merge this pull request, the 'Security' tab will show more code scanning analysis results (for example, for the default branch). Depending on your configuration and choice of analysis tool, future pull requests will be annotated with code scanning analysis results. For more information about GitHub code scanning, check out the documentation.

@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24049

regit added 10 commits December 24, 2024 01:02
This patch introduces a new keyword datajson that is similar
to dataset with a twist. Where dataset allows match from sets,
datajson allows the same but also adds JSON data to the alert
event. This data is comint from the set definition it self.
For example, an ipv4 set will look like:

  10.16.1.11,{"test": "success","context":3}

The syntax is value and json data separated by a comma.

The syntax of the keyword is the following:

  datajson:isset,src_ip,type ip,load src.lst,key src_ip;

Compare to dataset, it just have a supplementary option key
that is used to indicate in which subobject the JSON value
should be added.

The information is added in the even under the alert.extra
subobject:

  "alert": {
    "extra": {
      "src_ip": {
        "test": "success",
        "context": 3
      },

The main interest of the feature is to be able to contextualize
a match. For example, if you have an IOC source, you can do

 value1,{"actor":"APT28","Country":"FR"}
 value2,{"actor":"APT32","Country":"NL"}

This way, a single dataset is able to produce context to the
event where it was not possible before and multiple signatures
had to be used.

Ticket: OISF#7372
Previous code was using an array and introducing a limit in the
number of datajson keywords that can be used in a signature.

This patch uses a linked list instead to overcome the limit. By
using a first element of the list that is part of the structure
we limit the cost of the feature to a structure member added to
PacketAlert structure. Only the PacketAlertFree function is
impacted as we need to iterate to find potential allocation.

Ticket: OISF#7372
It was not handling correctly the json values with space as they
were seen as multiple arguments.

Ticket: OISF#7372
As 1.2.23.4,1 can be a valid datarep and a valid datajson entry,
we can't differentiate datarep and datajson contents by the value
so we need to separate the parsing based on the knowing the keyword.
regit added 10 commits December 24, 2024 01:03
With datajson infrastructure in place, it is now possible to
add data in the extra information section. Following an idea
by Jason Ish, this patch adds the feature for pcre extraction.

A PCRE such as pcre:"/(?P<alert_ua>[a-zA-Z]+)\//" will add the
content of the captured group to alert.extra.ua.
It can contain any vars so need addition properties.
As previous commit is adding the alert option, let's document
the full family.
The format introduced in datajson is an evolution of the
historical datarep format. This has some limitations. For example,
if a user fetch IOCs from a threat intel server there is a large
change that the format will be JSON or XML. Suricata has no support
for the second but can support the first one.

This patch implements this concept. A optional json_key option is
added to the datajson keyword. If present, then Suricata assumes
that the data file contains a JSON object that is an array.
For each element elt of this array, the value added to the dataset
is elt['$json_key'] and the JSON data is elt itself.

Keeping the key value may seem redundant but it is useful to have it
directly accessible in the extra data to be able to query it
independantly of the signature (where it can be multiple metadata
or even be a transformed metadata).
In some case, when interacting with data (mostly coming from
threat intel servers), the JSON array containing the data
to use is not at the root of the object and it is ncessary
to access a subobject.

This patch implements this with support of key in level1.level2.
This is done via the `array_key` option that contains the path
to the data.
This patch separates the datajson from dataset. File like
datasets.c file were really too long after adding the datajson
feature.

Doing that, some function names implementing datajson feature have
been renamed with a Datajson prefix to simplify them.
@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24050

With this patch, it is now possible to define the
value to be used in the datajson set as a value
in a chain of subobjects.

For example, with the following JSON:

{
  "info": {
    "threat": [
      {
        "context": "gold old test",
        "year": 2005,
        "host": {
          "fqdn": "www.testmyids.com",
          "domain": "testmyids.com"
        }
      }
    ]
  }
}

it is possible to match on host.fqdn by doing:

http.host; datajson:isset,nkbadhost,type string,load hosts-nested-key.json,key host,json_key host.fqdn, array_key info.threat

`array_key info.threat` to access the inner array and then
`json_key host.fqdn` to access the field inside.
@suricata-qa
Copy link

ERROR:

ERROR: QA failed on build_asan.

Pipeline 24051

@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

ERROR: QA failed on SURI_TLPR1_suri_time.

field baseline test %
SURI_TLPW2_autofp_stats_chk
.uptime 139 145 104.32%
SURI_TLPR1_stats_chk
.uptime 623 643 103.21%

Pipeline 24052

This patch handles all clang-tidy warnings but the cognitive
complexity ones that will be handled in next patch.
@suricata-qa
Copy link

Information:

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

ERROR: QA failed on SURI_TLPR1_suri_time.

field baseline test %
SURI_TLPW2_autofp_stats_chk
.uptime 139 144 103.6%
SURI_TLPR1_stats_chk
.uptime 623 645 103.53%

Pipeline 24054

@regit regit marked this pull request as ready for review December 28, 2024 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants