
# Force the output of key-value pairs for dicts, lists and tuples in the JSON adapter #119

Open · Zawadidone opened this issue Apr 17, 2024 · 7 comments

@Zawadidone (Contributor) commented Apr 17, 2024

This issue is related to fox-it/dissect.target#681.

The field type `digest` is a dict in Python, which is serialized as an object in JSON, as shown below.

```
target-query -t <TARGET> -f mft --hash --limit 1 | rdump -J | jq
[...]
  "path_resolved": "c:\\$MFT",
  "path_digest": {
    "md5": "5dd5bd6f342c2bceb93dc67783f3991a",
    "sha1": "de9713a272acb20fa9b34296e33e3a409675a3c7",
    "sha256": "be58856974ed849e5edcc4752d716209b80fe1914a20d8528316f8732a33697c"
  }
}
```

Depending on which search platform is used (e.g. Elasticsearch, see huntandhackett/ir-automation@a005f5d/logstash-dissect.conf#L54-L62), it is a hassle to store records in a structured data format without introducing blind spots. Note that we use the JSON adapter in combination with Logstash to save records in Elasticsearch, not the Elasticsearch adapter.

Because of that, I want to force the output of Python dicts (`dictlist`), lists (`typedlist`) and tuples to use key-value pairs when using the JSON adapter:

```
target-query -t <TARGET> -f mft --hash --limit 1 | rdump -J --<SPECIFIC-FLAG> | jq
[...]
  "path_resolved": "c:\\$MFT",
  "path_digest_md5": "5dd5bd6f342c2bceb93dc67783f3991a",
  "path_digest_sha1": "de9713a272acb20fa9b34296e33e3a409675a3c7",
  "path_digest_sha256": "be58856974ed849e5edcc4752d716209b80fe1914a20d8528316f8732a33697c"
}
```

Suggested implementation:

| Python | JSON |
|--------|------|
| dict: `path_digest = {"md5": [...]}` | `{"path_digest_md5": [...]}` |
| tuple: `fieldname = (1, 2)` | `{"fieldname_0": 1, "fieldname_1": 2}` |
| list: `fieldname = [1, 2]` | `{"fieldname_0": 1, "fieldname_1": 2}` |

I don't know how this would apply to the field type `command` or to other field types that I haven't mentioned.
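
A minimal sketch of the suggested mapping (the helper name is made up; this is not an existing flow.record API):

```python
def flatten_field(name, value):
    """Flatten a dict, list or tuple field into key-value pairs.

    Nested containers are flattened recursively, using "_" as the separator
    and list/tuple indices as key suffixes.
    """
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten_field(f"{name}_{key}", item)
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from flatten_field(f"{name}_{index}", item)
    else:
        yield name, value


digest = {"md5": "5dd5bd6f342c2bceb93dc67783f3991a", "sha1": "...", "sha256": "..."}
print(dict(flatten_field("path_digest", digest)))
# {'path_digest_md5': '5dd5bd6f342c2bceb93dc67783f3991a', 'path_digest_sha1': '...', 'path_digest_sha256': '...'}
```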

@Zawadidone (Contributor, Author)

@yunzheng do you have a suggestion on how this issue can be resolved?

@JSCU-CNI (Contributor)

If you don't mind, I have some suggestions for your elastic setup @Zawadidone. You could probably fix this without changing flow.record. Instead, have you tried using a different processor?

You could use an ingest node pipeline to edit every document before it is ingested by elasticsearch. Kibana has a nice UI for this as well. You could also use the Logstash json filter plugin or the Filebeat decode_json_fields plugin.

We are thinking about open sourcing our elastic index mapping for dissect records. Is that something you would be interested in?

@Zawadidone (Contributor, Author) commented Apr 22, 2024

@JSCU-CNI thanks for the suggestion.

I am aware of the solutions that Logstash and Elasticsearch provide, but to solve this issue I am looking for a solution whereby the Logstash and/or Elasticsearch configuration doesn't need to be adjusted after every Dissect Target update that adds new records and field types.

Yes, I am very interested in the Elastic index mapping for Dissect records. We currently use our own Logstash configuration to ingest records into Elasticsearch, and we use our own fork of Timesketch to perform analysis.

We have explored the use of a Dissect Elasticsearch index template, or a dynamic index template that can use the same fields for different data types, but we haven't made a decision on that yet.
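
For context, such a dynamic template could look something like the sketch below (illustrative only: the template name and index pattern are made up). It indexes every dynamically added string field both as `text` and, via a multi-field, as `keyword`; it cannot, however, make a single field accept both object and text values:

```python
# Illustrative Elasticsearch index template body, expressed as a Python dict.
index_template = {
    "index_patterns": ["dissect-records-*"],
    "template": {
        "mappings": {
            "dynamic_templates": [
                {
                    "strings_as_text_and_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                    }
                }
            ]
        }
    },
}
```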

@yunzheng (Member)

> @yunzheng do you have a suggestion on how this issue can be resolved?

I think the way you suggest is one of the better options: doing this at the field type level makes it predictable and testable.

Another, "easier" way would be to flatten the JSON dictionary in the `JsonfileWriter` adapter, using something like https://pypi.org/project/flatten-json/. I would probably then drop outputting the record descriptors, as they would no longer be in sync.
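
For reference, flatten-json already produces roughly the suggested shape out of the box (hooking it into `JsonfileWriter` is the part that would still need to be written):

```python
from flatten_json import flatten  # pip install flatten-json

record_dict = {
    "path_digest": {"md5": "5dd5bd6f342c2bceb93dc67783f3991a"},
    "fieldname": [1, 2],
}

print(flatten(record_dict, separator="_"))
# {'path_digest_md5': '5dd5bd6f342c2bceb93dc67783f3991a', 'fieldname_0': 1, 'fieldname_1': 2}
```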

More difficult would be to do this generically at the record level itself (so all adapters could benefit from a `--flatten` flag); however, every flattened field that results in a new field would mean updating the `RecordDescriptor`, which could be a performance issue.
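
One way to keep that from becoming a per-record cost would be to memoize the flattened descriptors; a rough sketch (not flow.record's actual internals):

```python
from flow.record import RecordDescriptor

_descriptor_cache = {}


def flattened_descriptor(name, flat_field_names):
    """Return a cached RecordDescriptor for a flattened set of field names."""
    key = (name, tuple(flat_field_names))
    if key not in _descriptor_cache:
        # Assumption: every flattened field is typed as "string" for simplicity;
        # a real implementation would derive types from the original descriptor.
        _descriptor_cache[key] = RecordDescriptor(
            name, [("string", field_name) for field_name in flat_field_names]
        )
    return _descriptor_cache[key]
```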

@Zawadidone (Contributor, Author)

@yunzheng thanks for the suggestion, I will start working on a solution that flattens the JSON objects.

@JSCU-CNI we currently use the following index template, which fails if a record with the field `example` uses the data type object and a later record with the field `example` uses the data type text, because an Elasticsearch field can't use the data types object and text at the same time.
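
Illustrated with two minimal documents (values made up):

```python
# First document: Elasticsearch dynamically maps `example` as `object`.
doc1 = {"example": {"md5": "5dd5bd6f342c2bceb93dc67783f3991a"}}

# Second document: `example` is now a string, which conflicts with the
# existing `object` mapping, so indexing this document fails.
doc2 = {"example": "some text value"}
```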

@Zawadidone (Contributor, Author)

Relates to fox-it/dissect.target#723

@pyrco (Contributor) commented Jun 27, 2024

@Zawadidone that would be a nice option to have for JSON output! Make sure to make it configurable, though, as multiple adapters (currently splunk, jsonfile and elastic) use `JsonRecordPacker`, and not everybody expects the JSON to be flattened.
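
One possible shape for that option, as a hypothetical sketch (`JsonRecordPacker` does not currently take a flattening option; the subclass name and the `pack()` round-trip are assumptions):

```python
import json

from flatten_json import flatten
from flow.record.jsonpacker import JsonRecordPacker


class FlatteningJsonRecordPacker(JsonRecordPacker):
    """Hypothetical opt-in variant that flattens every packed record.

    Sketch only: it assumes pack() returns a JSON string, round-trips it
    through flatten-json, and leaves descriptor handling to the caller.
    """

    def pack(self, obj):
        return json.dumps(flatten(json.loads(super().pack(obj)), separator="_"))
```

Adapters that want flat output could then opt in to a packer like this, while the default packer keeps the nested JSON that existing consumers expect.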
