
# Force the output of key-value pairs for dicts, lists and tuples in the JSON adapter #119

Open · Zawadidone opened this issue Apr 17, 2024 · 7 comments

@Zawadidone (Contributor) commented Apr 17, 2024

This issue is related to fox-it/dissect.target#681.

The field type `digest` is a dict in Python, which is serialized as an object in JSON, as shown below.

```
target-query -t <TARGET> -f mft --hash --limit 1 | rdump -J | jq
[...]
  "path_resolved": "c:\\$MFT",
  "path_digest": {
    "md5": "5dd5bd6f342c2bceb93dc67783f3991a",
    "sha1": "de9713a272acb20fa9b34296e33e3a409675a3c7",
    "sha256": "be58856974ed849e5edcc4752d716209b80fe1914a20d8528316f8732a33697c"
  }
}
```

Depending on which search platform is used (e.g. Elasticsearch, see huntandhackett/ir-automation@a005f5d/logstash-dissect.conf#L54-L62), it is a hassle to store records in a structured data format without introducing blind spots. Note that we use the JSON adapter in combination with Logstash to save records in Elasticsearch, not the Elasticsearch adapter.

Because of that, I want to force the output of Python dicts (`dictlist`), lists (`typedlist`) and tuples to use key-value pairs when using the JSON adapter:

```
target-query -t <TARGET> -f mft --hash --limit 1 | rdump -J --<SPECIFIC-FLAG> | jq
[...]
  "path_resolved": "c:\\$MFT",
  "path_digest_md5": "5dd5bd6f342c2bceb93dc67783f3991a",
  "path_digest_sha1": "de9713a272acb20fa9b34296e33e3a409675a3c7",
  "path_digest_sha256": "be58856974ed849e5edcc4752d716209b80fe1914a20d8528316f8732a33697c"
}
```

Suggested implementation:

| Python | JSON |
|--------|------|
| dict: `path_digest = {"md5": [...]}` | `{"path_digest_md5": [...]}` |
| tuple: `fieldname = (1, 2)` | `{"fieldname_0": 1, "fieldname_1": 2}` |
| list: `fieldname = [1, 2]` | `{"fieldname_0": 1, "fieldname_1": 2}` |

I don't know how this would apply to the field type `command` or to other field types that I haven't mentioned.
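
A minimal sketch of the suggested mapping (the helper name is made up; this is not an existing flow.record API):

```python
def flatten_field(name, value):
    """Flatten a dict, list or tuple field into key-value pairs.

    Nested containers are flattened recursively, using "_" as the separator
    and list/tuple indices as key suffixes.
    """
    if isinstance(value, dict):
        for key, item in value.items():
            yield from flatten_field(f"{name}_{key}", item)
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            yield from flatten_field(f"{name}_{index}", item)
    else:
        yield name, value


digest = {"md5": "5dd5bd6f342c2bceb93dc67783f3991a", "sha1": "...", "sha256": "..."}
print(dict(flatten_field("path_digest", digest)))
# {'path_digest_md5': '5dd5bd6f342c2bceb93dc67783f3991a', 'path_digest_sha1': '...', 'path_digest_sha256': '...'}
```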

@Zawadidone (Contributor, Author)

@yunzheng do you have a suggestion on how this issue can be resolved?

@JSCU-CNI (Contributor)

If you don't mind, I have some suggestions for your elastic setup @Zawadidone. You could probably fix this without changing flow.record. Instead, have you tried using a different processor?

You could use an ingest node pipeline to edit every document before it is ingested by elasticsearch. Kibana has a nice UI for this as well. You could also use the Logstash json filter plugin or the Filebeat decode_json_fields plugin.

We are thinking about open sourcing our elastic index mapping for dissect records. Is that something you would be interested in?

@Zawadidone (Contributor, Author) commented Apr 22, 2024

@JSCU-CNI thanks for the suggestion.

I am aware of the solutions that Logstash and Elasticsearch provide, but to solve this issue I am looking for a solution whereby the Logstash and/or Elasticsearch configuration doesn't need to be adjusted after every Dissect Target update that adds new records and field types.

Yes, I am very interested in the Elastic index mapping for Dissect records. We currently use our own Logstash configuration to ingest records into Elasticsearch, and we use our own fork of Timesketch to perform analysis.

We have explored the use of a Dissect Elasticsearch index template, or a dynamic index template that can use the same fields for different data types, but we haven't made a decision on that yet.
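
For context, such a dynamic template could look something like the sketch below (illustrative only: the template name and index pattern are made up). It indexes every dynamically added string field both as `text` and, via a multi-field, as `keyword`; it cannot, however, make a single field accept both object and text values:

```python
# Illustrative Elasticsearch index template body, expressed as a Python dict.
index_template = {
    "index_patterns": ["dissect-records-*"],
    "template": {
        "mappings": {
            "dynamic_templates": [
                {
                    "strings_as_text_and_keyword": {
                        "match_mapping_type": "string",
                        "mapping": {
                            "type": "text",
                            "fields": {"keyword": {"type": "keyword"}},
                        },
                    }
                }
            ]
        }
    },
}
```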

@yunzheng (Member)

> @yunzheng do you have a suggestion on how this issue can be resolved?

I think the way you suggest is one of the better options: doing this at the field type level makes it predictable and testable.

Another, "easier" way would be to flatten the JSON dictionary in the `JsonfileWriter` adapter, using something like https://pypi.org/project/flatten-json/. I would probably then drop outputting the record descriptors, as they would no longer be in sync.
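
For reference, flatten-json already produces roughly the suggested shape out of the box (hooking it into `JsonfileWriter` is the part that would still need to be written):

```python
from flatten_json import flatten  # pip install flatten-json

record_dict = {
    "path_digest": {"md5": "5dd5bd6f342c2bceb93dc67783f3991a"},
    "fieldname": [1, 2],
}

print(flatten(record_dict, separator="_"))
# {'path_digest_md5': '5dd5bd6f342c2bceb93dc67783f3991a', 'fieldname_0': 1, 'fieldname_1': 2}
```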

More difficult would be to do this generically at the record level itself (so all adapters could benefit from a `--flatten` flag); however, every flattened field that results in a new field would mean updating the `RecordDescriptor`, which could be a performance issue.
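
One way to keep that from becoming a per-record cost would be to memoize the flattened descriptors; a rough sketch (not flow.record's actual internals):

```python
from flow.record import RecordDescriptor

_descriptor_cache = {}


def flattened_descriptor(name, flat_field_names):
    """Return a cached RecordDescriptor for a flattened set of field names."""
    key = (name, tuple(flat_field_names))
    if key not in _descriptor_cache:
        # Assumption: every flattened field is typed as "string" for simplicity;
        # a real implementation would derive types from the original descriptor.
        _descriptor_cache[key] = RecordDescriptor(
            name, [("string", field_name) for field_name in flat_field_names]
        )
    return _descriptor_cache[key]
```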

@Zawadidone (Contributor, Author)

@yunzheng thanks for the suggestion, I will start working on a solution that flattens the JSON objects.

@JSCU-CNI we currently use the following index template, which fails if a record with the field `example` uses the data type object and a later record with the field `example` uses the data type text, because an Elasticsearch field can't use the data types object and text at the same time.
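
Illustrated with two minimal documents (values made up):

```python
# First document: Elasticsearch dynamically maps `example` as `object`.
doc1 = {"example": {"md5": "5dd5bd6f342c2bceb93dc67783f3991a"}}

# Second document: `example` is now a string, which conflicts with the
# existing `object` mapping, so indexing this document fails.
doc2 = {"example": "some text value"}
```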

@Zawadidone (Contributor, Author)

Relates to fox-it/dissect.target#723

@pyrco (Contributor) commented Jun 27, 2024

@Zawadidone that would be a nice option to have for JSON output! Make sure to make it configurable, though, as multiple adapters (currently splunk, jsonfile and elastic) use `JsonRecordPacker`, and not everybody expects the JSON to be flattened.
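
One possible shape for that option, as a hypothetical sketch (`JsonRecordPacker` does not currently take a flattening option; the subclass name and the `pack()` round-trip are assumptions):

```python
import json

from flatten_json import flatten
from flow.record.jsonpacker import JsonRecordPacker


class FlatteningJsonRecordPacker(JsonRecordPacker):
    """Hypothetical opt-in variant that flattens every packed record.

    Sketch only: it assumes pack() returns a JSON string, round-trips it
    through flatten-json, and leaves descriptor handling to the caller.
    """

    def pack(self, obj):
        return json.dumps(flatten(json.loads(super().pack(obj)), separator="_"))
```

Adapters that want flat output could then opt in to a packer like this, while the default packer keeps the nested JSON that existing consumers expect.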
