Description
As part of #2816, a cache processor was added to enrich FDR events with host and user metadata at ingest-time.
The cached metadata is currently stored locally in each agent, so the enrichment doesn't work when agents are scaled horizontally.
CrowdStrike delivers FDR events containing only an opaque host ID, so we cannot directly associate an event with a named host and its metadata, such as OS or IP. To do that, we must enrich the event with that metadata ourselves through a lookup.
The ingest-time host metadata enrichment that exists today was designed to work in single Elastic Agent deployments. We should evaluate making it work when Agent is scaled horizontally.
Query-time enrichment with ES|QL and an enrich table is possible, but there are trade-offs.
Ideas
- Support storing data in memcached or redis (something that is available as a service on CSPs). Make the existing cache processor "multi-layer" with read-through to the distributed cache when the local memory cache doesn't contain the key.
- ES|QL is adding a new lookup join feature that could be used to perform the metadata join at query time. That would simplify the architecture as it doesn't require any changes on the agent side. See [Discuss] Supporting ES|QL LOOKUP JOIN on integration data package-spec#873.
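To make the "multi-layer" read-through in the first idea concrete, here is a minimal sketch in Python. It is not the existing cache processor's code or API: `LayeredCache`, `DictStore`, and the key and field names are hypothetical, and `DictStore` only stands in for a shared service such as memcached or redis. TTLs, eviction, and error handling are left out.

```python
from typing import Dict, Optional


class DictStore:
    """Hypothetical stand-in for a shared cache (e.g. redis or memcached)."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}
        self.reads = 0  # counts remote lookups, for illustration only

    def get(self, key: str) -> Optional[dict]:
        self.reads += 1
        return self._data.get(key)

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value


class LayeredCache:
    """Local in-memory layer with read-through to a shared remote layer."""

    def __init__(self, remote: DictStore) -> None:
        self._local: Dict[str, dict] = {}
        self._remote = remote

    def put(self, key: str, value: dict) -> None:
        # Write to the shared layer so other agents can see the entry,
        # then keep a local copy for fast lookups on this agent.
        self._remote.put(key, value)
        self._local[key] = value

    def get(self, key: str) -> Optional[dict]:
        # 1. Try the local memory cache first.
        value = self._local.get(key)
        if value is not None:
            return value
        # 2. On a local miss, read through to the shared cache.
        value = self._remote.get(key)
        if value is not None:
            # 3. Populate the local layer so later lookups stay local.
            self._local[key] = value
        return value


if __name__ == "__main__":
    shared = DictStore()
    agent_a = LayeredCache(shared)
    agent_b = LayeredCache(shared)

    # Agent A sees the host metadata event and stores it.
    agent_a.put("host-id-1", {"hostname": "web-01", "os": "linux"})

    # Agent B receives an event carrying only the opaque host ID.
    print(agent_b.get("host-id-1"))
```

With this shape, an agent that never saw the metadata event can still enrich events for that host ID, at the cost of one remote lookup on the first miss. How long entries stay valid in each layer is an open design question the sketch does not address.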