AWS Security Lake Parquet File Schema Format Issues upon AWS Opensearch Ingestion & AWS Athena Querying #728
Comments
Thanks for this report, I'll work on it asap. |
Note that I was the one who mentioned that I thought it was an issue converting from proto to parquet. After going through the parquet library this repo uses to generate the files, it looks like REPEATED is a valid keyword in parquet. The issue is that REPEATED is not used correctly. See https://github.com/apache/parquet-format/blob/master/LogicalTypes.md for a detailed description of how REPEATED should be used. I see an issue in these places: If types is repeated, then OCSFFIndingDetails needs to be in a list or a map. It is not. If tags is repeated, then OCSFFIndingDetails needs to be in a list or a map. It is not. See this tip in the parquet-go library. |
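To make the distinction in the comment above concrete, here is a rough sketch that prints the two schema shapes side by side: a bare repeated field, and the LIST-annotated 3-level structure described in LogicalTypes.md. The field names types and tags come from the comment; the helper names, the binary/STRING element type, and the optional/required repetitions are assumptions for illustration only. They are not taken from falcosidekick or from any Parquet library, and this does not claim to be the exact shape the AWS readers require.

```python
# Sketch only: renders Parquet schema text for the two shapes discussed above.
# Helper names, element type, and repetitions are illustrative assumptions.

def bare_repeated(name: str, primitive: str = "binary",
                  annotation: str = "STRING") -> str:
    """A field marked `repeated` with no LIST-annotated wrapper group."""
    return f"repeated {primitive} {name} ({annotation});"


def three_level_list(name: str, primitive: str = "binary",
                     annotation: str = "STRING",
                     list_repetition: str = "optional",
                     element_repetition: str = "required") -> str:
    """The LIST-annotated 3-level structure from LogicalTypes.md."""
    return "\n".join([
        f"{list_repetition} group {name} (LIST) {{",
        "  repeated group list {",
        f"    {element_repetition} {primitive} element ({annotation});",
        "  }",
        "}",
    ])


if __name__ == "__main__":
    # Print both shapes for each field named in the comment.
    for field in ("types", "tags"):
        print("bare repeated field:")
        print(bare_repeated(field))
        print("LIST-annotated group:")
        print(three_level_list(field))
        print()
```

Comparing this output against the schema of a file produced by falcosidekick (as reported by whatever Parquet inspection tool is at hand) should show which of the two shapes is actually being written.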
Issues go stale after 90d of inactivity. Mark the issue as fresh with Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle stale |
/remove-lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh with Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with Provide feedback via https://github.com/falcosecurity/community. /lifecycle rotten |
Background:
We are leveraging AWS Security Lake to ingest various log sources into OCSF, make this data queryable via AWS Athena, and ingest it into AWS OpenSearch. We are attempting to ingest Falco data by following this article: falcosidekick integration documentation.
Describe the bug:
After following the instructions in the article linked above, we are receiving Falco data in our Security Lake S3 bucket, and this data is queryable via S3 Select. However, the Lake Formation table generated by Security Lake returns a generic error of
Unable to Read Parquet File
when attempting to query via Athena. Additionally, we are leveraging the AWS OpenSearch Ingestion Pipeline with the Security Lake S3 parquet OCSF pipeline template. Native sources from Security Lake are ingested without error, but we are seeing an error when Falco data is ingested. The error from the OpenSearch ingestion pipeline (via CloudWatch) is as follows:
AWS support was contacted regarding this error. The following was their response:
How to reproduce it:
Expected behaviour:
Environment:
Falco version
0.36.1 (x86_64) - from docker.io/falcosecurity/falco-no-driver:0.36.1
System info
Cloud provider or hardware configuration
AWS EKS - managed nodegroups
OS
Kernel:
Linux falco-6sck4 5.10.197-186.748.amzn2.x86_64 #1 SMP Tue Oct 10 00:30:07 UTC 2023 x86_64 GNU/Linux
Installation method:
Kubernetes
Additional context:
N/A