JSON Streaming #412
tests: add tests for validatemeta tag
tests: fix a bad package declaration
This can be reverted or modified further; it's only a quickly slapped-together example for testing the parser.
Resolved, but I haven't had a chance to fully review the logical changes and would like to re-review
Few nits, mostly grouping and some optimization.
cmd/api/src/test/integration/harnesses/adinboundcontrolharness.svg
	return nil
}

func decodeGroupData(batch graph.Batch, reader io.ReadSeeker) error {
Would tossing the ingest function into decodeBasicData eliminate the need for these 3 special copy-pasta ❄️s?
They take different types
Latest revision looks great. Pull!
Description
Implements JSON streaming in the ingest pipeline
Motivation and Context
Large JSON files were loaded completely into memory, causing memory exhaustion. AzureHound dumps can easily be multiple GB, so this wasn't a sustainable way of reading data. With this change, we stream the JSON and decode it incrementally instead of holding the whole file in memory, which massively lowers memory consumption.
How Has This Been Tested?
Tested locally using 200+ MB JSON files.