Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON Streaming #412

Merged
merged 25 commits into from
Feb 15, 2024
Merged

JSON Streaming #412

merged 25 commits into from
Feb 15, 2024

Conversation

rvazarkar
Copy link
Contributor

@rvazarkar rvazarkar commented Feb 12, 2024

Description

Implements JSON streaming in the ingest pipeline

Motivation and Context

Large JSON files were completely loaded into memory causing exhaustion issues. AzureHound dumps can easily be multiple GB, so this wasn't a sustainable way of reading data. With this change, we'll load all of the data by streaming json in memory instead, which massively lower memory consumption.

How Has This Been Tested?

Testing using 200+mb json files locally

Screenshots (if appropriate):

Types of changes

  • Chore (a change that does not modify the application functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Documentation updates are needed, and have been made accordingly.
  • I have added and/or updated tests to cover my changes.
  • All new and existing tests passed.
  • My changes include a database migration.

cmd/api/src/daemons/datapipe/fileops.go Outdated Show resolved Hide resolved
superlinkx and others added 2 commits February 13, 2024 14:52
This can be reverted or modified further, it's only a quickly slapped together example for testing the parser
@superlinkx superlinkx dismissed their stale review February 14, 2024 18:59

Resolved, but I haven't had a chance to fully review the logical changes and would like to re-review

Copy link
Contributor

@zinic zinic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few nits, mostly grouping and some optimization.

cmd/api/src/daemons/datapipe/decoders.go Show resolved Hide resolved
cmd/api/src/daemons/datapipe/decoders.go Show resolved Hide resolved
cmd/api/src/daemons/datapipe/decoders.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/decoders.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/decoders.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/models.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/models.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/models.go Outdated Show resolved Hide resolved
cmd/api/src/daemons/datapipe/models.go Outdated Show resolved Hide resolved
return nil
}

func decodeGroupData(batch graph.Batch, reader io.ReadSeeker) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would tossing in the ingest function in decodeBasicData eliminate the need for these 3 special copy pasta ❄️ s?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They take different types

@rvazarkar rvazarkar requested a review from zinic February 15, 2024 15:10
Copy link
Contributor

@zinic zinic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest revision looks great. Pull!

@rvazarkar rvazarkar merged commit a64f03a into main Feb 15, 2024
3 checks passed
@rvazarkar rvazarkar deleted the BED-4114 branch February 15, 2024 20:47
@github-actions github-actions bot locked and limited conversation to collaborators Feb 15, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants