fix: lower memory footprint #543
Conversation
Updated the benchmark setup to use a tree structure for schemas and enhanced benchmark case handling. Adjusted the setup functions, created a `SetupSchemasInput` type, modified results handling, and added more descriptive comments.
Extended the state machine timeout from 30 to 120 minutes to accommodate longer-running benchmarks. Added a comment in the runSingleTest function to clarify that the index-0 stream is queried as the root stream.
Introduce a check in the benchmark setup to ensure that the tree's maximum depth does not exceed the PostgreSQL limitations, preventing potential errors. Added a new constant, maxDepth, set at 179, based on empirical findings.
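A minimal sketch of the depth guard described above. The constant value 179 comes from the commit; the function name `validateTreeDepth` is an assumption, since the commit does not show the real code.

```go
package main

import "fmt"

// maxDepth mirrors the empirically determined limit from the commit:
// deeper schema trees hit PostgreSQL limitations in this setup.
const maxDepth = 179

// validateTreeDepth is a hypothetical helper that rejects benchmark
// setups whose schema tree is too deep for PostgreSQL to handle.
func validateTreeDepth(depth int) error {
	if depth > maxDepth {
		return fmt.Errorf("tree depth %d exceeds maximum supported depth %d", depth, maxDepth)
	}
	return nil
}

func main() {
	fmt.Println(validateTreeDepth(10))  // nil: within the limit
	fmt.Println(validateTreeDepth(200)) // non-nil: exceeds maxDepth
}
```

Failing fast here turns a hard-to-diagnose PostgreSQL error into an explicit setup error before any schemas are created.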
Extended the state machine timeout from 2 hours to 6 hours to accommodate longer processing times. Adjusted task-specific timeouts and added a new polling interval to optimize the frequency of status checks during prolonged operations.
Renamed the "timeoutSeconds" parameter to "executionTimeout" for better clarity. Also corrected the naming of the "TimeoutSeconds" constant to align with updated AWS guidelines.
Added default setting for LOG_RESULTS to true in TestBench to ensure results are logged unless specified otherwise. Modified benchmark.go to conditionally print results based on the LOG_RESULTS environment variable. Updated step_functions.go to explicitly set LOG_RESULTS to false when executing benchmark from the deployed environment.
Implement test cases to validate index change and YoY index calculations. Includes initialization, data insertion, and result conversion to ensure accuracy and coverage of edge cases.
Updated the benchmark workflow to introduce a `formatErrorState` pass state for better error formatting and handling. Replaced `Fail` state with a chain of `Pass` and `Fail` to ensure structured error information is passed upstream. Adjusted error catching and chaining to integrate with the new error handling structure.
Micro instances were causing errors and hangs during tests, so they have been commented out of the list of tested EC2 instance types. Medium and large instance types were added to keep the benchmarking thorough.
# Conflicts:
#	deployments/infra/stacks/benchmark/step_functions.go
Modified insertRecordsForPrimitive function to use bulk insert for faster database operations. The records are now batched into a single SQL insert statement, significantly improving performance by reducing the number of individual insert operations.
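A sketch of the batching technique the commit describes: collapsing many per-row inserts into one parameterized statement. The table and column names are illustrative assumptions, not taken from `insertRecordsForPrimitive` itself.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBulkInsert batches all rows into a single parameterized INSERT
// statement, so the database does one round trip instead of one per row.
func buildBulkInsert(table string, cols []string, rows [][]any) (string, []any) {
	var sb strings.Builder
	fmt.Fprintf(&sb, "INSERT INTO %s (%s) VALUES ", table, strings.Join(cols, ", "))
	args := make([]any, 0, len(rows)*len(cols))
	for i, row := range rows {
		if i > 0 {
			sb.WriteString(", ")
		}
		// emit PostgreSQL-style placeholders $1, $2, ... for this row
		ph := make([]string, len(cols))
		for j := range cols {
			ph[j] = fmt.Sprintf("$%d", len(args)+j+1)
		}
		sb.WriteString("(" + strings.Join(ph, ", ") + ")")
		args = append(args, row...)
	}
	return sb.String(), args
}

func main() {
	sql, args := buildBulkInsert("primitive_records",
		[]string{"date_value", "value"},
		[][]any{{"2024-01-01", 1.5}, {"2024-01-02", 2.5}})
	fmt.Println(sql)
	fmt.Println(len(args))
}
```

The resulting string and argument slice can be passed to a single `Exec` call; for very large batches the rows would also need to be chunked to stay under PostgreSQL's parameter limit.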
Implemented comprehensive unit tests for the NewTree function covering various scenarios such as different quantities of streams and branching factors. The tests also include checks for tree structure, node properties, and special cases for larger trees.
Added parameters `child_data_providers` and `child_stream_ids` to `get_raw_record` and `get_raw_index`. Updated the logic to buffer and emit ordered results, ensuring proper handling of data arrays' lengths and emitting results by date and taxonomy index sequentially.
Assigned default value 0 to the buffer length to prevent null errors during buffer length evaluation. This ensures the buffer dates are processed correctly and avoids unexpected termination due to null length assignment.
# Conflicts:
#	internal/benchmark/benchmark.go
…dures
- Add checks for empty child taxonomies to prevent null upper bound errors
- Improve buffer handling and array initialization to avoid potential issues
- Refactor loop structure for better efficiency and correctness
- Update comments and improve code readability
Refactored the logic for handling child data providers and stream IDs by removing unnecessary buffering and looping. This results in cleaner code that directly returns data values and indices in a simplified manner, ensuring proper ordering by date and taxonomy index.
Simplify the loop logic for processing taxonomies and emitting values by removing unnecessary steps and optimizing array handling. Introduce a new approach to handling array element removal and managing date-based value emission efficiently. This reduces code complexity and enhances maintainability.
The function now merges CSV files and saves both a markdown and a CSV file back to the results bucket. New code handles reading and uploading the merged CSV file to S3, ensuring both formats are available.
Added `github.com/pkg/errors` to wrap errors throughout the codebase, providing more context and improving the debugging process. This includes error wrapping in file operations, schema creation, metadata insertion, and benchmark runs.
Upgraded the kwil-db, kwil-db/core, and kwil-db/parse modules to their latest revisions in the go.mod and go.sum files. This ensures we are using the most current features and fixes provided by these libraries.
Introduce comprehensive test cases within `complex_composed_test.go` to validate various scenarios including record retrieval, index checks, latest value checks, out-of-range data handling, and error scenarios. Deploy necessary contracts and initialize datasets for testing.
The new ToDisplay method provides a string representation of the tree, showing parent-child relationships. A corresponding test function, TestDisplayTree, has been added to verify the output for various branching factors.
Simplified the shapePairs for better clarity and added new test cases to evaluate the cost of adding streams and depth. Reduced the number of samples from 10 to 3, and adjusted the days array to exclude 3 days. Commented out tests that caused errors or had call stack size issues.
Created a CloudWatch log group to capture logs for EC2 benchmark tasks. Updated IAM role to include CloudWatch managed policy for logging.
Added a loop to attempt running the benchmark up to three times before giving up, with a 10-second interval between retries. This change ensures more robust handling of transient failures during benchmark execution. Also, removed redundant command concatenations for better readability.
Ensure that the root node is correctly marked as a leaf when there is only one stream. This change returns the initialized tree immediately if the condition is met, optimizing the tree setup process.
Split function tests into groups of 10 to avoid exhausting Postgres during execution. Introduced a helper function `chunk` to divide tests, ensuring better test reliability and stability.
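A generic `chunk` helper like the one described could look like this; only the name `chunk` and the group size of 10 come from the commit, the rest is a sketch.

```go
package main

import "fmt"

// chunk splits a slice into groups of at most `size` elements, so each
// group of tests can run without exhausting Postgres resources.
func chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil
	}
	var out [][]T
	for len(items) > 0 {
		n := size
		if n > len(items) {
			n = len(items) // final, possibly smaller, group
		}
		out = append(out, items[:n])
		items = items[n:]
	}
	return out
}

func main() {
	tests := make([]int, 25)
	groups := chunk(tests, 10)
	fmt.Println(len(groups)) // 25 tests split into 3 groups: 10, 10, 5
}
```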
Simplified the benchmark step by removing the retry logic in the script. The benchmark will now run just once without reattempting on failure.
Introduced a results channel for collecting benchmark results and improved test robustness with retry logic. Added logging to track benchmark execution and integrated a cleanup function to handle interruptions gracefully.
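The results-channel pattern mentioned above can be sketched as a fan-in collector; the `runResult` type and all names here are illustrative assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// runResult is a stand-in for whatever the benchmark records per case.
type runResult struct {
	name   string
	millis int
}

// collectResults runs each case in its own goroutine and gathers every
// result from a shared channel until all cases have reported.
func collectResults(cases []func() runResult) []runResult {
	ch := make(chan runResult)
	var wg sync.WaitGroup
	for _, c := range cases {
		wg.Add(1)
		go func(c func() runResult) {
			defer wg.Done()
			ch <- c()
		}(c)
	}
	// close the channel once every case has sent its result
	go func() {
		wg.Wait()
		close(ch)
	}()
	var out []runResult
	for r := range ch {
		out = append(out, r)
	}
	return out
}

func main() {
	cases := []func() runResult{
		func() runResult { return runResult{"depth_1", 12} },
		func() runResult { return runResult{"depth_2", 34} },
	}
	fmt.Println(len(collectResults(cases)))
}
```

Cleanup on interruption would additionally select on a cancellation signal; that part is omitted here for brevity.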
Introduce a README file for the `internal/contracts` directory, detailing the purpose and contents of the Kuneiform contracts used in the Truflation Stream Network (TSN). This includes descriptions of each contract file, synchronization practices, and links to additional resources.
Updated timeout handling to use a centralized constant in the benchmark state machine. This improves maintainability by defining `TotalTimeout` in a new constants file and referencing it across the code. Consequently, it ensures consistency and eases future modifications.
This update parallelizes the schema parsing process using goroutines to improve efficiency and adds a bulk insertion for metadata. These changes enhance the performance and overall speed of the setup operation.
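The goroutine-per-schema approach can be sketched as below; the real parser is not shown in the commit, so `parse` here is a stand-in, and the helper name is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// parseAll parses every input concurrently, one goroutine per schema,
// writing into index i so the output order matches the input order.
func parseAll(inputs []string, parse func(string) (string, error)) ([]string, error) {
	results := make([]string, len(inputs))
	errs := make([]error, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i], errs[i] = parse(in)
		}(i, in)
	}
	wg.Wait()
	return results, errors.Join(errs...) // nil when every parse succeeded
}

func main() {
	out, err := parseAll([]string{"a", "b"}, func(s string) (string, error) {
		return strings.ToUpper(s), nil
	})
	fmt.Println(out, err)
}
```

Writing each result into its own slice index avoids a mutex, and `errors.Join` surfaces every parse failure at once rather than just the first.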
Commented out the test cases involving 800 streams as they cause memory starvation in t3.small instances. These tests significantly impact memory usage because they store the entire tree in memory.
Replaced the buffered result channel with an unbuffered one to manage schema parsing and initialization concurrently.
already added to #550
Description
I actually forgot to push this commit on #534
Related Problem
How Has This Been Tested?