fix(contracts): improve contracts and add tests #534
Conversation
Updated the benchmark setup to use a tree structure for schemas and enhanced the benchmark case handling. Adjusted setup functions, created a `SetupSchemasInput`, modified results handling, and added more descriptive comments.
Extended the state machine timeout from 30 to 120 minutes to accommodate longer-running benchmarks. Added a comment in the runSingleTest function to clarify the query of the index-0 stream as the root stream.
Introduced a check in the benchmark setup to ensure that the tree's maximum depth does not exceed PostgreSQL's limitations, preventing potential errors. Added a new constant, maxDepth, set at 179, based on empirical findings.
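A minimal sketch of what such a depth guard could look like during benchmark setup; the `maxDepth` value comes from the commit message above, while the function name and error wording are illustrative.

```go
package benchmark

import "fmt"

// maxDepth is the deepest composed-stream tree PostgreSQL handled reliably
// in empirical testing (see the commit message above).
const maxDepth = 179

// checkMaxDepth is a hypothetical guard run during benchmark setup to fail
// fast before building a tree that PostgreSQL cannot evaluate.
func checkMaxDepth(depth int) error {
	if depth > maxDepth {
		return fmt.Errorf("tree depth %d exceeds the PostgreSQL-safe limit of %d", depth, maxDepth)
	}
	return nil
}
```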
Extended the state machine timeout from 2 hours to 6 hours to accommodate longer processing times. Adjusted task-specific timeouts and added a new polling interval to optimize the frequency of status checks during prolonged operations.
Changed the "timeoutSeconds" parameter to "executionTimeout" for better clarity. Also corrected the naming convention of the "TimeoutSeconds" constant to align with the updated AWS guideline.
Added default setting for LOG_RESULTS to true in TestBench to ensure results are logged unless specified otherwise. Modified benchmark.go to conditionally print results based on the LOG_RESULTS environment variable. Updated step_functions.go to explicitly set LOG_RESULTS to false when executing benchmark from the deployed environment.
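A sketch of the conditional logging described above; the `LOG_RESULTS` variable name comes from the commit message, while the helper functions are illustrative.

```go
package benchmark

import (
	"fmt"
	"os"
)

// logResultsEnabled defaults to true unless LOG_RESULTS is explicitly set
// to "false", mirroring the behaviour described above (hypothetical helper).
func logResultsEnabled() bool {
	return os.Getenv("LOG_RESULTS") != "false"
}

// maybePrintResults prints results only when logging is enabled.
func maybePrintResults(results []string) {
	if !logResultsEnabled() {
		return
	}
	for _, r := range results {
		fmt.Println(r)
	}
}
```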
Implement test cases to validate index change and YoY index calculations. Includes initialization, data insertion, and result conversion to ensure accuracy and coverage of edge cases.
Updated the benchmark workflow to introduce a `formatErrorState` pass state for better error formatting and handling. Replaced `Fail` state with a chain of `Pass` and `Fail` to ensure structured error information is passed upstream. Adjusted error catching and chaining to integrate with the new error handling structure.
Micro instances were causing errors and hangs during tests, hence they have been commented out from the list of tested EC2 instance types. Medium and large instance types have been added to ensure thorough benchmarking.
# Conflicts:
#	deployments/infra/stacks/benchmark/step_functions.go
Modified insertRecordsForPrimitive function to use bulk insert for faster database operations. The records are now batched into a single SQL insert statement, significantly improving performance by reducing the number of individual insert operations.
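The batching idea, roughly: assemble one multi-row INSERT instead of issuing a statement per record. The table and column names below are placeholders, not the contract's real schema, and the record type is simplified for illustration.

```go
package benchmark

import (
	"fmt"
	"strings"
)

// Record is a stand-in for the primitive stream record type.
type Record struct {
	Date  string
	Value float64
}

// buildBulkInsert builds a single parameterised INSERT covering all records,
// instead of one statement per record.
func buildBulkInsert(table string, records []Record) (string, []any) {
	placeholders := make([]string, 0, len(records))
	args := make([]any, 0, len(records)*2)
	for i, r := range records {
		placeholders = append(placeholders, fmt.Sprintf("($%d, $%d)", i*2+1, i*2+2))
		args = append(args, r.Date, r.Value)
	}
	query := fmt.Sprintf("INSERT INTO %s (date, value) VALUES %s",
		table, strings.Join(placeholders, ", "))
	return query, args
}
```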
Implemented comprehensive unit tests for the NewTree function covering various scenarios such as different quantities of streams and branching factors. The tests also include checks for tree structure, node properties, and special cases for larger trees.
Added parameters `child_data_providers` and `child_stream_ids` to `get_raw_record` and `get_raw_index`. Updated the logic to buffer and emit ordered results, ensuring proper handling of data arrays' lengths and emitting results by date and taxonomy index sequentially.
Assigned default value 0 to the buffer length to prevent null errors during buffer length evaluation. This ensures the buffer dates are processed correctly and avoids unexpected termination due to null length assignment.
# Conflicts:
#	internal/benchmark/benchmark.go
…dures
- Add checks for empty child taxonomies to prevent null upper bound errors
- Improve buffer handling and array initialization to avoid potential issues
- Refactor loop structure for better efficiency and correctness
- Update comments and improve code readability
Refactored the logic for handling child data providers and stream IDs by removing unnecessary buffering and looping. This results in cleaner code that directly returns data values and indices in a simplified manner, ensuring proper ordering by date and taxonomy index.
Simplify the loop logic for processing taxonomies and emitting values by removing unnecessary steps and optimizing array handling. Introduce a new approach to handling array element removal and managing date-based value emission efficiently. This reduces code complexity and enhances maintainability.
The function now merges CSV files and saves both a markdown and a CSV file back to the results bucket. New code handles reading and uploading the merged CSV file to S3, ensuring both formats are available.
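A rough outline of uploading the merged CSV back to the results bucket with the AWS SDK for Go v2; the bucket and key names are placeholders, and the real Lambda may structure this step differently.

```go
package main

import (
	"bytes"
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// uploadMergedCSV writes the merged CSV bytes back to the results bucket.
// Bucket and key are placeholders for whatever the Lambda actually uses.
func uploadMergedCSV(ctx context.Context, merged []byte, bucket, key string) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	client := s3.NewFromConfig(cfg)
	_, err = client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String(bucket),
		Key:    aws.String(key),
		Body:   bytes.NewReader(merged),
	})
	return err
}
```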
Added `github.com/pkg/errors` to wrap errors throughout the codebase, providing more context and improving the debugging process. This includes error wrapping in file operations, schema creation, metadata insertion, and benchmark runs.
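For readers unfamiliar with `github.com/pkg/errors`, wrapping looks roughly like this; the file path and function are just an example, not code from this PR.

```go
package main

import (
	"fmt"
	"os"

	"github.com/pkg/errors"
)

// readResults shows the wrapping pattern: the original error is preserved
// and prefixed with context describing the failing operation.
func readResults(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, errors.Wrap(err, "reading benchmark results file")
	}
	return data, nil
}

func main() {
	if _, err := readResults("results.csv"); err != nil {
		fmt.Printf("%+v\n", err) // %+v also prints the stack trace captured by Wrap
	}
}
```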
Upgraded the kwil-db, kwil-db/core, and kwil-db/parse modules to their latest revisions in the go.mod and go.sum files. This ensures we are using the most current features and fixes provided by these libraries.
Walkthrough
The changes encompass enhancements to the benchmarking and contract management functionalities within the codebase. Key updates include the addition of managed policies for EC2 roles, improved error handling, the introduction of constants for timeout management, and enhancements to the Export Results Lambda for handling multiple file formats. The schema setup process has been optimized for concurrency, and comprehensive tests have been added to validate new functionalities, ensuring robustness and reliability in data handling.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant User
    participant Lambda
    participant S3
    participant CloudWatch
    User->>Lambda: Trigger Export Results
    Lambda->>S3: Retrieve CSV Files
    Lambda->>Lambda: Process Files
    Lambda->>S3: Save Markdown File
    Lambda->>S3: Save CSV File
    Lambda->>CloudWatch: Log Export Results
```
Assessment against linked issues
Ensure that the root node is correctly marked as a leaf when there is only one stream. This change returns the initialized tree immediately if the condition is met, optimizing the tree setup process.
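Roughly what the early return might look like; the `Tree` and `Node` fields here are guesses for illustration only, not the package's actual definitions.

```go
package trees

// Hypothetical simplified types for illustration only.
type Node struct {
	Index  int
	IsLeaf bool
}

type Tree struct {
	QtyStreams      int
	BranchingFactor int
	Nodes           []Node
}

// NewTree sketch: with a single stream the root is also a leaf,
// so the initialized tree can be returned immediately.
func NewTree(qtyStreams, branchingFactor int) Tree {
	tree := Tree{
		QtyStreams:      qtyStreams,
		BranchingFactor: branchingFactor,
		Nodes:           []Node{{Index: 0, IsLeaf: qtyStreams == 1}},
	}
	if qtyStreams == 1 {
		return tree
	}
	// ... breadth-first construction of the remaining nodes would follow here.
	return tree
}
```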
Split function tests into groups of 10 to avoid exhausting Postgres during execution. Introduced a helper function `chunk` to divide tests, ensuring better test reliability and stability.
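A generic version of such a `chunk` helper, for readers who want the idea at a glance; the actual implementation in `load_test.go` may differ in signature.

```go
package benchmark

// chunk splits a slice into consecutive groups of at most size elements;
// the final group may be smaller. Sketch of the helper described above.
func chunk[T any](items []T, size int) [][]T {
	var groups [][]T
	for size > 0 && len(items) > 0 {
		n := size
		if n > len(items) {
			n = len(items)
		}
		groups = append(groups, items[:n])
		items = items[n:]
	}
	return groups
}
```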
Simplified the benchmark step by removing the retry logic in the script. The benchmark will now run just once without reattempting on failure.
Introduced a results channel for collecting benchmark results and improved test robustness with retry logic. Added logging to track benchmark execution and integrated a cleanup function to handle interruptions gracefully.
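One way to wire the interrupt-aware cleanup together with a retry loop; the names below are illustrative, not the test's actual code.

```go
package benchmark

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"time"
)

// runWithRetry retries a benchmark function a few times and stops early if
// the process receives an interrupt. Illustrative sketch only.
func runWithRetry(run func(ctx context.Context) error, attempts int) error {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop() // cleanup runs even when the benchmark is interrupted

	var err error
	for attempt := 1; attempt <= attempts; attempt++ {
		if err = run(ctx); err == nil {
			return nil
		}
		fmt.Printf("attempt %d failed: %v\n", attempt, err)
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second * time.Duration(attempt)):
		}
	}
	return err
}
```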
Introduce a README file for the `internal/contracts` directory, detailing the purpose and contents of the Kuneiform contracts used in the Truflation Stream Network (TSN). This includes descriptions of each contract file, synchronization practices, and links to additional resources.
Updated timeout handling to use a centralized constant in the benchmark state machine. This improves maintainability by defining `TotalTimeout` in a new constants file and referencing it across the code. Consequently, it ensures consistency and eases future modifications.
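A sketch of such a constants file using the CDK Go bindings; the six-hour value comes from the earlier commit and the `TotalTimeout` name from this one, while the package layout is assumed.

```go
package benchmark

import (
	"github.com/aws/aws-cdk-go/awscdk/v2"
	"github.com/aws/jsii-runtime-go"
)

// TotalTimeout is the single source of truth for the state machine timeout,
// referenced wherever the benchmark workflow is defined (sketch).
var TotalTimeout = awscdk.Duration_Hours(jsii.Number(6))
```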
This update parallelizes the schema parsing process using goroutines to improve efficiency and adds a bulk insertion for metadata. These changes enhance the performance and overall speed of the setup operation.
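A possible shape for the concurrent parsing, using `golang.org/x/sync/errgroup`; the `parseOne` function is a placeholder for the real parser, not code from this PR.

```go
package benchmark

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// parseSchemasConcurrently parses every schema in its own goroutine and
// returns the first error, if any. parseOne stands in for the real parser.
func parseSchemasConcurrently(ctx context.Context, paths []string, parseOne func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, p := range paths {
		p := p // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error {
			return parseOne(ctx, p)
		})
	}
	return g.Wait()
}
```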
Commented out the test cases involving 800 streams as they cause memory starvation in t3.small instances. These tests significantly impact memory usage because they store the entire tree in memory.
Actionable comments posted: 11
Outside diff range, codebase verification and nitpick comments (7)
internal/benchmark/load_test.go (1)
Line range hint 28-184: Approve enhancements in `TestBench` with a suggestion for retry logic improvement.
The modifications to include context handling, a results channel, and interrupt signal listening are well-implemented and enhance the robustness and manageability of the benchmark tests. The chunking of tests and retry logic are also beneficial for handling resource constraints and transient errors effectively.
However, consider implementing a more sophisticated exponential backoff mechanism rather than a simple multiplication by the attempt number, which could provide more effective handling of retries under high load or slow response scenarios.
Consider using a library or implementing a function for exponential backoff to improve the retry mechanism:
```go
import (
	"math"
	"time"
)

// Exponential backoff calculation
func exponentialBackoff(attempt int) time.Duration {
	return time.Second * time.Duration(math.Pow(2, float64(attempt)))
}
```
Then replace the sleep call in the retry logic:
```diff
-time.Sleep(time.Second * time.Duration(attempt))
+time.Sleep(exponentialBackoff(attempt))
```
internal/benchmark/trees/trees_test.go (1)
39-203: Approve `TestNewTree` with a suggestion for enhanced test output clarity.
The `TestNewTree` function is well-structured and thoroughly tests various tree configurations, ensuring that the `NewTree` function behaves as expected across different scenarios. The use of a comparison function for structural integrity checks and detailed logging for debugging are commendable practices.
Consider enhancing the test output by including more descriptive messages in the log statements, especially when mismatches are detected, to aid in quicker identification of issues during test failures.
Enhance the clarity of test outputs with more descriptive log messages:
```diff
 if !tt.skipStructCheck && !compareTreeStructure(result, tt.expected) {
+	t.Logf("Mismatch in tree structure for test case: %s", tt.name)
 	t.Errorf("NewTree() = %v, want %v", result, tt.expected)
 }
```
internal/benchmark/setup.go (1)
Line range hint 163-196: Approve `setVisibilityAndWhitelist` with a suggestion for handling large metadata volumes.
The `setVisibilityAndWhitelist` function effectively sets visibility and whitelist settings for streams. The use of metadata insertion to configure these settings is appropriate and aligns with the requirements for managing access controls.
Consider implementing batch processing or throttling mechanisms when dealing with a large number of metadata entries, as this could prevent performance bottlenecks and improve the efficiency of metadata insertion.
Implement batch processing for metadata insertion to handle large volumes efficiently:
```go
// Pseudocode for batch processing
batchSize := 100
for i := 0; i < len(metadataToInsert); i += batchSize {
	end := i + batchSize
	if end > len(metadataToInsert) {
		end = len(metadataToInsert)
	}
	batch := metadataToInsert[i:end]
	if err := insertMetadataBatch(ctx, platform, dbid, batch); err != nil {
		return err
	}
}
```
internal/benchmark/trees/trees.go (1)
39-78: Well-implemented breadth-first tree initialization.
The refactoring of the `NewTree` function to use a breadth-first approach is well-executed and should offer improved scalability and flexibility in handling different tree sizes and structures.
Consider adding more inline comments explaining the logic, especially around the queue operations and node initialization, to enhance readability and maintainability.
internal/benchmark/benchmark.go (2)
Line range hint 26-39: Enhanced error handling in benchmark execution.
The use of `errors.Wrap` to add context to error messages in the `runBenchmark` function is a good practice, especially in a complex benchmarking environment where understanding the source of errors is crucial.
Consider adding more specific error messages or logging additional context about the benchmark state when errors occur, to further aid in troubleshooting.
Also applies to: 113-113
Line range hint 95-122: Refactored result handling and improved logging in benchmark function.
The renaming and change in signature of `getBenchmarFn` to use a channel for passing benchmark results are well-thought-out, reflecting a shift towards more dynamic result handling. The addition of a logging statement enhances traceability.
However, there appears to be a typographical error in the function name (`getBenchmarFn` should be `getBenchmarkFn`). Correcting this would improve the code's professionalism and readability.
```diff
-func getBenchmarFn(benchmarkCase BenchmarkCase, resultCh *chan []Result) func(ctx context.Context, platform *kwilTesting.Platform) error {
+func getBenchmarkFn(benchmarkCase BenchmarkCase, resultCh *chan []Result) func(ctx context.Context, platform *kwilTesting.Platform) error {
```
Line range hint
24-177
: Enhanced functionality with dual-format export capability.The modifications to the
HandleRequest
function to support both markdown and CSV file exports are well-implemented, enhancing the Lambda's utility. The introduction ofcsvFilePath
and the adjustments to file processing and uploading logic are appropriate and seem to be well-integrated into the existing flow.Consider improving error handling by consolidating or simplifying error checks and enhancing logging to include more details about the processing steps and any issues encountered. This would help in troubleshooting and ensuring the reliability of the export process.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
Files selected for processing (15)
- deployments/infra/stacks/benchmark/benchmark_stack.go (1 hunks)
- deployments/infra/stacks/benchmark/constants.go (1 hunks)
- deployments/infra/stacks/benchmark/lambdas/exportresults/main.go (4 hunks)
- deployments/infra/stacks/benchmark/step_functions.go (5 hunks)
- go.mod (1 hunks)
- internal/benchmark/benchmark.go (5 hunks)
- internal/benchmark/load_test.go (3 hunks)
- internal/benchmark/setup.go (6 hunks)
- internal/benchmark/trees/trees.go (2 hunks)
- internal/benchmark/trees/trees_test.go (1 hunks)
- internal/benchmark/utils.go (6 hunks)
- internal/contracts/README.md (1 hunks)
- internal/contracts/composed_stream_template.kf (12 hunks)
- internal/contracts/primitive_stream_template.kf (4 hunks)
- internal/contracts/tests/complex_composed_test.go (1 hunks)
Files skipped from review due to trivial changes (2)
- deployments/infra/stacks/benchmark/constants.go
- go.mod
Additional context used
LanguageTool
internal/contracts/README.md
[uncategorized] ~7-~7: Loose punctuation mark.
Context: ...ontents - `composed_stream_template.kf`: Template for composed stream contracts ... (UNLIKELY_OPENING_PUNCTUATION)
golangci-lint
internal/contracts/tests/complex_composed_test.go
367-367: Error return value of `platform.Engine.CreateDataset` is not checked (errcheck)
426-426: Error return value of `platform.Engine.CreateDataset` is not checked (errcheck)
GitHub Check: lint
internal/contracts/tests/complex_composed_test.go
[failure] 367-367: Error return value of `platform.Engine.CreateDataset` is not checked (errcheck)
[failure] 426-426: Error return value of `platform.Engine.CreateDataset` is not checked (errcheck)
Additional comments not posted (32)
internal/contracts/README.md (6)
1-4: Clear and effective introduction.
The header and introductory text set a clear context for the purpose of the contracts within the TSN.
5-11: Well-documented contents section.
The list of files and their descriptions provide a clear overview of what is included in the directory.
Tools
LanguageTool
[uncategorized] ~7-~7: Loose punctuation mark.
Context: ...ontents - `composed_stream_template.kf`: Template for composed stream contracts ... (UNLIKELY_OPENING_PUNCTUATION)
12-20: Informative purpose section.
The description of the core functionalities covered by the contracts aligns well with the objectives of the TSN.
21-23: Clear synchronization strategy.
The explanation of how the contracts are kept in sync with the public versions is clear and crucial for maintaining consistency.
25-31: Helpful additional resources.
The links to detailed documentation and tools are beneficial for developers seeking more in-depth information.
7-7: Static analysis hint addressed: Punctuation is appropriate.
The punctuation used in the markdown list is appropriate for the format and does not require changes.
Tools
LanguageTool
[uncategorized] ~7-~7: Loose punctuation mark.
Context: ...ontents - `composed_stream_template.kf`: Template for composed stream contracts ... (UNLIKELY_OPENING_PUNCTUATION)
internal/benchmark/utils.go (3)
52-54: Improved error handling in `executeStreamProcedure`.
The use of `errors.Wrap` to provide additional context on failure is a good practice, enhancing the maintainability and debuggability of the code.
101-101: Enhanced error handling in `saveResults`.
Wrapping the error when saving results helps in quickly identifying issues during file operations, which is crucial for operational reliability.
111-111: Robust error handling added to `deleteFileIfExists`.
The addition of `errors.Wrap` in file deletion operations provides valuable context, improving error traceability and supportability.
Also applies to: 119-119
internal/benchmark/load_test.go (1)
186-198: Approve the `chunk` function implementation.
The `chunk` function is well-implemented, correctly handling the division of slices into smaller chunks, including the edge case where the last chunk may be smaller than the specified size. This utility function is a valuable addition for managing large datasets or long-running processes in chunks.
internal/benchmark/trees/trees_test.go (2)
205-226: Approve the `compareTreeStructure` function implementation.
The `compareTreeStructure` function is well-implemented, providing a thorough comparison of two tree structures. It checks all relevant properties, including tree depth, quantity of streams, branching factor, and node details, ensuring a comprehensive validation of tree integrity.
228-249: Approve `TestDisplayTree` for enhanced visibility during testing.
The `TestDisplayTree` function is a useful addition to the test suite, providing a way to log the visual representation of trees with different configurations. This enhances the understandability and debuggability of the tree structures during development and testing.
internal/benchmark/setup.go (1)
Line range hint 115-131: Approve `createAndInitializeSchema` for robust schema management.
The `createAndInitializeSchema` function is well-implemented, handling the creation and initialization of schemas effectively. The use of `errors.Wrap` to provide detailed error messages enhances the debuggability of the function, making it easier to identify and resolve issues during schema setup.
internal/contracts/tests/complex_composed_test.go (9)
1-17: Imports and package declaration are appropriate.
The imports are correctly chosen for the functionality of testing and error handling in this file.
19-26: Variable declarations are suitable for testing purposes.
The use of dynamically generated stream IDs helps avoid conflicts in test environments.
28-41: Well-structured test setup for complex composed tests.
The use of `kwilTesting.RunSchemaTest` with multiple encapsulated sub-tests ensures thorough testing and maintainability.
59-96: Comprehensive testing of record retrieval with good error handling.
The function effectively tests record retrieval, uses assertions to validate results, and wraps errors to provide context, which are all best practices in testing.
98-135: Consistent testing approach for index retrieval.
The function uses a consistent testing pattern, which is appropriate for the type of test being conducted. Good use of assertions and error handling.
137-163: Proper testing of latest value retrieval.
The function is well-structured to test the retrieval of the latest value, with appropriate use of `nil` arguments to simulate this scenario.
165-191: Effective testing of edge case for empty date queries.
The function correctly handles and tests the scenario where a date query returns no records, which is an important edge case to cover in tests.
232-259: Proper handling and testing of out-of-range date queries.
The function tests an important scenario effectively, ensuring that out-of-range date queries are handled correctly.
261-279: Effective testing of invalid date input handling.
The function correctly tests the system's response to invalid date formats, which is essential for ensuring robustness.
internal/contracts/composed_stream_template.kf (4)
326-357: Review: Updated `get_record` to handle date ranges more effectively
The `get_record` procedure has been updated to handle date ranges more effectively, ensuring that values are emitted only within the specified range. This enhancement is important for accuracy and efficiency in data retrieval.
- Correctness: The logic to handle date ranges and conditional emission of values is correctly implemented.
- Performance: The procedure's performance is dependent on the efficiency of the `get_record_filled` call and the conditions within the loop. It's well-optimized for its purpose, but keep an eye on potential bottlenecks.
- Maintainability: The procedure is relatively straightforward and well-documented, making it easy to maintain.
This update is a positive change that enhances the functionality and reliability of the procedure.
367-398: Review: Enhanced `get_index` for robust handling of index data
`get_index` has been updated similarly to `get_record`, focusing on ensuring that index data is handled accurately within specified date ranges. This update is crucial for maintaining the integrity and accuracy of index calculations.
- Correctness: The implementation is correct, with appropriate checks and conditions to ensure data accuracy.
- Performance: As with `get_record`, the performance is largely dependent on the underlying data calls and should be monitored for potential issues.
- Maintainability: The procedure is well-documented and straightforward, which aids in its maintainability.
The enhancements made to `get_index` are well-aligned with the PR's objectives and improve the system's overall reliability.
Line range hint 73-102: Review: Enhanced handling of multiple child data providers and stream IDs in `get_raw_record`
The modifications to `get_raw_record` introduce handling for multiple child data providers and stream IDs, which is a significant enhancement aligning with the PR's objectives to improve contracts. The addition of `$child_data_providers` and `$child_stream_ids` as parameters allows for more dynamic data handling.
- Correctness: The function checks if the lengths of `$child_data_providers` and `$child_stream_ids` are equal, which is crucial for maintaining data integrity and preventing runtime errors.
- Performance: Iterating over arrays and making external data calls within a loop could impact performance, especially with large arrays. Consider optimizing this part if performance issues arise.
- Error Handling: Proper error messages are provided, which improves the robustness of the procedure.
Overall, the changes are well-implemented and enhance the functionality of the procedure. However, ensure that all external calls to this procedure are updated to pass the new parameters.
Verification successful
Verification Successful: Calls to `get_raw_record` Updated
The call to `get_raw_record` has been updated to include the new parameters, aligning with the modified procedure signature. This confirms that the necessary updates have been made to accommodate the changes.
- The parameters `$date_from`, `$date_to`, `$frozen_at`, `$child_data_providers`, and `$child_stream_id` are correctly passed in the call.
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Verify that all calls to `get_raw_record` have been updated to include the new parameters.
rg --type kf -A 5 $'get_raw_record'
```
Length of output: 61
Script:
```bash
#!/bin/bash
# Find all files with the .kf extension and search for calls to `get_raw_record`.
fd --extension kf --exec rg 'get_raw_record' {}
```
Length of output: 237
Line range hint 114-143: Review: Updated `get_raw_index` to support multiple data sources
Similar to `get_raw_record`, `get_raw_index` has been updated to handle multiple child data providers and stream IDs. This change is crucial for supporting more complex data structures and ensuring that the system can scale with additional data sources.
- Correctness: The function includes necessary checks to ensure that the lengths of the input arrays match, which is essential for data consistency.
- Performance: As with `get_raw_record`, the performance could be affected by the loop and external data calls. Monitoring and optimization may be needed depending on the use case.
- Error Handling: The error handling is appropriate, with clear messages that aid in debugging and maintenance.
The modifications are in line with the PR's goals and provide the necessary functionality to support more complex data interactions.
Verification successful
Verification Successful: `get_raw_index` Call Updated
The call to `get_raw_index` in `internal/contracts/composed_stream_template.kf` has been updated to include the new parameters, aligning with the updated function signature. No other calls were found, suggesting this might be the only instance. Further manual verification is recommended if additional instances are expected.
Scripts executed
The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Verify that all calls to `get_raw_index` have been updated to include the new parameters.
rg --type kf -A 5 $'get_raw_index'
```
Length of output: 60
Script:
```bash
#!/bin/bash
# Search for all calls to `get_raw_index` across all files to verify if they include the new parameters.
rg -A 5 'get_raw_index'
```
Length of output: 1103
deployments/infra/stacks/benchmark/benchmark_stack.go (1)
72-72: Approve addition of CloudWatchAgentServerPolicy, suggest verification of policy scope.
The addition of the CloudWatchAgentServerPolicy to the `ec2InstanceRole` is a positive change for enhancing monitoring capabilities. However, it's crucial to ensure that this policy does not grant overly broad permissions that could potentially be exploited.
Run the following script to verify the exact permissions granted by the CloudWatchAgentServerPolicy:
deployments/infra/stacks/benchmark/step_functions.go (3)
109-109: Approve dynamic timeout configuration.
The update to use a variable `TotalTimeout` for the state machine's timeout enhances flexibility and allows for dynamic configuration based on operational needs.
160-168: Approve creation of log group with retention policy.
The creation of a log group for command execution with a retention policy of three days and a removal policy set to destroy is a good practice for managing logs efficiently and securely.
185-190: Approve CloudWatch logging configuration, suggest verification of security settings.
The configuration to include CloudWatch logs for command execution is a positive step for enhancing observability. However, it's crucial to verify that this configuration does not expose sensitive information or logs.
Run the following script to verify the security settings of the CloudWatch logging configuration:
internal/contracts/primitive_stream_template.kf (2)
212-215: Approve explicit data type definitions for null values.
The changes to explicitly define the data types of null values in the `get_metadata` procedure enhance type safety and clarity. This is a good practice for ensuring consistency in data handling and error prevention.
Also applies to: 241-244
Line range hint 621-650: Approve addition of array length checks in `get_index_change`.
The inclusion of checks for the lengths of arrays before iterating over them in the `get_index_change` procedure is a crucial update for preventing runtime errors. This change enhances the robustness of the procedure by ensuring that operations on arrays are only performed when appropriate.
Description
Related Problem
How Has This Been Tested?
To show the efficiency improvements of recent work, I've benchmarked and prepared some cool graphs.
To fully understand our issues, note that our queries depend on some parameters that influence their response time. Mainly:
Then, ideally the influence is O(days × qty). But this was not happening:
(these ideal targets are my own estimates, but I'd be happy if it could be even shorter)
Branching Factor?
How many branches, at max, did it have per stream?
Examples:
Branching = 1
Branching = 2
Branching = 4
To tackle these issues, we began with #527 and now with this PR.
Comparing the 3 situations (all calculated on my own machine, which I think is faster than our servers):
get_index_change
#527
Real World Comparison
Querying 400 streams, 1 year, branching factor 5
(this difference grows a lot with branching factor increase)
Individual influences
Days influence
Stream Qty Influence
Note that I got an error on the old version for 400 streams.
Branching Influence
(Note that the 2-branch result for the old version is a bit odd, but I won't investigate that because it's not worth it.)
Stacking influence
For this, I multiplied days × branching factor and got the numbers. My rationale was that the time stacking is now different: since the branching factor no longer has an influence, the numbers are a lot smaller and don't always grow.
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests