fix(flow): fix call df func bug&sqlness test #4165
Conversation
Force-pushed from a7994e3 to 98c1270 (compare)
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
##             main    #4165      +/-   ##
==========================================
- Coverage   85.06%   84.81%   -0.25%
==========================================
  Files        1028     1028
  Lines      180477   180845     +368
==========================================
- Hits       153514   153383     -131
- Misses      26963    27462     +499
Force-pushed from 73ecc0d to aad2cae (compare)
Great! Finally the flow has a black-box test.
tests/cases/standalone/common/types/timestamp/incorrect_timestamp.result
Actionable comments posted: 3
Outside diff range and nitpick comments (2)
src/flow/src/adapter.rs (1)
Line range hint 324-347: Refactor to handle potential `None` values more gracefully

The method `send_writeback_requests` constructs column names by unwrapping optional values directly, which could lead to runtime panics if a `None` value is encountered unexpectedly. Consider refactoring this to handle `None` values more gracefully, perhaps by providing a default column name or logging a warning. This change would enhance the robustness of the data flow management logic.

```diff
- .unwrap_or_else(|| format!("col_{i}"))
+ .unwrap_or_else(|| {
+     log::warn!("Column name missing for index {i}, using default name.");
+     format!("col_{i}")
+ })
```

src/flow/src/transform/expr.rs (1)
102: Clarify documentation for the `proto_col` function

The comment for the `proto_col` function could be more descriptive. It currently states that it returns an expression wrapped in a `FunctionArgument`, but it could also explain that it constructs a column reference based on the index provided.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
- Cargo.lock is excluded by `!**/*.lock`
Files selected for processing (12)
- src/flow/src/adapter.rs (2 hunks)
- src/flow/src/adapter/table_source.rs (1 hunks)
- src/flow/src/expr/scalar.rs (8 hunks)
- src/flow/src/repr/relation.rs (5 hunks)
- src/flow/src/transform/aggr.rs (10 hunks)
- src/flow/src/transform/expr.rs (8 hunks)
- src/flow/src/transform/plan.rs (2 hunks)
- tests/cases/standalone/flow/basic.result (1 hunks)
- tests/cases/standalone/flow/basic.sql (1 hunks)
- tests/cases/standalone/flow/df_func.result (1 hunks)
- tests/cases/standalone/flow/df_func.sql (1 hunks)
- tests/runner/Cargo.toml (1 hunks)
Files not reviewed due to errors (1)
- tests/cases/standalone/flow/basic.result (no review received)
Files skipped from review due to trivial changes (3)
- src/flow/src/adapter/table_source.rs
- tests/cases/standalone/flow/basic.sql
- tests/runner/Cargo.toml
Additional comments not posted (16)
tests/cases/standalone/flow/df_func.sql (1)
1-11: SQL Test Script Review: Validating DataFrame Functions

This script tests the DataFrame functions effectively, particularly focusing on the behavior of nested functions like `abs(sum(number))`. The script also tests the idempotency of the flow creation and data manipulation by recreating the tables and flows, ensuring that the results are consistent.

Also applies to: 13-16, 21-24, 29-31, 33-43, 45-48, 53-56, 61-63
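The nested-call semantics exercised by this test (a DataFusion scalar function wrapping one of flow's own aggregates, as in `abs(sum(number))`) can be sketched in plain Rust. The helper below is a hypothetical illustration of the expected evaluation order — aggregate first, scalar function second — not GreptimeDB's actual flow engine:

```rust
/// Hypothetical sketch: evaluate `abs(sum(number))` over a batch of values,
/// mirroring the order the test exercises (inner aggregate, then the
/// outer scalar function applied to the aggregate's result).
fn abs_of_sum(numbers: &[i64]) -> i64 {
    let sum: i64 = numbers.iter().sum(); // inner aggregate: sum(number)
    sum.abs()                            // outer scalar function: abs(...)
}

fn main() {
    // Mixed-sign input: the sum is negative, so abs() actually matters.
    let batch = [3, -10, 2];
    println!("{}", abs_of_sum(&batch)); // sum = -5, abs = 5
}
```

Inserting negative values (as the test's `INSERT` statements do) is what makes this case meaningful: with only non-negative input, a broken `abs` wrapping would go unnoticed.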
tests/cases/standalone/flow/df_func.result (1)
1-13: SQL Test Results Review: Accurate and Consistent

The test results are consistent with the expected outcomes based on the SQL operations performed. The script effectively captures the results of window functions and aggregation on modified data, ensuring that the functionality works as intended.

[APPROVED]

Also applies to: 17-20, 25-31, 33-36, 41-47, 50-58, 62-74, 78-81, 86-92, 94-97, 102-108, 111-119
src/flow/src/transform/plan.rs (1)
15-23: Rust Source Code Review: Comprehensive Handling of Substrait Plans

The implementation covers a broad range of scenarios for converting Substrait plans to the internal representation. The use of pattern matching and error handling is robust, ensuring that the system can gracefully handle unsupported or malformed inputs. The detailed comments and structured error handling improve maintainability and readability.

Also applies to: 67-132, 134-151, 153-211, 213-224
src/flow/src/repr/relation.rs (2)
101-102: Enhancements to `RelationType` to support auto columns

The addition of `auto_columns` to the `RelationType` struct and related methods enhances the handling of columns that are automatically added by the flow but not present in the original SQL. This is a good use of encapsulation and helps keep track of such columns, which can be critical in complex transformations and aggregations.

However, consider adding more detailed inline documentation explaining the purpose and use cases of `auto_columns` to help future maintainers understand the context and functionality quickly.

Also applies to: 106-109, 166-175
358-366: Ensure consistent length between types and names in `RelationDesc`

The addition of length checks in the `len` method of `RelationDesc` is a critical safety feature, ensuring that the number of types matches the number of names. This prevents potential runtime errors from index mismatches. This change is well implemented and should help maintain data integrity throughout the application.
src/flow/src/transform/expr.rs (1)
285: Ensure error handling is comprehensive

The call to `from_substrait_to_datafusion_scalar_func` seems to correctly propagate errors (line 285), but ensure that all possible error scenarios are handled, especially considering the complexity of the transformations being performed.

src/flow/src/expr/scalar.rs (4)
158-161: Enhanced support for DataFrame scalar functions

The addition of the `CallDf` variant in the `ScalarExpr` enum allows handling DataFrame-specific scalar functions, which is a significant enhancement for expression evaluation flexibility. The detailed invariant comment helps maintain clarity about argument handling within these functions.

Also applies to: 375-376
194-201: New method for constructing `DfScalarFunction` from a raw function

The introduction of `try_from_raw_fn` in the `DfScalarFunction` struct simplifies the creation of function instances from raw definitions. This method encapsulates the logic for initializing `DfScalarFunction` objects, enhancing modularity and maintainability. The update to the deserialization process to use this new method is a good practice, ensuring consistency in object creation.

Also applies to: 278
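A fallible constructor of this kind typically validates the raw definition and returns `Result`; a minimal sketch under assumed names (`RawFn` and the fields shown are illustrative, not GreptimeDB's actual types):

```rust
/// Hypothetical raw function definition, e.g. as decoded from a wire format.
struct RawFn {
    name: String,
    arity: usize,
}

/// Hypothetical compiled function wrapper built from a raw definition.
struct DfScalarFunction {
    raw: RawFn,
}

impl DfScalarFunction {
    /// `try_from_raw_fn`-style constructor: centralizes validation so every
    /// construction site (including deserialization) goes through one path.
    fn try_from_raw_fn(raw: RawFn) -> Result<Self, String> {
        if raw.name.is_empty() {
            return Err("function name must not be empty".to_string());
        }
        if raw.arity == 0 {
            return Err(format!("function `{}` must take at least one argument", raw.name));
        }
        Ok(Self { raw })
    }
}

fn main() {
    let ok = DfScalarFunction::try_from_raw_fn(RawFn { name: "abs".into(), arity: 1 });
    assert!(ok.is_ok());
    let bad = DfScalarFunction::try_from_raw_fn(RawFn { name: String::new(), arity: 1 });
    assert!(bad.is_err());
    println!("constructor validation works");
}
```

Routing deserialization through the same constructor (rather than building the struct literally) is the design choice the review comment praises: invalid wire data is rejected at one choke point.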
284-289: Enhancements to `RawDfScalarFn` handling

The updates to the `RawDfScalarFn` struct, including more detailed comments and the addition of a method to get the function implementation, are positive changes that enhance the code's self-documentation and functionality. The method `from_proto` now properly handles the encoding of scalar functions, which is crucial for system integrity.

Also applies to: 641-646
892-901: Addition of tests for `DfScalarFunction`

The new tests for `DfScalarFunction` are well designed to ensure that the serialization, deserialization, and evaluation functionalities work as expected. These tests are crucial for maintaining the robustness of the system as changes are made.

src/flow/src/transform/aggr.rs (6)
388: Marking auto columns effectively

The method `with_autos` is being used to mark auto-generated columns, which is a good practice for clarity and future maintenance. Ensure that the implementation of `with_autos` correctly handles these markings across different modules.
464-758: Comprehensive testing for aggregation logic

The tests added are comprehensive and cover various scenarios including basic aggregation, expression trees, and composite keys. It's good to see thorough testing, especially for complex aggregation logic. Ensure that these tests are integrated into your continuous integration pipeline to catch any regressions or errors early.
Line range hint 868-898: Auto columns handling in aggregation tests

The handling of auto columns in the test cases is consistent with the changes in the main codebase. This consistency is crucial for maintaining the integrity of the system's behavior across different environments and scenarios.
Line range hint 931-1002: Correct implementation of time windows and auto columns in SQL parsing

The implementation of time windows and auto columns in the SQL parsing tests appears to be correct. The use of `tumble` window functions and the correct marking of auto columns are particularly noteworthy. Ensure that the parsing logic aligns with the SQL standards and other parts of the system.
Line range hint 1047-1118: Handling of time windows in aggregation

The handling of time windows in the aggregation process is implemented correctly. The use of the `TumbleWindowFloor` and `TumbleWindowCeiling` functions within the key plan setup is a good practice for managing time-based grouping in streaming data.
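The floor/ceiling arithmetic behind tumble windows is simple modular math. A sketch assuming non-negative millisecond timestamps and a fixed window size — an illustration, not the actual `TumbleWindowFloor`/`TumbleWindowCeiling` implementation:

```rust
/// Start of the tumble window containing `ts_ms` (floor to window boundary).
fn tumble_window_floor(ts_ms: u64, window_ms: u64) -> u64 {
    ts_ms - ts_ms % window_ms
}

/// Exclusive end of the tumble window containing `ts_ms` (floor + one window).
fn tumble_window_ceiling(ts_ms: u64, window_ms: u64) -> u64 {
    tumble_window_floor(ts_ms, window_ms) + window_ms
}

fn main() {
    // 1-second windows: a timestamp of 2_500 ms falls in [2000, 3000).
    assert_eq!(tumble_window_floor(2_500, 1_000), 2_000);
    assert_eq!(tumble_window_ceiling(2_500, 1_000), 3_000);
    // Every event in a window shares the same (window_start, window_end)
    // pair, which is what makes the pair usable as a grouping key.
    println!("tumble window math ok");
}
```

This also illustrates why `window_start` and `window_end` appear as auto columns: they are derived from the timestamp by the flow, not supplied in the original SQL.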
Line range hint 351-367: Ensure proper handling of auto columns and window functions

The logic for determining column names and handling auto columns seems to be correct. However, ensure that the handling of `window_start` and `window_end` as auto columns is consistent with the rest of the system and that these columns are appropriately indexed or used in downstream processes.

Verification successful
Ensure proper handling of auto columns and window functions

The handling of `window_start` and `window_end` as auto columns appears consistent in the provided context. They are referenced appropriately across multiple files, indicating that their auto column status is respected and that they are used correctly in downstream processes.

- src/flow/src/transform/aggr.rs: multiple references to `window_start` and `window_end` indicate consistent handling within the aggregation transformations.
- src/flow/src/transform/plan.rs: a comment confirms the automatic addition of `window_start` and `window_end` columns to output rows.
- src/flow/src/compute/render/reduce.rs: references to expected output columns, including `window_start` and `window_end`.
- src/flow/src/expr/func.rs: functions and tests related to `window_start` and `window_end` calculations.

No issues found with the handling of `window_start` and `window_end` as auto columns.
Scripts executed

The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Description: Verify that the handling of auto columns and window functions is consistent across the system.
# Test: Search for usage of `window_start` and `window_end` across other relevant files.
rg --type rust $'window_start|window_end'
```

Length of output: 3120
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/flow/src/adapter/worker.rs (2 hunks)
- src/flow/src/expr/scalar.rs (9 hunks)
- src/flow/src/transform/expr.rs (8 hunks)
Files skipped from review as they are similar to previous changes (2)
- src/flow/src/expr/scalar.rs
- src/flow/src/transform/expr.rs
Additional comments not posted (1)
src/flow/src/adapter/worker.rs (1)
209-216: Improved shutdown logic in `WorkerHandle`'s `Drop` implementation

The changes made to the `Drop` implementation of `WorkerHandle` are crucial for ensuring that the worker is properly shut down when the handle is dropped. Using `futures::executor::block_on` to synchronously wait for the `shutdown` method's completion is a good practice in this context, as it ensures that no resources are left hanging. Additionally, the error handling and logging provide clear feedback if the shutdown process encounters issues, which is essential for maintaining system stability.
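The pattern — blocking in `Drop` until a background worker confirms shutdown — can be illustrated with std threads and channels. The real code uses `futures::executor::block_on` on an async `shutdown`; this is an analogous synchronous sketch with hypothetical names:

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical worker handle that owns a background thread.
struct WorkerHandle {
    shutdown_tx: mpsc::Sender<()>,
    join: Option<thread::JoinHandle<()>>,
}

impl WorkerHandle {
    fn spawn() -> Self {
        let (shutdown_tx, shutdown_rx) = mpsc::channel();
        let join = thread::spawn(move || {
            // Worker loop: block until a shutdown signal arrives (or the
            // handle is dropped and the channel disconnects).
            let _ = shutdown_rx.recv();
        });
        Self { shutdown_tx, join: Some(join) }
    }
}

impl Drop for WorkerHandle {
    fn drop(&mut self) {
        // Signal shutdown, then synchronously wait for the worker to
        // finish — the analogue of block_on(self.shutdown()) in the
        // async version. Errors are logged rather than panicking in Drop.
        let _ = self.shutdown_tx.send(());
        if let Some(join) = self.join.take() {
            if join.join().is_err() {
                eprintln!("worker panicked during shutdown");
            }
        }
    }
}

fn main() {
    let handle = WorkerHandle::spawn();
    drop(handle); // returns only after the worker thread has exited
    println!("worker shut down cleanly");
}
```

The key property is that `drop` does not return until the worker has actually stopped, so no thread (or async task, in the real code) is left dangling.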
Force-pushed from ad53c6d to 9996b1b (compare)
Actionable comments posted: 1
Outside diff range and nitpick comments (1)
tests/cases/standalone/flow/df_func.sql (1)

1-63: Ensure proper formatting and structure in the SQL file

Check that the SQL file is well formatted, with consistent indentation and spacing, and that it follows best practices for SQL script layout.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
- Cargo.lock is excluded by `!**/*.lock`
Files selected for processing (13)
- src/flow/src/adapter.rs (2 hunks)
- src/flow/src/adapter/table_source.rs (1 hunks)
- src/flow/src/adapter/worker.rs (2 hunks)
- src/flow/src/expr/scalar.rs (9 hunks)
- src/flow/src/repr/relation.rs (5 hunks)
- src/flow/src/transform/aggr.rs (10 hunks)
- src/flow/src/transform/expr.rs (8 hunks)
- src/flow/src/transform/plan.rs (2 hunks)
- tests/cases/standalone/flow/basic.result (1 hunks)
- tests/cases/standalone/flow/basic.sql (1 hunks)
- tests/cases/standalone/flow/df_func.result (1 hunks)
- tests/cases/standalone/flow/df_func.sql (1 hunks)
- tests/runner/Cargo.toml (1 hunks)
Files skipped from review due to trivial changes (1)
- src/flow/src/adapter.rs
Files skipped from review as they are similar to previous changes (10)
- src/flow/src/adapter/table_source.rs
- src/flow/src/adapter/worker.rs
- src/flow/src/expr/scalar.rs
- src/flow/src/repr/relation.rs
- src/flow/src/transform/aggr.rs
- src/flow/src/transform/expr.rs
- src/flow/src/transform/plan.rs
- tests/cases/standalone/flow/basic.result
- tests/cases/standalone/flow/basic.sql
- tests/runner/Cargo.toml
Additional comments not posted (6)
tests/cases/standalone/flow/df_func.sql (5)
19: Ensure output column naming is consistent

The output columns `col_0`, `window_start`, and `window_end` should be consistently named across different queries and aligned with the expected output specifications.

Also applies to: 51, 85, 102

Verification successful — output column naming is consistent. The column names `col_0`, `window_start`, and `window_end` are used consistently across different queries, as shown in the following locations:

- tests/cases/standalone/flow/basic.sql
- tests/cases/standalone/flow/df_func.sql

Scripts executed

```bash
#!/bin/bash
# Description: Verify consistency of output column names across different queries.
# Test: Search for output column names in queries. Expect: Consistent naming and usage.
rg --type sql $'SELECT col_0'
```

Length of output: 640
29-31: Validate the cleanup process for flows and tables

Ensure that the `DROP` statements for flows and tables are executed correctly and that there are no residual data or schema inconsistencies after these operations.

Also applies to: 61-63, 111-119
13-16: Validate data types and values in INSERT statements

Ensure that the data types and values used in the `INSERT` statements are valid and align with the table schema, particularly when negative values are being inserted for columns that might expect non-negative values.

Also applies to: 45-48, 78-81, 94-97
1-6: Ensure consistent TIMESTAMP usage

The `TIMESTAMP` data type is used with the default value `CURRENT_TIMESTAMP`. It's crucial to ensure that this behavior aligns with the expected use case, especially considering the precision and timezone handling.

Also applies to: 33-38, 62-67

Verification successful — the `TIMESTAMP` data type with `DEFAULT CURRENT_TIMESTAMP` is used consistently across the codebase. The behavior aligns with the expected use case, ensuring uniform precision and timezone handling. Verified instances include:

- tests/cases/standalone/flow/df_func.sql
- tests/cases/standalone/flow/basic.sql
- tests/cases/standalone/show_create_flow.sql
- tests/cases/standalone/common/alter/alter_table_default.sql
- tests/cases/standalone/common/show/show_create.sql

Scripts executed

```bash
#!/bin/bash
# Description: Verify TIMESTAMP handling in other parts of the codebase.
# Test: Search for TIMESTAMP usage. Expect: Consistency in handling TIMESTAMP across the codebase.
rg --type sql $'TIMESTAMP'
```

Length of output: 25242
18: Check the purpose and impact of SQLNESS SLEEP

Verify the necessity and impact of the `SQLNESS SLEEP` operations within the test script, ensuring they are used appropriately for simulating real-world scenarios.

Also applies to: 26, 50, 58, 84, 101

Verification successful — the `SQLNESS SLEEP` operations are justified and necessary for simulating real-world scenarios: the sleep operations introduce a delay between data insertion and querying, ensuring the system can handle real-time data processing.

Scripts executed

```bash
#!/bin/bash
# Description: Assess the usage and impact of SQLNESS SLEEP in test scenarios.
# Test: Search for SQLNESS SLEEP usage. Expect: Justified and minimal impact on test outcomes.
rg --type sql $'SQLNESS SLEEP'
```

Length of output: 387

```bash
#!/bin/bash
# Extract surrounding lines of SQLNESS SLEEP operations for context analysis
rg --context=5 --type sql $'SQLNESS SLEEP' tests/cases/standalone/flow/df_func.sql tests/cases/standalone/flow/basic.sql
```

Length of output: 3823
tests/cases/standalone/flow/df_func.result (1)
1-122: Validate the expected results for consistency and accuracy

Ensure that the expected results documented in the `.result` file accurately reflect the outcomes of the SQL operations and are consistent with the data inputs and operations described.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/flow/src/adapter/worker.rs (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/flow/src/adapter/worker.rs
I hereby agree to the terms of the GreptimeDB CLA.
Refer to a related PR or issue link (optional)
What's changed and what's your intention?
Fix a bug when calling a df func like `abs` nested with one of flow's own functions, as in `abs(sum(number))`; also add some sqlness tests.

Please explain IN DETAIL what the changes are in this PR and why they are needed:
`abs(sum(number))`
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
- Improved the `WorkerHandle` shutdown process with better error handling and logging.
- Added `auto_columns` to track columns automatically added by the flow in table schemas.

Dependency Updates
- Bumped the `sqlness` dependency version from `0.5` to `0.6.1`.