[SPARK-52799][TESTS] Fix ThriftServerQueryTestSuite result comparison #51488
Could you file a JIRA issue please, @yaooqinn?
This reverts commit 16b2c03.
    val splits = originalOut.split("\n")
    if (splits.length > rowCounts(i)) {
      // the result is multiline
      val step = splits.length / rowCounts(i)
Assuming `\n`s are added equivalently to each row according to the schema
Thank you @dongjoon-hyun, SPARK-52799 is filed.
LGTM just a minor nit.
    if (splits.length > rowCounts(i)) {
      // the result is multiline
      val step = splits.length / rowCounts(i)
      splits.sliding(step, step).map(_.mkString("\n")).toSeq.sorted.mkString("\n")
`.sliding(step, step)` -> `.grouped(step)`?
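For readers weighing this nit, here is a small sketch of why the two calls are interchangeable in this spot: a sliding window whose step equals its size never overlaps, which is the same chunking `grouped` produces. The object name and sample data are hypothetical and are not taken from the test suite.

```scala
object SlidingVsGrouped {
  // Hypothetical sample: three "rows" of two lines each.
  val lines: Array[String] = Array("a1", "a2", "b1", "b2", "c1", "c2")
  val step = 2

  // Window size == step, so the windows do not overlap.
  val viaSliding: Seq[Seq[String]] = lines.sliding(step, step).map(_.toSeq).toSeq

  // grouped(step) expresses the same chunking more directly.
  val viaGrouped: Seq[Seq[String]] = lines.grouped(step).map(_.toSeq).toSeq
}
```

Both forms also emit a shorter trailing chunk when the length is not a multiple of `step`, so the swap should be behavior-preserving and mainly a readability improvement.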
Thank you @peter-toth
### What changes were proposed in this pull request?

This PR fixes ThriftServerQueryTestSuite result comparison. When re-reading the Golden Files, if the result lines exceed the row size, we assume they contain multiple lines for a single row. In this case, we group these lines into rows first to avoid line-by-line sorting.

### Why are the changes needed?

For a multiline result of a single row, it might get malformed, for example

```
[info] Expected "[ <birth>2018</birth>
[info] <name>[45 61 73 6F 6E]</name>
[info] <org>[4B 69 6E 64 65 72 67 61 72 74 65 6E 20 43 6F 70]</org>
[info] </ROW>
[info] <]ROW>", but got "[<ROW>
[info] <name>[45 61 73 6F 6E]</name>
[info] <birth>2018</birth>
[info] <org>[4B 69 6E 64 65 72 67 61 72 74 65 6E 20 43 6F 70]</org>
[info] </]ROW>"
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Tested with #51470 locally

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #51488 from yaooqinn/ThriftServerQueryTestSuite.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
(cherry picked from commit 9565d16)
Signed-off-by: Kent Yao <[email protected]>
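As an illustration of the grouping described in this commit message, the following is a minimal standalone sketch, not the suite's actual code. The object name `GoldenResultSort`, the helper `sortByRow`, and its parameters are hypothetical; the fallback to plain line sorting is an assumption about the single-line case, and the sketch relies on the same assumption raised in review, namely that every row spans the same number of lines.

```scala
object GoldenResultSort {
  // Hypothetical helper: sorts `output` row by row instead of line by line.
  // Assumes each of the `rowCount` rows occupies the same number of lines.
  def sortByRow(output: String, rowCount: Int): String = {
    val lines = output.split("\n")
    if (rowCount > 0 && lines.length > rowCount) {
      // More lines than rows: treat the result as multiline rows.
      val step = lines.length / rowCount
      lines.grouped(step).map(_.mkString("\n")).toSeq.sorted.mkString("\n")
    } else {
      // Assumed fallback: one line per row, so sorting lines sorts rows.
      lines.sorted.mkString("\n")
    }
  }
}
```

With a single three-line XML row, sorting line by line would move the indented element ahead of `<ROW>` and place `</ROW>` before it, which resembles the malformed expectation quoted above; grouping first keeps the row intact. If the line count is not an exact multiple of the row count, the chunks no longer line up with rows, so the sketch is only as reliable as the equal-lines-per-row assumption.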
Merged to master and 4.0, thank you @dongjoon-hyun and @peter-toth for the review.