[SPARK-53779][SQL][CONNECT] Implement transform() in Column API #52537
LGTM.
We need to add this to PySpark, but it's ok to have a separate PR.
Co-authored-by: Ruifeng Zheng <[email protected]>
Opened a new ticket for PySpark: https://issues.apache.org/jira/browse/SPARK-53841
Co-authored-by: Wenchen Fan <[email protected]>
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
    val transformed = a.transform(triple).transform(c => c + 10)
    assert(transformed == ((a * 3) + 10))
  }
what about adding a test for nested transform? e.g. a.transform(_.transform(fn.upper))
added!
    )
  }

  test("Column.transform: chaining") {
.transform(trim).transform(upper) is also chaining. Let's name them "Column.transform: built-in functions" and "Column.transform: lambda functions".
changed!
thanks, merging to master!
What changes were proposed in this pull request?
Add a transform() API to the Column API, similar to Dataset.transform().
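A minimal sketch of the intended semantics, assuming Column.transform simply applies the given function to the column, as Dataset.transform() does for datasets. To stay runnable without Spark on the classpath, it uses a hypothetical stand-in type (Expr, Ref, Mul, Add are illustrative names, not Spark classes); only the shape of transform and the chained and nested calls mirror what is discussed in this PR.

```scala
// Hypothetical stand-in for Spark's Column: a tiny expression tree, NOT the real class.
sealed trait Expr {
  def *(n: Int): Expr = Mul(this, n)
  def +(n: Int): Expr = Add(this, n)

  // Assumed semantics, mirroring Dataset.transform: apply f to this value.
  def transform(f: Expr => Expr): Expr = f(this)
}
final case class Ref(name: String) extends Expr
final case class Mul(child: Expr, n: Int) extends Expr
final case class Add(child: Expr, n: Int) extends Expr

object TransformSketch {
  // A reusable step that can be passed to transform by name.
  val triple: Expr => Expr = _ * 3

  def main(args: Array[String]): Unit = {
    val a = Ref("a")

    // Chaining: each transform feeds its result to the next step.
    val chained = a.transform(triple).transform(c => c + 10)
    assert(chained == ((a * 3) + 10))

    // Nesting: a transform inside a transform yields the same expression.
    val nested = a.transform(_.transform(triple))
    assert(nested == a * 3)
  }
}
```

Under that assumption, chaining reads left to right in the order the steps are applied, instead of wrapping calls inside one another.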
Why are the changes needed?
We want to give users a way to chain their own methods on a Column.
This pattern is also easier for AI agents to learn and write.
Does this PR introduce any user-facing change?
Yes. A new API is introduced.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
Tests generated by Copilot.