Skip to content

Conversation

Yicong-Huang
Copy link
Contributor

@Yicong-Huang Yicong-Huang commented Oct 7, 2025

What changes were proposed in this pull request?

Add transform() API in Columns API, similar to Dataset.transform():

def transform(f: Column => Column): Column

Why are the changes needed?

We want to give users a way to chain their methods, such as

df.select($"fruit"
  .transform(addPrefix)
  .transform(uppercase)
)

This pattern is also easier for AI agents to learn and write.

Does this PR introduce any user-facing change?

Yes. New API is introduced.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

Tests generated by Copilot.

@github-actions github-actions bot added the SQL label Oct 7, 2025
@Yicong-Huang Yicong-Huang changed the title feat: implement transform() in Column API [SPARK-53779] Implement transform() in Column API Oct 7, 2025
@Yicong-Huang Yicong-Huang changed the title [SPARK-53779] Implement transform() in Column API [SPARK-53779][SQL] Implement transform() in Column API Oct 7, 2025
Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
We need to add this to PySpark, but it's ok to have a separate PR.

@Yicong-Huang Yicong-Huang changed the title [SPARK-53779][SQL] Implement transform() in Column API [SPARK-53779][SQL][CONNECT] Implement transform() in Column API Oct 8, 2025
@Yicong-Huang
Copy link
Contributor Author

LGTM. We need to add this to PySpark, but it's ok to have a separate PR.

opened a new ticket for PySpark. https://issues.apache.org/jira/browse/SPARK-53841

val transformed = a.transform(triple).transform(c => c + 10)
assert(transformed == ((a * 3) + 10))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about adding a test for nested transform? e.g. a.transform(_.transform(fn.upper))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added!

)
}

test("Column.transform: chaining") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.transform(trim).transform(upper) is also chaining. Let's name them: Column.transform: built-in functions and Column.transform: lambda functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!

@Yicong-Huang Yicong-Huang requested a review from cloud-fan October 9, 2025 17:58
@Yicong-Huang Yicong-Huang changed the title [SPARK-53779][SQL][CONNECT] Implement transform() in Column API [SPARK-53779][SQL][CONNECT] Implement transform() in Column API Oct 9, 2025
@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 6032a40 Oct 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants