[SPARK-53822][PYTHON][SS][TESTS] Add Python TransformWithState test case for composite/nested output schemas #52536
Closed
+471
−0
…omposite output schema

This commit adds a new test case `test_composite_output_schema` to verify that transformWithState correctly handles complex nested output schemas containing:
- Primitive types (StringType)
- Collections of primitives (ArrayType of StringType)
- Maps of primitives (MapType of StringType to StringType)
- Collections of composite types (ArrayType of StructType)
- Maps of composite types (MapType of StringType to StructType)

The test exercises both Pandas and Row-based stateful processors to ensure proper serialization/deserialization of composite output types.
…based assertions to match other test patterns
cd64f38 to f569a6a
bogao007 reviewed Oct 7, 2025
python/pyspark/sql/tests/pandas/test_pandas_transform_with_state.py (outdated)
…h nested array and map fields

Add arrayValue and mapValue fields to the inner nested class schema to test more complex composite type scenarios. This ensures transformWithState properly handles structs containing arrays and maps within arrays of structs and maps of structs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
This reverts commit 58d440e.
f569a6a to ba311bb
anishshri-db approved these changes Oct 9, 2025
Will merge once CI is green
What changes were proposed in this pull request?
Add a new test case for the Python transform_with_state APIs to ensure that output schemas containing nested structs are handled properly.
The output schema tested is reflected in the test code and copied below:
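The schema block from the original description is not reproduced here. As a rough illustration only, the shape described in the commit messages could be sketched as a Spark DDL string like the one below. All field names are hypothetical except `arrayValue` and `mapValue`, which the commit message mentions, and their element types are assumed to be strings. The sketch uses plain Python so it runs without a Spark installation.

```python
# Illustrative sketch only: NOT the schema from the actual test.
# Field names are hypothetical, except arrayValue and mapValue, which the
# commit message names; their element types (string) are an assumption.

def build_output_schema_ddl() -> str:
    """Build a DDL-formatted schema string covering the five output shapes."""
    # Inner struct reused inside the array-of-struct and map-of-struct fields.
    inner = (
        "struct<"
        "stringValue:string,"            # hypothetical field
        "arrayValue:array<string>,"      # named in the commit message
        "mapValue:map<string,string>"    # named in the commit message
        ">"
    )
    fields = [
        ("primitiveValue", "string"),              # primitive type
        ("listOfPrimitive", "array<string>"),      # collection of primitives
        ("mapOfPrimitive", "map<string,string>"),  # map of primitives
        ("listOfComposite", f"array<{inner}>"),    # collection of composite types
        ("mapOfComposite", f"map<string,{inner}>"),  # map of composite types
    ]
    return ", ".join(f"{name} {dtype}" for name, dtype in fields)


OUTPUT_SCHEMA_DDL = build_output_schema_ddl()
```

PySpark generally accepts DDL-formatted strings like this where a schema is expected, for example as the `schema` argument of `spark.createDataFrame`; the real test may instead build an explicit `StructType`.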
Why are the changes needed?
Missing test coverage: the existing test case for composite types, test_transform_with_state_in_pandas_composite_type, does check the state value schemas, but its actual output is only handled as StringType columns.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (claude-sonnet-4-5-20250929)