@jiateoh (Contributor) commented Oct 7, 2025

What changes were proposed in this pull request?

Add a new test case for the Python transform_with_state APIs to verify that output schemas containing nested structs are handled correctly.

The output schema under test is defined in the test code and copied below:


    from pyspark.sql.types import (
        ArrayType,
        DoubleType,
        IntegerType,
        MapType,
        StringType,
        StructField,
        StructType,
    )

    # Define the output schema with inner nested class schema
    inner_nested_class_schema = StructType(
        [
            StructField("intValue", IntegerType(), True),
            StructField("doubleValue", DoubleType(), True),
            StructField("arrayValue", ArrayType(StringType()), True),
            StructField("mapValue", MapType(StringType(), StringType()), True),
        ]
    )

    output_schema = StructType(
        [
            StructField("primitiveValue", StringType(), True),
            StructField("listOfPrimitive", ArrayType(StringType()), True),
            StructField("mapOfPrimitive", MapType(StringType(), StringType()), True),
            StructField("listOfComposite", ArrayType(inner_nested_class_schema), True),
            StructField(
                "mapOfComposite", MapType(StringType(), inner_nested_class_schema), True
            ),
        ]
    )
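To make the nesting concrete, here is a small sketch in plain Python rather than the PySpark types API. It mirrors `output_schema` with a hypothetical tuple encoding (`INNER`, `OUTPUT`), builds one made-up row that fits it, and checks the row with a hypothetical recursive helper, `conforms`, that walks structs, arrays, and maps the same way the nested schema does. It only illustrates the shape of valid output; it is not how Spark or the test validates rows.

```python
from typing import Any

# Hypothetical plain-Python mirror of output_schema (NOT the PySpark types API):
# primitive types are strings, composite types are tagged tuples.
INNER = ("struct", {
    "intValue": "int",
    "doubleValue": "double",
    "arrayValue": ("array", "string"),
    "mapValue": ("map", "string", "string"),
})

OUTPUT = ("struct", {
    "primitiveValue": "string",
    "listOfPrimitive": ("array", "string"),
    "mapOfPrimitive": ("map", "string", "string"),
    "listOfComposite": ("array", INNER),
    "mapOfComposite": ("map", "string", INNER),
})

# Python value types accepted for each primitive leaf.
PRIMITIVES = {"string": str, "int": int, "double": float}


def conforms(value: Any, dtype: Any) -> bool:
    """Return True if value has the shape described by dtype.

    None is accepted for any field, array element, or map value, matching
    nullable=True on the fields and the default containsNull=True /
    valueContainsNull=True of ArrayType / MapType. Map keys must not be None.
    """
    if value is None:
        return True
    if isinstance(dtype, str):
        # bool is a subclass of int in Python, so reject it explicitly.
        return isinstance(value, PRIMITIVES[dtype]) and not isinstance(value, bool)
    kind = dtype[0]
    if kind == "array":
        return isinstance(value, list) and all(conforms(v, dtype[1]) for v in value)
    if kind == "map":
        return isinstance(value, dict) and all(
            k is not None and conforms(k, dtype[1]) and conforms(v, dtype[2])
            for k, v in value.items()
        )
    if kind == "struct":
        fields = dtype[1]
        return (
            isinstance(value, dict)
            and set(value) == set(fields)
            and all(conforms(value[name], t) for name, t in fields.items())
        )
    raise ValueError(f"unknown type: {dtype!r}")


# A made-up row that fits OUTPUT: structs appear inside both a list and a map.
row = {
    "primitiveValue": "key1",
    "listOfPrimitive": ["a", "b"],
    "mapOfPrimitive": {"k": "v"},
    "listOfComposite": [
        {"intValue": 1, "doubleValue": 1.5, "arrayValue": ["x"], "mapValue": {"m": "n"}},
    ],
    "mapOfComposite": {
        "c1": {"intValue": 2, "doubleValue": 2.5, "arrayValue": [], "mapValue": {}},
    },
}

# The same row, but a struct nested inside the list has a string intValue.
bad = dict(row, listOfComposite=[
    {"intValue": "1", "doubleValue": 1.5, "arrayValue": [], "mapValue": {}},
])

print(conforms(row, OUTPUT))  # True
print(conforms(bad, OUTPUT))  # False
```

The `bad` row shows why nested coverage matters: the mismatch sits two levels down (a struct inside an array inside the top-level struct), which a schema made only of top-level strings would never reach.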

Why are the changes needed?

Missing test coverage: the existing composite-type test, `test_transform_with_state_in_pandas_composite_type`, checks composite state value schemas, but its output schema uses only StringType fields, so nested output types are never exercised.

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/sbt -Phive -Phive-thriftserver -DskipTests package
python/run-tests --testnames 'pyspark.sql.tests.pandas.test_pandas_transform_with_state TransformWithStateInPandasTests'
python/run-tests --testnames 'pyspark.sql.tests.pandas.test_pandas_transform_with_state TransformWithStateInPySparkTests'

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (claude-sonnet-4-5-20250929)

…omposite output schema

This commit adds a new test case `test_composite_output_schema` to verify that transformWithState correctly handles complex nested output schemas containing:
- Primitive types (StringType)
- Collections of primitives (ArrayType of StringType)
- Maps of primitives (MapType of StringType to StringType)
- Collections of composite types (ArrayType of StructType)
- Maps of composite types (MapType of StringType to StructType)

The test exercises both Pandas and Row-based stateful processors to ensure proper serialization/deserialization of composite output types.
@jiateoh changed the title [WIP][SPARK-XXXXX][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas to [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas Oct 7, 2025
@jiateoh changed the title [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas to [SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas Oct 7, 2025
@jiateoh changed the title [SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas to [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas Oct 7, 2025
@jiateoh force-pushed the tws_python_composite_output branch from cd64f38 to f569a6a October 7, 2025 18:52
@jiateoh changed the title [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite output schemas to [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite/nested output schemas Oct 7, 2025
@jiateoh changed the title [WIP][SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite/nested output schemas to [SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite/nested output schemas Oct 7, 2025
@jiateoh marked this pull request as ready for review October 7, 2025 19:09
jiateoh and others added 4 commits October 7, 2025 14:45
…h nested array and map fields

Add arrayValue and mapValue fields to the inner nested class schema to test more complex composite type scenarios. This ensures transformWithState properly handles structs containing arrays and maps within arrays of structs and maps of structs.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
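With `arrayValue` and `mapValue` added to the inner struct, collections now appear at two levels: inside structs that themselves sit in an array (`listOfComposite`) or a map (`mapOfComposite`). One way to see this, sketched in plain Python with a hypothetical tuple encoding of the final schema (not a PySpark API), is to list every primitive leaf together with the chain of containers leading to it. The helper `leaf_paths` is likewise hypothetical.

```python
# Hypothetical plain-Python mirror of the final output_schema (NOT PySpark):
# primitive types are strings, composite types are tagged tuples.
INNER = ("struct", {
    "intValue": "int",
    "doubleValue": "double",
    "arrayValue": ("array", "string"),
    "mapValue": ("map", "string", "string"),
})
OUTPUT = ("struct", {
    "primitiveValue": "string",
    "listOfPrimitive": ("array", "string"),
    "mapOfPrimitive": ("map", "string", "string"),
    "listOfComposite": ("array", INNER),
    "mapOfComposite": ("map", "string", INNER),
})


def leaf_paths(dtype, prefix=""):
    """Yield (path, primitive_type) for every primitive leaf in dtype.

    "[]" marks an array element and "{}" marks a map value. Map keys are all
    primitive strings in this schema, so they are not listed separately.
    """
    if isinstance(dtype, str):
        yield prefix, dtype
    elif dtype[0] == "array":
        yield from leaf_paths(dtype[1], prefix + "[]")
    elif dtype[0] == "map":
        yield from leaf_paths(dtype[2], prefix + "{}")
    else:  # struct: recurse into each named field
        for name, child in dtype[1].items():
            path = f"{prefix}.{name}" if prefix else name
            yield from leaf_paths(child, path)


for path, leaf in leaf_paths(OUTPUT):
    print(f"{path}: {leaf}")
```

Run as-is, it prints eleven paths. The deepest ones, `listOfComposite[].arrayValue[]`, `listOfComposite[].mapValue{}`, `mapOfComposite{}.arrayValue[]`, and `mapOfComposite{}.mapValue{}`, are the struct-contained collections that this commit adds coverage for.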
@jiateoh force-pushed the tws_python_composite_output branch from f569a6a to ba311bb October 7, 2025 22:04
@zhengruifeng changed the title [SPARK-53822][PS][SS][TESTS]Add Python TransformWithState test case for composite/nested output schemas to [SPARK-53822][PYTHON][SS][TESTS]Add Python TransformWithState test case for composite/nested output schemas Oct 8, 2025
@zhengruifeng (Contributor)

Change [PS] to [PYTHON], since PS stands for Pandas-API-on-Spark.

@anishshri-db (Contributor)

Will merge once CI is green



4 participants