[SPARK-53846][PYTHON][TESTS] Skip test_profile_pandas_* tests if pandas or pyarrow are unavailable

dongjoon-hyun · dongjoon-hyun · commit 18f0463a97bd · 2025-10-08T18:37:26.000-07:00
### What changes were proposed in this pull request?

This PR aims to skip `test_profile_pandas_udf` and `test_profile_pandas_function_api` tests if `pandas` or `pyarrow` are unavailable like the other test cases, e.g., `test_memory_profiler_pandas_udf`.

```
$ git grep test_profile_pandas
python/pyspark/tests/test_memory_profiler.py:    def test_profile_pandas_udf(self):
python/pyspark/tests/test_memory_profiler.py:    def test_profile_pandas_function_api(self):
```

### Why are the changes needed?

We should check the test requirements explicitly. In other words, PySpark unit tests should pass without those packages, like the other existing unit test cases.

https://github.com/apache/spark/blob/bf2457b6db77b911874a22e6d73f07793f44bef1/python/pyspark/tests/test_memory_profiler.py#L307-L311

### Does this PR introduce _any_ user-facing change?

No. This is a test change.

### How was this patch tested?

Pass the CIs and manually test without `pyarrow`.

```
...
Tests passed in 159 seconds

Skipped tests in pyspark.tests.test_memory_profiler with python3:
    test_memory_profiler_aggregate_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_aggregate_in_pandas) ... skip (0.000s)
    test_memory_profiler_cogroup_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_arrow) ... skip (0.001s)
    test_memory_profiler_cogroup_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_pandas) ... skip (0.000s)
    test_memory_profiler_group_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_arrow) ... skip (0.000s)
    test_memory_profiler_group_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_pandas) ... skip (0.000s)
    test_memory_profiler_map_in_pandas_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_map_in_pandas_not_supported) ... skip (0.000s)
    test_memory_profiler_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf) ... skip (0.000s)
    test_memory_profiler_pandas_udf_iterator_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_iterator_not_supported) ... skip (0.000s)
    test_memory_profiler_pandas_udf_window (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_window) ... skip (0.000s)
    test_memory_profiler_udf_with_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_with_arrow) ... skip (0.000s)
    test_profile_pandas_function_api (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_function_api) ... skip (0.000s)
    test_profile_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_udf) ... skip (0.000s)
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52549 from dongjoon-hyun/SPARK-53846.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
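For readers unfamiliar with the pattern, the following is a minimal, self-contained sketch of how a `unittest.skipIf` guard of this shape behaves. It is not the PySpark implementation: the `requirement_message` helper and the `ExampleTests` class are hypothetical, and the flags are derived here from a plain import attempt, whereas the real `have_pandas`, `have_pyarrow`, and `*_requirement_message` values come from PySpark's test utilities and may apply additional checks.

```python
import importlib
import unittest
from typing import Optional, cast


def requirement_message(module_name: str) -> Optional[str]:
    """Hypothetical helper: return None if the module imports, else a reason."""
    try:
        importlib.import_module(module_name)
    except ImportError as error:
        return f"{module_name} must be installed; however, it was not found ({error})"
    return None


# Assumed stand-ins for the names used in the diff; the real ones are
# imported from PySpark's testing helpers.
pandas_requirement_message = requirement_message("pandas")
pyarrow_requirement_message = requirement_message("pyarrow")
have_pandas = pandas_requirement_message is None
have_pyarrow = pyarrow_requirement_message is None


class ExampleTests(unittest.TestCase):
    # Skip when either dependency is missing; the reason is the first
    # non-None message. cast() only satisfies the type checker.
    @unittest.skipIf(
        not have_pandas or not have_pyarrow,
        cast(str, pandas_requirement_message or pyarrow_requirement_message),
    )
    def test_needs_pandas_and_pyarrow(self):
        import pandas as pd
        import pyarrow as pa

        self.assertEqual(int(pd.Series([1, 2, 3]).sum()), 6)
        self.assertEqual(len(pa.array([1, 2, 3])), 3)
```

With both packages installed the test runs normally; with either one missing it is reported as skipped rather than failing with an `ImportError`, which is the behavior this PR extends to the two `test_profile_pandas_*` tests.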
diff --git a/python/pyspark/tests/test_memory_profiler.py b/python/pyspark/tests/test_memory_profiler.py
@@ -112,6 +112,10 @@ def test_memory_profiler(self):
             self.sc.dump_profiles(d)
             self.assertTrue(f"udf_{id}_memory.txt" in os.listdir(d))
 
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
     def test_profile_pandas_udf(self):
         udfs = [self.exec_pandas_udf_ser_to_ser, self.exec_pandas_udf_ser_to_scalar]
         udf_names = ["ser_to_ser", "ser_to_scalar"]
@@ -130,6 +134,10 @@ def test_profile_pandas_udf(self):
                 "Profiling UDFs with iterators input/output is not supported" in str(user_warns[0])
             )
 
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
     def test_profile_pandas_function_api(self):
         apis = [self.exec_grouped_map]
         f_names = ["grouped_map"]