
Enable CI Test on Scala 2.13 and support custom or spark-core extracted Scala version for Spark's engine #5196

Closed

Conversation

@bowenliang123 (Contributor) commented Aug 23, 2023

Why are the changes needed?

  • Enable CI tests on Scala 2.13 for all modules except the Flink SQL engine
  • For testing, choose the available Spark engine home in the `download` module by the `SCALA_COMPILE_VERSION` of the Kyuubi server
  • Choose the Scala version of the Spark engine's main resource jar in the following order (a sketch follows this list):
    1. the `SPARK_SCALA_VERSION` system env
    2. the Scala version extracted from the Spark home's `spark-core` jar filename
  • Fixed one assertion error in the kyuubi-spark-lineage module, as Spark on Scala 2.12 and 2.13 emit column lineage output in different orders in the `MergeIntoTable` unit test:
```
SparkSQLLineageParserHelperSuite:
- columns lineage extract - MergeIntoTable *** FAILED ***
  inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)), ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)))) did not equal inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)), ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)))) (SparkSQLLineageParserHelperSuite.scala:182)
```
  • Fixed other tests relying on Scala scripting results
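
A minimal sketch of that resolution order, assuming the standard Spark jar layout; the object and method names here are illustrative, not Kyuubi's actual API (see `SparkProcessBuilder` for the real implementation):

```scala
import java.io.File

// Illustrative sketch only; not the actual Kyuubi implementation.
object SparkScalaVersionResolver {
  // Matches the whole jar name, capturing the Scala binary version,
  // e.g. "spark-core_2.13-3.4.1.jar" -> "2.13".
  private val SparkCoreJar = """spark-core_(\d+\.\d+)-.*\.jar""".r

  private def extractSparkCoreScalaVersion(sparkHome: String): Option[String] = {
    val jars = Option(new File(sparkHome, "jars").listFiles()).getOrElse(Array.empty)
    jars.map(_.getName).collectFirst { case SparkCoreJar(v) => v }
  }

  // 1. SPARK_SCALA_VERSION env wins; 2. fall back to the spark-core jar name;
  // 3. otherwise use the supplied default (e.g. the server's compile version).
  def resolve(sparkHome: String, default: String): String =
    sys.env.get("SPARK_SCALA_VERSION").filter(_.nonEmpty)
      .orElse(extractSparkCoreScalaVersion(sparkHome))
      .getOrElse(default)
}
```

For a Spark home containing `spark-core_2.13-3.4.1.jar` and no `SPARK_SCALA_VERSION` set, `resolve` would yield `"2.13"`.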

How was this patch tested?

  • Add some test cases that check the changes thoroughly, including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

Was this patch authored or co-authored using generative AI tooling?

@bowenliang123 (Contributor, Author) commented Aug 25, 2023

  • Fixed the one remaining error in kyuubi-spark-lineage:
```
SparkSQLLineageParserHelperSuite:
- columns lineage extract - MergeIntoTable *** FAILED ***
  inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)), ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)))) did not equal inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)), ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)))) (SparkSQLLineageParserHelperSuite.scala:182)
```

@bowenliang123 (Contributor, Author) commented Aug 25, 2023

Curiously, the `columns lineage extract - MergeIntoTable` case produces column lineages in a different order:

  • on Scala 2.13: `name, price, id` (strictly matching the order in the code, with `matchedActions` first and then `notMatchedActions`)
```
columnLineage(List(ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)), ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)))
```
  • on Scala 2.12: `id, name, price`
```
columnLineage(List(ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)), ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price))))
```

Fixing this by ignoring the output order of column lineages, for two reasons: 1. the column order is not guaranteed to be strictly stable in the logical plan (as long as it is aligned with its children plans), and 2. the order has no major impact on end users of the lineage plugins.
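
For illustration, an order-insensitive comparison could look like the following sketch (the `ColumnLineage` shape mirrors the test output above; this is not the exact change in this PR):

```scala
// Sketch: compare column lineages as sets so Scala 2.12/2.13
// collection-ordering differences cannot fail the assertion.
case class ColumnLineage(column: String, originalColumns: Set[String])

def assertSameLineage(actual: Seq[ColumnLineage], expected: Seq[ColumnLineage]): Unit =
  assert(actual.toSet == expected.toSet, s"lineage mismatch: $actual vs $expected")
```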

```diff
@@ -62,7 +63,7 @@ trait ProcBuilder {
   def mainResource: Option[String] = {
     // 1. get the main resource jar for user specified config first
     // TODO use SPARK_SCALA_VERSION instead of SCALA_COMPILE_VERSION
-    val jarName = s"${module}_$SCALA_COMPILE_VERSION-$KYUUBI_VERSION.jar"
+    val jarName = s"${module}_$scalaRuntimeSemanticVersion-$KYUUBI_VERSION.jar"
```
Member:

The engine's Scala runtime version is independent of the server, e.g. it allows building a server with Scala 2.12, but using Spark with Scala 2.13.

Contributor Author:

No problem, this change is intended to pass the tests on the same Scala version first,
e.g. server on Scala 2.13 + Spark on Scala 2.13.

Contributor Author (@bowenliang123, Aug 28, 2023):

For the dynamic combinations, we could do that in follow-ups.

Member:

The quick idea in my mind is extracting the `SPARK_SCALA_VERSION` from `$SPARK_HOME/jars/spark-core_{scala_binary_version}-{spark_version}.jar`

Contributor Author:

Adopted.
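
For illustration, a ScalaTest-style check of that filename pattern (a hypothetical suite, not the exact tests added in this PR):

```scala
import org.scalatest.funsuite.AnyFunSuite

class SparkCoreJarNameSuite extends AnyFunSuite {
  // Matches the whole jar name, capturing the Scala binary version,
  // e.g. "spark-core_2.13-3.4.1.jar" -> "2.13".
  private val SparkCoreJar = """spark-core_(\d+\.\d+)-.*\.jar""".r

  test("extract Scala binary version from spark-core jar name") {
    assert(SparkCoreJar.unapplySeq("spark-core_2.13-3.4.1.jar").contains(List("2.13")))
    assert(SparkCoreJar.unapplySeq("spark-core_2.12-3.1.3.jar").contains(List("2.12")))
    // other jars must not match
    assert(SparkCoreJar.unapplySeq("spark-sql_2.13-3.4.1.jar").isEmpty)
  }
}
```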

@codecov-commenter commented Aug 28, 2023

Codecov Report

Merging #5196 (97fafac) into master (7eac400) will not change coverage.
The diff coverage is 0.00%.

```
@@          Coverage Diff           @@
##           master   #5196   +/-   ##
======================================
  Coverage    0.00%   0.00%
======================================
  Files         588     588
  Lines       33406   33425   +19
  Branches     4388    4393    +5
======================================
- Misses      33406   33425   +19
```

| Files | Coverage Δ |
| --- | --- |
| ...ala/org/apache/kyuubi/plugin/lineage/Lineage.scala | 0.00% <0.00%> (ø) |
| ...n/scala/org/apache/kyuubi/engine/ProcBuilder.scala | 0.00% <0.00%> (ø) |
| ...ache/kyuubi/engine/spark/SparkProcessBuilder.scala | 0.00% <0.00%> (ø) |


@bowenliang123 bowenliang123 force-pushed the scala213-test branch 2 times, most recently from daa6742 to 725d1e7 Compare August 29, 2023 00:44
@bowenliang123 (Contributor, Author) commented Aug 29, 2023

One blocker issue remains, in the Spark Hive connector plugin in the CI tests:
https://github.com/apache/kyuubi/actions/runs/6006543017/job/16291238480?pr=5196#step:7:2965

The Spark Hive connector's test case "DROP NAMESPACE using V1 catalog V1 command: drop non-empty namespace with a non-cascading mode" fails on Spark 3.3/3.4 with Scala 2.13, while it passes on Spark 3.2 with Scala 2.13, and also on Spark 3.1-3.4 with Scala 2.12.

I think this case is meant to ensure that dropping a non-empty namespace is prevented, and both exceptions are sufficient evidence of that.

Test log:

```
- DROP NAMESPACE using V1 catalog V1 command: drop non-empty namespace with a non-cascading mode *** FAILED ***
  Expected exception java.lang.IllegalStateException to be thrown, but org.apache.spark.sql.AnalysisException was thrown (DropNamespaceSuite.scala:73)
```

Related info from the driver log:

```
19:18:25.695 ScalaTest-main-running-DropNamespaceV1Suite ERROR RetryingHMSHandler:
InvalidOperationException(message:Database fakens is not empty. One or more tables exist.)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database_core(HiveMetaStore.java:1039)
	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database(HiveMetaStore.java:1175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
	at com.sun.proxy.$Proxy64.drop_database(Unknown Source)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropDatabase(HiveMetaStoreClient.java:866)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
	at com.sun.proxy.$Proxy65.dropDatabase(Unknown Source)
	at org.apache.hadoop.hive.ql.metadata.Hive.dropDatabase(Hive.java:491)
	at org.apache.spark.sql.hive.client.Shim_v0_12.dropDatabase(HiveShim.scala:584)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$dropDatabase$1(HiveClientImpl.scala:352)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:298)
```

Other related facts:

  1. In Spark, the `namespaceNotEmptyError` method of the `org.apache.spark.sql.errors.QueryExecutionErrors` class defines the `IllegalStateException` for a non-empty namespace:
     https://github.com/apache/spark/blob/branch-3.3/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
```scala
  def namespaceNotEmptyError(namespace: Array[String]): Throwable = {
    new IllegalStateException(s"Namespace ${namespace.quoted} is not empty")
  }
```
  2. In Spark, `namespaceNotEmptyError` exists in <= Spark 3.3, but is only used in <= Spark 3.2 (in the `V2SessionCatalog` and `JDBCTableCatalog` classes).
  3. In the Kyuubi Spark Hive connector, `HiveTableCatalog.dropNamespace` throws an `IllegalStateException` when dropping a non-empty namespace:
```scala
    throw new IllegalStateException(s"Namespace ${namespace.quoted} is not empty")
```

Remaining doubts:

  1. How does an `IllegalStateException` of `Namespace ... is not empty`, which is apparently never thrown on these versions, pass the assertions in the original CI tests for Spark 3.3 and 3.4 with Scala 2.12?
  2. How should the current assertion conditions be adjusted to pass the tests on Spark 3.3 and 3.4 with Scala 2.13? (One possible sketch follows.)
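
One possible adjustment for doubt 2, sketched under the assumption that either exception proves the drop was rejected (a hypothetical helper, not necessarily how it was ultimately fixed):

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import org.scalatest.Assertions.intercept

// Sketch: accept either exception type, since both indicate the
// non-empty namespace drop was rejected. Assumes a SparkSession
// wired to the Hive catalog and a non-empty namespace `fakens`,
// as in DropNamespaceSuite.
def assertDropNonEmptyNamespaceFails(spark: SparkSession): Unit = {
  val e = intercept[Exception] {
    spark.sql("DROP NAMESPACE fakens")
  }
  assert(
    e.isInstanceOf[IllegalStateException] || e.isInstanceOf[AnalysisException],
    s"unexpected exception: ${e.getClass.getName}")
}
```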

@bowenliang123 (Contributor, Author):

I need a hand with this blocking issue in the Hive Spark connector, given the mixed runtime conditions and varied test results. Do you have time to take a look? @yikf

@bowenliang123 (Contributor, Author):

Hi @yikf, thanks for your support in fixing the problem in the Hive Spark connector module. Well done; all tests now pass on all Spark versions with Scala 2.13.

Could you consider moving the changes in the Hive Spark connector module to a separate PR? The adaptation is more about the Spark version than the Scala version.

After the offline discussion with @pan3793, we would like to:

  1. separate the Hive Spark connector code changes into their own PR, to be merged to 1.8;
  2. leave the Scala 2.13 compilation test as it is, on both the 1.8 and master branches;
  3. enable the CI tests for Scala 2.13 on master after cutting the 1.8 branch.

@bowenliang123 (Contributor, Author) commented Sep 4, 2023

Hi, all modules (except Flink and Flink IT) now pass all the CI tests on Scala 2.13.

@bowenliang123 bowenliang123 marked this pull request as ready for review September 6, 2023 09:24
@bowenliang123 bowenliang123 changed the title [WIP] Test on Scala 2.13 [TEST] Test on Scala 2.13 Sep 6, 2023
@bowenliang123 bowenliang123 requested a review from yikf September 22, 2023 05:45
@bowenliang123 bowenliang123 self-assigned this Sep 22, 2023
@bowenliang123 bowenliang123 added this to the v1.8.0 milestone Sep 22, 2023
@pan3793 (Member) commented Sep 22, 2023

The PR title/description no longer matches the change; please update it before merging.

@bowenliang123 bowenliang123 changed the title [TEST] Test on Scala 2.13 Enable CI Test on Scala 2.13 and decide Spark's engine scala version from Spark core Sep 22, 2023
@bowenliang123 bowenliang123 changed the title Enable CI Test on Scala 2.13 and decide Spark's engine scala version from Spark core Enable CI Test on Scala 2.13 and decide Spark's engine Scala version from Spark core Sep 22, 2023
@bowenliang123 bowenliang123 changed the title Enable CI Test on Scala 2.13 and decide Spark's engine Scala version from Spark core Enable CI Test on Scala 2.13 and decide Spark's engine Scala version from spark-core jar Sep 22, 2023
@bowenliang123 bowenliang123 changed the title Enable CI Test on Scala 2.13 and decide Spark's engine Scala version from spark-core jar Enable CI Test on Scala 2.13 and extract Spark's engine Scala version from spark-core jar Sep 22, 2023
@bowenliang123 bowenliang123 force-pushed the scala213-test branch 2 times, most recently from 26c0658 to 8d0c22d Compare September 22, 2023 18:05
@bowenliang123 bowenliang123 changed the title Enable CI Test on Scala 2.13 and extract Spark's engine Scala version from spark-core jar Enable CI Test on Scala 2.13 and support custom or spark-core extracted Scala version for Spark's engine Sep 22, 2023
@bowenliang123 (Contributor, Author) commented Sep 23, 2023

Could you have a second look at this PR? @pan3793
The extraction method was replaced with a regex in `SparkProcessBuilder`, and tests were added for both regexes.
The title and description have been updated.

@pan3793 (Member) left a review comment:

LGTM, with one minor comment.

bowenliang123 added a commit that referenced this pull request Oct 10, 2023
…rk-core extracted Scala version for Spark's engine

### _Why are the changes needed?_

- enable CI test on Scala-2.13 for all modules except Flink SQL engine
- For testing, choose available Spark engine home in `download` module by `SCALA_COMPILE_VERSION` of Kyuubi server
- Choose the Scala version of Spark engine main resource Jar  in the following order:
  1. `SPARK_SCALA_VERSION` system env
  2. Extract Scala version from Spark home's `spark-core` jar filename
- Fixed 1 assertion error of kyuubi-spark-lineage module, as Spark on Scala 2.12 and 2.13 show different order of column lineage output in `MergeIntoTable` ut
```
SparkSQLLineageParserHelperSuite:
- columns lineage extract - MergeIntoTable *** FAILED ***
  inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)), ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)))) did not equal inputTables(List(v2_catalog.db.source_t))
  outputTables(List(v2_catalog.db.target_t))
  columnLineage(List(ColumnLineage(v2_catalog.db.target_t.id,Set(v2_catalog.db.source_t.id)), ColumnLineage(v2_catalog.db.target_t.name,Set(v2_catalog.db.source_t.name)), ColumnLineage(v2_catalog.db.target_t.price,Set(v2_catalog.db.source_t.price)))) (SparkSQLLineageParserHelperSuite.scala:182)
```
- Fixed other tests relying on Scala scripting results

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

Closes #5196 from bowenliang123/scala213-test.

Closes #5196

97fafac [liangbowen] prevent repeated compilation for regex pattern
76b99d4 [Bowen Liang] test on scala-2.13

Lead-authored-by: Bowen Liang <[email protected]>
Co-authored-by: liangbowen <[email protected]>
Signed-off-by: Bowen Liang <[email protected]>
(cherry picked from commit e33df9c)
Signed-off-by: Bowen Liang <[email protected]>
@bowenliang123 (Contributor, Author):

Thanks for the help and the reviews. This PR was accomplished with essential support from @pan3793 and @wForget.
Merged to master (1.9.0) and branch-1.8 (1.8.0).

@bowenliang123 bowenliang123 deleted the scala213-test branch October 10, 2023 00:46
turboFei added a commit that referenced this pull request Oct 16, 2023
…respect engine env

### _Why are the changes needed?_

Only extract the spark core scala version if `SPARK_SCALA_VERSION` env is empty, and respect engine env.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No.

Closes #5434 from turboFei/lazy_scala_version.

Closes #5196

fdccef7 [fwang12] lazy extract spark core scala version

Authored-by: fwang12 <[email protected]>
Signed-off-by: fwang12 <[email protected]>
turboFei added a commit that referenced this pull request Oct 16, 2023
…respect engine env

(Same commit message as above.)

Closes #5434 from turboFei/lazy_scala_version.

Closes #5196

fdccef7 [fwang12] lazy extract spark core scala version

Authored-by: fwang12 <[email protected]>
Signed-off-by: fwang12 <[email protected]>
(cherry picked from commit c60f5b7)
Signed-off-by: fwang12 <[email protected]>