Add a strategy to fall back to Vanilla Spark shuffle manager #1047

lviiii · 2022-07-25T10:18:20Z

What changes were proposed in this pull request?

Add a strategy to fall back to the Vanilla Spark shuffle manager:
o Enable a fallback shuffle configuration and reuse ColumnarShuffleExchangeExec.
o Initialize the splitter iterator in the ShuffleDependency and transform its output into an RDD of Product2[Int, ColumnarBatch].
o Serialize the record batches and pass them to the Vanilla Spark shuffle writer.
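The following is a minimal, Spark-free sketch of the split and serialize steps, for illustration only. It assumes a toy `Batch` type in place of Spark's `ColumnarBatch`, hash partitioning on an integer key, and a simple `DataOutputStream` encoding. `Batch`, `FallbackSplitSketch`, `splitByPartition`, `serialize` and `deserialize` are hypothetical names, not code from this patch. What it shows is the shape of the data: each input batch becomes several `(partitionId, subBatch)` pairs, which is the `Product2[Int, ...]` form a record-based shuffle writer consumes.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical stand-in for Spark's ColumnarBatch: a key column and a value column.
final case class Batch(keys: Vector[Int], values: Vector[String]) {
  def numRows: Int = keys.length
}

object FallbackSplitSketch {
  // Non-negative modulo on the key's hash, the same idea as Spark's HashPartitioner.
  def partitionOf(key: Int, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Split every input batch into per-partition sub-batches and emit them as
  // (partitionId, subBatch) pairs, i.e. Product2[Int, Batch] records.
  def splitByPartition(batches: Iterator[Batch], numPartitions: Int): Iterator[Product2[Int, Batch]] =
    batches.flatMap { batch =>
      val rowsByPartition = (0 until batch.numRows).groupBy(i => partitionOf(batch.keys(i), numPartitions))
      rowsByPartition.toSeq.sortBy(_._1).iterator.map { case (pid, rows) =>
        (pid, Batch(rows.map(i => batch.keys(i)).toVector, rows.map(i => batch.values(i)).toVector))
      }
    }

  // Hypothetical wire format: row count, then every key, then every value as UTF.
  def serialize(batch: Batch): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(batch.numRows)
    batch.keys.foreach(k => out.writeInt(k))
    batch.values.foreach(v => out.writeUTF(v))
    out.flush()
    bytes.toByteArray
  }

  // Inverse of serialize: read the row count, then the keys, then the values.
  def deserialize(data: Array[Byte]): Batch = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val n = in.readInt()
    val keys = Vector.fill(n)(in.readInt())
    val values = Vector.fill(n)(in.readUTF())
    Batch(keys, values)
  }
}
```

Keeping whole sub-batches as record values, rather than individual rows, means the shuffle writer sees one record per partition per input batch in this sketch.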

How does this patch work?

When submitting an application, the native SQL engine is normally used with the ColumnarShuffleManager configuration:
--conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager

In some situations, however, you may want to use a custom or different shuffle manager. To enable the Vanilla Spark shuffle manager, set:
--conf spark.shuffle.manager=org.apache.spark.shuffle.sort.SortShuffleManager --conf spark.oap.sql.columnar.enableFallbackShuffle=true
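The two launch modes described here can be captured in a small helper. This is a hypothetical sketch (`FallbackShuffleConfSketch`, `shuffleConf` and `toSubmitArgs` are not part of the patch); only the configuration keys and class names come from this description.

```scala
object FallbackShuffleConfSketch {
  // Settings taken from the PR description; the helper itself is hypothetical.
  def shuffleConf(useVanillaShuffle: Boolean): Seq[(String, String)] =
    if (useVanillaShuffle)
      Seq(
        "spark.shuffle.manager" -> "org.apache.spark.shuffle.sort.SortShuffleManager",
        "spark.oap.sql.columnar.enableFallbackShuffle" -> "true")
    else
      Seq("spark.shuffle.manager" -> "org.apache.spark.shuffle.sort.ColumnarShuffleManager")

  // Render each key/value pair as a pair of spark-submit arguments: --conf key=value.
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

The same key/value pairs could instead be passed one at a time to `SparkSession.builder().config(key, value)` when the session is created programmatically.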

How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)



github-actions · 2022-07-25T10:18:37Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}

See also:

Other pull requests

PHILO-HE · 2022-07-26T07:51:26Z

Assuming the first two commits are not relevant to your patch, please do NOT include them. If your work depends on these commits, it would be better to open a dedicated PR to port them to the main branch.


zhouyuan · 2022-07-28T04:21:01Z

native-sql-engine/core/src/main/scala/com/intel/oap/GazellePluginConfig.scala

@@ -83,6 +83,11 @@ class GazellePluginConfig(conf: SQLConf) extends Logging {
  val enableColumnarShuffledHashJoin: Boolean =
    conf.getConfString("spark.oap.sql.columnar.shuffledhashjoin", "true").toBoolean && enableCpu

+  // enable or disable fallback shuffle manager
+  val enableFallbackShuffle: Boolean = conf


Can you please also add a short note on how to use this feature? Also, please make this default to false.

Added a usage note to the PR description.

Disabled the configuration "spark.oap.sql.columnar.enableFallbackShuffle" by default.
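The diff excerpt in this review is cut off, so the exact line for the new flag is not shown. A minimal sketch of the requested default, assuming the flag is read the same way as `enableColumnarShuffledHashJoin` above; `ConfSketch` is a hypothetical stand-in for Spark's `SQLConf` that models only the `getConfString(key, default)` lookup, and `FallbackFlagSketch` is likewise illustrative:

```scala
// Hypothetical stand-in for SQLConf: only getConfString(key, default) is modeled.
final class ConfSketch(settings: Map[String, String]) {
  def getConfString(key: String, defaultValue: String): String =
    settings.getOrElse(key, defaultValue)
}

object FallbackFlagSketch {
  // Same read pattern as the other flags in GazellePluginConfig, but with "false"
  // as the default so fallback shuffle stays off unless explicitly enabled.
  def enableFallbackShuffle(conf: ConfSketch): Boolean =
    conf.getConfString("spark.oap.sql.columnar.enableFallbackShuffle", "false").toBoolean
}
```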

Hong and others added 3 commits June 16, 2022 17:06

[NSE-913] Add support for Hadoop 3.3.1 (oap-project#966)

e82c97a

[NSE-927][NSE-126] BackPort PR#975 and PR#977 to branch-1.4 (oap-proj…

256b637

…ect#978) * [NSE-927] Add macro __AVX512BW__ check for different CPU architecture (oap-project#975) * Add __AVX512BW__ check * Fix cFormat * [NSE-126] set default codegen opt to O1 for branch-1.4

Add the strategy to fallback to Vanilla Spark shuffle manager.

11dcf98

lviiii added 3 commits July 25, 2022 11:20

Merge remote-tracking branch 'origin/branch-1.4' into fallback-shuffle

a43be7f

Fix the problem "wrong data".

b96bc62

Merge remote-tracking branch 'gazelle/branch-1.4' into fallback-shuffle

d4ae565

PHILO-HE changed the title ~~Add the strategy to fallback to Vanilla Spark shuffle manager.~~ Add a strategy to fall back to Vanilla Spark shuffle manager Jul 26, 2022

Merge branch 'main' into fallback-shuffle

d371925

# Conflicts: # native-sql-engine/core/src/main/scala/com/intel/oap/expression/ConverterUtils.scala

zhouyuan reviewed Jul 28, 2022


lviiii added 8 commits July 28, 2022 14:01

Update GazellePluginConfig.scala

7c51696

Disabled the configuration "spark.oap.sql.columnar.enableFallbackShuffle".

Fix the oom issue.

ca9e995

Remove the exception.

732eeab

Remove the exception.

4df13d9

test.

387620f

test.

4084905

test.

05ff93c

test.

5b517c8
