
Provide a configuration to let users decide whether to prefer the local hadoop conf #99

Open

hn5092 opened this issue Jan 20, 2020 · 1 comment
hn5092 commented Jan 20, 2020

Before the optimization, the one from the environment was used: 58b242c2-0581-4e85-b801-5206e5fabac4
After the optimization, the local machine's is used: 7c7ac8e1-561c-435f-86fd-0a3696e14ef3
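A minimal sketch of the switch this issue asks for; the key `spark.hadoop.conf.preferLocal` and the helper below are hypothetical, not an existing Spark configuration:

```scala
import org.apache.spark.SparkConf

// Hypothetical key -- not an existing Spark option; it only illustrates
// the user-controlled preference this issue requests.
val PreferLocalHadoopConf = "spark.hadoop.conf.preferLocal"

/** Choose which Hadoop conf directory to load. `envConfDir` stands in for
  * the conf provided by the environment; both names are illustrative. */
def resolveHadoopConfDir(conf: SparkConf, envConfDir: Option[String]): Option[String] = {
  val localConfDir = sys.env.get("HADOOP_CONF_DIR") // conf on the local machine
  if (conf.getBoolean(PreferLocalHadoopConf, defaultValue = false))
    localConfDir.orElse(envConfDir) // prefer local, fall back to environment
  else
    envConfDir.orElse(localConfDir) // default: the environment's conf wins
}
```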

@hn5092 hn5092 self-assigned this Jan 20, 2020

hn5092 commented Feb 27, 2020

yugan95 pushed a commit to yugan95/spark that referenced this issue Mar 4, 2020
Kyligence#100 submit job error when current user not have admin acl
yugan95 added a commit that referenced this issue Mar 5, 2020
7mming7 pushed a commit that referenced this issue Nov 4, 2020
…constraint

### What changes were proposed in this pull request?

This PR adds support for inferring constraints from cast equality constraints. For example, before this change:
```scala
scala> spark.sql("create table spark_29231_1(c1 bigint, c2 bigint)")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("create table spark_29231_2(c1 int, c2 bigint)")
res1: org.apache.spark.sql.DataFrame = []

scala> spark.sql("select t1.* from spark_29231_1 t1 join spark_29231_2 t2 on (t1.c1 = t2.c1 and t1.c1 = 1)").explain
== Physical Plan ==
*(2) Project [c1#5L, c2#6L]
+- *(2) BroadcastHashJoin [c1#5L], [cast(c1#7 as bigint)], Inner, BuildRight
   :- *(2) Project [c1#5L, c2#6L]
   :  +- *(2) Filter (isnotnull(c1#5L) AND (c1#5L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_29231_1[c1#5L,c2#6L] Batched: true, DataFilters: [isnotnull(c1#5L), (c1#5L = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehouse/spark_29231_1], PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,1)], ReadSchema: struct<c1:bigint,c2:bigint>
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), [id=#209]
      +- *(1) Project [c1#7]
         +- *(1) Filter isnotnull(c1#7)
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_29231_2[c1#7] Batched: true, DataFilters: [isnotnull(c1#7)], Format: Parquet, Location: InMemoryFileIndex[file:/root/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehouse/spark_29231_2], PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
```

After this PR:
```scala
scala> spark.sql("select t1.* from spark_29231_1 t1 join spark_29231_2 t2 on (t1.c1 = t2.c1 and t1.c1 = 1)").explain
== Physical Plan ==
*(2) Project [c1#0L, c2#1L]
+- *(2) BroadcastHashJoin [c1#0L], [cast(c1#2 as bigint)], Inner, BuildRight
   :- *(2) Project [c1#0L, c2#1L]
   :  +- *(2) Filter (isnotnull(c1#0L) AND (c1#0L = 1))
   :     +- *(2) ColumnarToRow
   :        +- FileScan parquet default.spark_29231_1[c1#0L,c2#1L] Batched: true, DataFilters: [isnotnull(c1#0L), (c1#0L = 1)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/spark/spark-warehouse/spark_29231_1], PartitionFilters: [], PushedFilters: [IsNotNull(c1), EqualTo(c1,1)], ReadSchema: struct<c1:bigint,c2:bigint>
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), [id=#99]
      +- *(1) Project [c1#2]
         +- *(1) Filter ((cast(c1#2 as bigint) = 1) AND isnotnull(c1#2))
            +- *(1) ColumnarToRow
               +- FileScan parquet default.spark_29231_2[c1#2] Batched: true, DataFilters: [(cast(c1#2 as bigint) = 1), isnotnull(c1#2)], Format: Parquet, Location: InMemoryFileIndex[file:/root/opensource/spark/spark-warehouse/spark_29231_2], PartitionFilters: [], PushedFilters: [IsNotNull(c1)], ReadSchema: struct<c1:int>
```
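To make the inference concrete: from the join condition `t1.c1 = cast(t2.c1 as bigint)` and the filter `t1.c1 = 1`, the optimizer can derive `cast(t2.c1 as bigint) = 1` for the other join side, which is what the new `Filter` above shows. Below is a minimal, self-contained sketch of that rule on a toy expression model; it is illustrative only, not the actual Catalyst implementation:

```scala
// Toy expression model -- illustrative only, not Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String)            extends Expr
case class Cast(child: Expr, to: String) extends Expr
case class Lit(value: Any)               extends Expr
case class EqualTo(left: Expr, right: Expr)

/** From (a = cast(b as T)) and (a = literal), infer (cast(b as T) = literal). */
def inferCastEqualities(constraints: Set[EqualTo]): Set[EqualTo] = {
  val castEqs = constraints.collect { case e @ EqualTo(_: Attr, _: Cast) => e }
  val litEqs  = constraints.collect { case e @ EqualTo(_: Attr, _: Lit)  => e }
  val inferred = for {
    EqualTo(a, c) <- castEqs   // a = cast(b as T)
    EqualTo(b, l) <- litEqs    // b = literal
    if a == b                  // same attribute on both constraints
  } yield EqualTo(c, l)        // cast(b as T) = literal
  inferred -- constraints
}

// Mirrors the example above: t1.c1 = cast(t2.c1 as bigint) and t1.c1 = 1.
val constraints = Set(
  EqualTo(Attr("t1.c1"), Cast(Attr("t2.c1"), "bigint")),
  EqualTo(Attr("t1.c1"), Lit(1))
)
println(inferCastEqualities(constraints))
// Set(EqualTo(Cast(Attr(t2.c1),bigint),Lit(1)))
```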

### Why are the changes needed?

Improve query performance: the inferred predicate filters the other side of the join at scan time, reducing the rows fed into the join.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Unit test.

Closes apache#27252 from wangyum/SPARK-29231.

Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>