job failed with multiple errors #673

Open
njalan opened this issue Nov 29, 2024 · 3 comments

njalan commented Nov 29, 2024

I tested 4 jobs, and 3 of them failed with the errors below:

1:
due to org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 150.0 failed 4 times, most recent failure: Lost task 3.3 in stage 150.0 (TID 10770) (10.120.102.173 executor 11): java.lang.RuntimeException: called Result::unwrap() on an Err value: Execution("cannot create execution plan: DataFusionError(ArrowError(SchemaError("Unable to get field named \"#4303\". Valid fields: [\"#2922\", \"#2923\", \"#2924\", \"#2925\", \"#2926\", \"#2927\", \"#2928\", \"#2929\", \"#4286\", \"#4284\"]"), None))")
at org.apache.spark.sql.blaze.JniBridge.callNative(Native Method)
at org.apache.spark.sql.blaze.BlazeCallNativeWrapper.<init>(BlazeCallNativeWrapper.scala:66)
at org.apache.spark.sql.blaze.NativeHelper$.executeNativePlan(NativeHelper.scala:89)
at org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleWriterBase.nativeShuffleWrite(BlazeShuffleWriterBase.scala:80)
at org.apache.spark.sql.execution.blaze.plan.NativeShuffleExchangeExec$$anon$1.write(NativeShuffleExchangeExec.scala:158)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

2:
java.lang.RuntimeException: [partition=10] panics: Execution error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[Shuffle] error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[CoalesceStream]: output() returns error: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[Project]: output() returns error: External error: Java exception thrown at native-engine/datafusion-ext-exprs/src/spark_udf_wrapper.rs:92: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.catalyst.expressions.ScalaUDF.f of type scala.Function1 in instance of org.apache.spark.sql.catalyst.expressions.ScalaUDF

3:
due to org.apache.spark.SparkException: Job aborted due to stage failure: Task 28 in stage 109.0 failed 4 times, most recent failure: Lost task 28.3 in stage 109.0 (TID 10846) (10.120.102.175 executor 7): java.lang.RuntimeException: [partition=28] poll record batch error: Execution error: [partition=28] native execution panics: Execution error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[Shuffle] error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[Agg] error: Execution error: output_with_sender[CoalesceStream] error: Execution error: output_with_sender[CoalesceStream]: output() returns error: Execution error: Execution error: output_with_sender[Project] error: Execution error: output_with_sender[RenameColumns] error: Execution error: output_with_sender[ParquetScan] error: Execution error: assertion left == right failed
    left: 8
    right: 12
    at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
    at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:25)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
    at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
    at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
    at org.apache.spark.util.CompletionIterator.to(CompletionIterator.scala:25)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at org.apache.spark.util.CompletionIterator.toBuffer(CompletionIterator.scala:25)
    at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
    at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
    at org.apache.spark.util.CompletionIterator.toBuffer(CompletionIterator.scala:25)
    at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
    at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
    at org.apache.spark.util.CompletionIterator.toArray(CompletionIterator.scala:25)
    at org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleWriterBase.nativeShuffleWrite(BlazeShuffleWriterBase.scala:81)
    at org.apache.spark.sql.execution.blaze.plan.NativeShuffleExchangeExec$$anon$1.write(NativeShuffleExchangeExec.scala:158)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
    at org.apache.spark.scheduler.Task.run(Task.scala:136)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Versions: Spark 3.3.2, Hudi 0.13.1


njalan commented Dec 3, 2024

Regarding "assertion left == right failed", I think the issue below is similar:
salsa-rs/salsa#536


njalan commented Dec 20, 2024

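The pattern I have found so far: the failing query contains many left joins, and each left join is itself a subquery. If the same table appears in several of these subqueries, an error like the following is reported: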
called Result::unwrap() on an Err value: Execution("cannot create execution plan: DataFusionError(ArrowError(SchemaError("Unable to get field named \"#151\". Valid fields: [\"#50\", \"#51\", \"#59\", \"#61\", \"#218\"]"), None))")

Below is the part of the execution plan related to table xxxxxxxx:

(27) NativeParquetScan xxxxxxxx
Output [4]: [uid#50, aname#51, dname#59, tle#61]
Arguments: FileScan parquet xxxxxxxx[uid#50,aname#51,dname#59,tle#61] Batched: false, DataFilters: [isnotnull(uid#50)]

(28) InputAdapter
Input [4]: [uid#50, aname#51, dname#59, tle#61]
Arguments: [#50, #51, #59, #61]

(29) NativeFilter
Input [4]: [#50#50, #51#51, #59#59, #61#61]
Arguments: isnotnull(uid#50)

(30) NativeBroadcastExchange
Input [4]: [#50#50, #51#51, #59#59, #61#61]
Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [plan_id=381]

(31) NativeBroadcastJoin
Left keys [1]: [createbyuid#123]
Right keys [1]: [uid#50]
Join condition: None

The part of the execution plan related to another subquery:

(45) ReusedExchange [Reuses operator id: 30]
Output [4]: [#151#151, #152#152, #160#160, #162#162]  -- new IDs are generated

(46) NativeBroadcastJoin
Left keys [1]: [createbyuid#140]
Right keys [1]: [uid#151]  -- uid becomes #151
Join condition: None
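
To make the mismatch concrete, here is a toy Python model of what the two plan fragments seem to imply. It is only a sketch and not Blaze or DataFusion code: the helper names (`field_name`, `lookup`, `lookup_reused`, `remap`) are hypothetical, and the idea that columns are resolved by their `#id` name is an assumption based on the SchemaError text. The real error also lists an extra field (`#218`), which is left out here for simplicity.

```python
from typing import Dict, List

# Toy model only: this is NOT Blaze or DataFusion code.


def field_name(expr_id: int) -> str:
    """Name a column by its expression ID, as in the plan dump (e.g. "#50")."""
    return f"#{expr_id}"


def lookup(schema: List[str], expr_id: int) -> int:
    """Resolve a column by name; fail like the reported SchemaError if it is missing."""
    name = field_name(expr_id)
    if name not in schema:
        raise KeyError(f'Unable to get field named "{name}". Valid fields: {schema}')
    return schema.index(name)


# Columns produced by the original broadcast exchange (operator 30).
exchange_schema = [field_name(i) for i in (50, 51, 59, 61)]

# Output IDs that the ReusedExchange (operator 45) reports for the same data.
reused_output = [151, 152, 160, 162]

# Hypothetical fix-up: map the reused IDs back to the original ones by position.
remap: Dict[int, int] = dict(zip(reused_output, (50, 51, 59, 61)))


def lookup_reused(schema: List[str], expr_id: int) -> int:
    """Resolve a column after translating a reused ID to its original ID."""
    return lookup(schema, remap.get(expr_id, expr_id))
```

In this model, `lookup(exchange_schema, 151)` raises an error of the same shape as the one reported, because the reused exchange still carries the `#50`-style names, while `lookup_reused(exchange_schema, 151)` resolves to the first column. Whether a positional remapping like this is the right fix inside Blaze is something only the maintainers can confirm.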


This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Jan 20, 2025