Describe the bug
When using Spark 3.5 with Blaze, a join between two tables stored in text format fails during the shuffle write phase of the executor tasks: the native reader hits an HDFS permission exception because the read is performed as user=yarn, which lacks READ access to the table files. This ultimately causes the query execution to fail.
To Reproduce
Steps to reproduce the behavior:
Create test tables with text storage format:
CREATE TABLE IF NOT EXISTS test_table5 (
id INT,
key INT,
value1 STRING
) STORED AS TEXTFILE;
CREATE TABLE IF NOT EXISTS test_table6 (
id INT,
key INT,
value2 STRING
) STORED AS TEXTFILE;
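Load a few rows into each table so the join has data to shuffle. The values below are a minimal illustrative sketch (any rows with overlapping ids will do):
INSERT INTO test_table5 VALUES (1, 10, 'a'), (2, 20, 'b'), (3, 30, 'c');
INSERT INTO test_table6 VALUES (1, 100, 'x'), (2, 200, 'y'), (4, 400, 'z');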
Run a join on the two tables:
SELECT a.*, b.*
FROM test_table5 a
JOIN test_table6 b
ON a.id = b.id;
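The query is submitted with master set to YARN and Blaze enabled. A minimal submission sketch; the extension and shuffle-manager class names follow Blaze's documented setup and the jar path is a placeholder, so adjust to your deployment:
spark-sql --master yarn \
  --jars /path/to/blaze-engine-spark-3.5.jar \
  --conf spark.blaze.enable=true \
  --conf spark.sql.extensions=org.apache.spark.sql.blaze.BlazeSparkSessionExtension \
  --conf spark.shuffle.manager=org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleManager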
See error:
Driver side:
Driver stacktrace:
25/01/07 20:41:56 INFO DAGScheduler: ShuffleMapStage 1 (main at NativeMethodAccessorImpl.java:0) failed in 20.353 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7) (tjtx16-110-38.58os.org executor 8): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Shuffle] error: Execution error: output_with_sender[FFIReader] error: Execution error: output_with_sender[FFIReader]: output() returns error: External error: Java exception thrown at native-engine/datafusion-ext-plans/src/ffi_reader_exec.rs:157: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:25)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at org.apache.spark.util.CompletionIterator.to(CompletionIterator.scala:25)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at org.apache.spark.util.CompletionIterator.toBuffer(CompletionIterator.scala:25)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at org.apache.spark.util.CompletionIterator.toArray(CompletionIterator.scala:25)
at org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleWriterBase.nativeShuffleWrite(BlazeShuffleWriterBase.scala:81)
at org.apache.spark.sql.execution.blaze.plan.NativeShuffleExchangeExec$$anon$1.write(NativeShuffleExchangeExec.scala:159)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:106)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
at org.apache.spark.scheduler.Task.run(Task.scala:143)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Executor side:
Exception in thread "Thread-8" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:150)
at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:59)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:355)
at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:265)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3(ArrowFFIExporter.scala:67)
at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3$adapted(ArrowFFIExporter.scala:63)
at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$2(Using.scala:313)
at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$1(Using.scala:312)
at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
at org.apache.spark.sql.blaze.util.Using$.resources(Using.scala:311)
at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1(ArrowFFIExporter.scala:63)
at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1$adapted(ArrowFFIExporter.scala:60)
at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.exportNextBatch(ArrowFFIExporter.scala:60)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:142)
... 22 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=READ, inode="/home/hive/warehouse/test.db/test_table6/part-00001-d58e0437-f279-48fa-933a-63ce19829afd-c000.lzo":hdfs:hadoop:-rwx-wx-wx
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1993)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1977)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1919)
at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:162)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2061)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:833)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:288)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1072)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1073)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3085)
Expected behavior
With Spark 3.5 + Blaze:
The query execution fails with the error message and logs shown above.
With Vanilla Spark 3.5, or with spark.blaze.enable=false:
The query executes successfully, returning the expected result data.
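For completeness, a sketch of the comparison run with Blaze disabled (join_query.sql is a hypothetical file holding the query above):
spark-sql --master yarn --conf spark.blaze.enable=false -f join_query.sql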
Screenshots
With Spark 3.5 + Blaze:
With Vanilla Spark 3.5 or set spark.blaze.enable=false:
Additional Context:
The issue occurs when the master is set to YARN, but not when running in local mode.
Tables stored as Parquet do not have this problem.
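A quick way to confirm the permission mismatch reported by the executor exception (the path comes from the stack trace above; the comments restate what the exception already shows):
hdfs dfs -ls /home/hive/warehouse/test.db/test_table6/
# The exception reports the part file as hdfs:hadoop with mode -rwx-wx-wx:
# group and other lack the READ bit, so a read attempted as user=yarn is denied,
# while a read performed as the owning/impersonated user succeeds.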