Permission issue in text table join leads to failure #747

Closed
merrily01 opened this issue Jan 7, 2025 · 0 comments · Fixed by #748 or #769

merrily01 (Contributor) commented Jan 7, 2025

Describe the bug

When using Spark 3.5 with Blaze, a join on two tables stored in text format fails: during the shuffle write phase of the executor's task execution, the underlying read runs under the wrong account and throws a permission exception, which ultimately causes the query to fail.

To Reproduce
Steps to reproduce the behavior:

  1. Create test tables with text storage format:
CREATE TABLE IF NOT EXISTS test_table5 (
  id INT,
  key INT,
  value1 STRING
) STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS test_table6 (
  id INT,
  key INT,
  value2 STRING
) STORED AS TEXTFILE;
  2. Generate test data:
INSERT INTO test_table5 VALUES
(1, 100, 'value1'),
(2, 200, 'value2'),
(3, 300, 'value3');

INSERT INTO test_table6 VALUES
(1, 400, 'value4'),
(2, 500, 'value5'),
(3, 600, 'value6');
  3. Execute the test case SQL:
SELECT a.*, b.*
FROM test_table5 a
JOIN test_table6 b
ON a.id = b.id;
  4. See error:
  • Driver side:
Driver stacktrace:
25/01/07 20:41:56 INFO DAGScheduler: ShuffleMapStage 1 (main at NativeMethodAccessorImpl.java:0) failed in 20.353 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 7) (tjtx16-110-38.58os.org executor 8): java.lang.RuntimeException: poll record batch error: Execution error: native execution panics: Execution error: Execution error: output_with_sender[Shuffle] error: Execution error: output_with_sender[FFIReader] error: Execution error: output_with_sender[FFIReader]: output() returns error: External error: Java exception thrown at native-engine/datafusion-ext-plans/src/ffi_reader_exec.rs:157: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.spark.sql.blaze.JniBridge.nextBatch(Native Method)
	at org.apache.spark.sql.blaze.BlazeCallNativeWrapper$$anon$1.hasNext(BlazeCallNativeWrapper.scala:80)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
	at scala.collection.Iterator.foreach(Iterator.scala:943)
	at scala.collection.Iterator.foreach$(Iterator.scala:943)
	at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:25)
	at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
	at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
	at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
	at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
	at org.apache.spark.util.CompletionIterator.to(CompletionIterator.scala:25)
	at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
	at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
	at org.apache.spark.util.CompletionIterator.toBuffer(CompletionIterator.scala:25)
	at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
	at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
	at org.apache.spark.util.CompletionIterator.toArray(CompletionIterator.scala:25)
	at org.apache.spark.sql.execution.blaze.shuffle.BlazeShuffleWriterBase.nativeShuffleWrite(BlazeShuffleWriterBase.scala:81)
	at org.apache.spark.sql.execution.blaze.plan.NativeShuffleExchangeExec$$anon$1.write(NativeShuffleExchangeExec.scala:159)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:106)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
	at org.apache.spark.scheduler.Task.run(Task.scala:143)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$5(Executor.scala:662)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:95)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:682)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
  • Executor side:
Exception in thread "Thread-8" java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:150)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:59)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:355)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:265)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenEvaluatorFactory$WholeStageCodegenPartitionEvaluator$$anon$1.hasNext(WholeStageCodegenEvaluatorFactory.scala:43)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3(ArrowFFIExporter.scala:67)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$3$adapted(ArrowFFIExporter.scala:63)
	at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$2(Using.scala:313)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.blaze.util.Using$.$anonfun$resources$1(Using.scala:312)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.blaze.util.Using$.resources(Using.scala:311)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1(ArrowFFIExporter.scala:63)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.$anonfun$exportNextBatch$1$adapted(ArrowFFIExporter.scala:60)
	at org.apache.spark.sql.blaze.util.Using$.resource(Using.scala:272)
	at org.apache.spark.sql.execution.blaze.arrowio.ArrowFFIExporter.exportNextBatch(ArrowFFIExporter.scala:60)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.java:142)
	... 22 more
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=READ, inode="/home/hive/warehouse/test.db/test_table6/part-00001-d58e0437-f279-48fa-933a-63ce19829afd-c000.lzo":hdfs:hadoop:-rwx-wx-wx
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:261)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1993)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1977)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1919)
	at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:162)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2061)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:833)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:288)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1072)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1073)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3085)
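
The root cause in the executor trace is an HDFS AccessControlException: the read is attempted as user=yarn against a table file owned by hdfs:hadoop with mode -rwx-wx-wx, which grants yarn no read access. Note that the read runs on a spawned "Thread-8" inside ArrowFFIExporter rather than on the Spark task thread. A plausible mechanism (my reading of the trace, not confirmed against the fix in #748) is that a thread created outside UserGroupInformation.doAs does not inherit the task user's Hadoop identity, so HDFS calls made from it fall back to the executor process user (yarn). Below is a minimal sketch of that failure mode and the usual remedy; UgiThreadSketch and runWithCallerUgi are hypothetical names for illustration, not Blaze APIs.

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object UgiThreadSketch {
  // Run `body` on a new thread while preserving the caller's Hadoop identity.
  def runWithCallerUgi(body: => Unit): Thread = {
    // Capture the UGI of the calling (task) thread, which carries the
    // submitting user's credentials.
    val ugi = UserGroupInformation.getCurrentUser
    val t = new Thread(new Runnable {
      override def run(): Unit = {
        // Without this doAs, HDFS calls from this thread authenticate as the
        // executor process user (e.g. yarn) and can fail with
        // AccessControlException, as in the trace above.
        ugi.doAs(new PrivilegedExceptionAction[Unit] {
          override def run(): Unit = body
        })
      }
    })
    t.start()
    t
  }
}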

Expected behavior

  • With Spark 3.5 + Blaze:
    • The query execution fails with the error message and logs shown above.
  • With vanilla Spark 3.5, or with spark.blaze.enable=false set (see the workaround sketch below):
    • The query executes successfully and returns the expected result data.
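
Until a fix lands, the behavior above suggests a session-level workaround: disable Blaze so the query falls back to vanilla Spark execution. A minimal sketch, assuming a live SparkSession named spark:

// Workaround sketch: disable Blaze for this session so the join runs on
// vanilla Spark, which succeeds per the observations above.
spark.sql("SET spark.blaze.enable=false")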

Screenshots

  • With Spark 3.5 + Blaze:
    [screenshot: query fails with the error above]

  • With Vanilla Spark 3.5 or set spark.blaze.enable=false:
    [screenshot: query returns the expected rows]

Additional Context:

  • The issue occurs when the master is set to YARN, but not when running in local mode.
  • Tables stored as Parquet do not have this problem.