
S3 write failure with alluxio 2.9.4 release #18643

Open
pragnesh opened this issue Jul 2, 2024 · 0 comments
Labels
type-bug This issue is about a bug

pragnesh commented Jul 2, 2024

Alluxio Version:
2.9.4

Describe the bug
After upgrading Alluxio from 2.9.3 to 2.9.4, we see the following exception when a Spark job writes its output to a directory mounted on S3 as the UFS. Multiple jobs are running that write to the same location.

2024-06-28 14:15:33,548 ERROR [task-execution-service-2](S3AOutputStream.java:156) - Failed to upload s3path/y=2024/mo=06/d=28/h=13/part-00008-df7dc7d1-cde0-4ba9-98b4-e6563e37ee59.c000.snappy.parquet
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@22e014ef rejected from java.util.concurrent.ThreadPoolExecutor@722238d8[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.create(UploadMonitor.java:95)
	at com.amazonaws.services.s3.transfer.TransferManager.doUpload(TransferManager.java:685)
	at com.amazonaws.services.s3.transfer.TransferManager.upload(TransferManager.java:534)
	at alluxio.underfs.s3a.S3AOutputStream.close(S3AOutputStream.java:154)
	at com.google.common.io.Closer.close(Closer.java:218)
	at alluxio.job.plan.persist.PersistDefinition.runTask(PersistDefinition.java:183)
	at alluxio.job.plan.persist.PersistDefinition.runTask(PersistDefinition.java:57)
	at alluxio.worker.job.task.TaskExecutor.run(TaskExecutor.java:88)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
2024-06-28 14:15:33,553 INFO  [task-execution-service-2](TaskExecutorManager.java:204) - Task 0 for job 1719583916388 failed:
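The RejectedExecutionException in the trace means the thread pool behind the AWS TransferManager was already in the Terminated state when S3AOutputStream.close() tried to start the upload. A minimal sketch of just that JDK-level failure mode (the class and method names here are hypothetical and not Alluxio code):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class RejectedSubmitDemo {
    // Submits a task to a pool that has already been shut down, mimicking
    // the "Terminated, pool size = 0" executor seen in the stack trace.
    static String submitAfterShutdown() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.shutdown(); // once shut down, the default AbortPolicy refuses new tasks
        try {
            // Analogous to TransferManager.upload() submitting its monitor task
            pool.submit(() -> "upload");
            return "accepted";
        } catch (RejectedExecutionException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(submitAfterShutdown()); // prints "rejected"
    }
}
```

So the question is why the executor backing the upload was shut down before (or while) the stream was closed.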

To Reproduce
Run a Spark job with Alluxio 2.9.4 using S3 as the UFS.

Expected behavior
Alluxio should be able to upload the file to S3; instead the upload fails with the exception above.

Urgency
We are unable to upgrade to 2.9.4 until this is resolved.

Are you planning to fix it
Not working on it, but I am willing to help with a PR if someone points me in the right direction.

Additional context
Other jobs are also writing to the same location, so an S3 metadata update could be the issue.
I also see that the default value of the alluxio.proxy.s3.bucketpathcache.timeout property changed from 1 min to 0 min in the 2.9.4 release.
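For reference, restoring the pre-2.9.4 default for that property would look like this in alluxio-site.properties. This is only a guess at a workaround; whether this property is actually related to the upload failure is unverified:

```properties
# alluxio-site.properties: restore the pre-2.9.4 default (1 min) for the
# proxy's S3 bucket-path cache; 2.9.4 ships with 0min.
alluxio.proxy.s3.bucketpathcache.timeout=1min
```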
