kennknowles changed the title from "Cannot run Python PortableRunner on EMR cluster" to "Cannot run Python PortableRunner -> SparkRunner on EMR cluster" on Dec 2, 2022.
It appears to me that the PROCESS SDK environment avoids the problems encountered here. A discussion of what ended up working for me on this issue from a downstream project: pangeo-forge/pangeo-forge-runner#133
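For anyone landing here, a minimal sketch of what selecting the PROCESS SDK environment can look like from the Python side. It only assembles the pipeline flags; the helper name `process_env_args` is made up for illustration, and the boot path `/opt/apache/beam/boot` is an assumption about where the Beam SDK harness boot binary lives on the worker nodes (it would have to be installed there, and the path may differ on your cluster). This is not necessarily the exact setup used in the linked discussion.

```python
import json


def process_env_args(boot_command, runner="SparkRunner"):
    """Build Beam pipeline flags that select the PROCESS SDK environment.

    boot_command is the path to the SDK harness boot executable on each
    worker node (an assumption: it must already exist there).
    """
    # --environment_config takes a JSON object; "command" names the executable
    # that the runner starts instead of launching a Docker container.
    environment_config = json.dumps({"command": boot_command})
    return [
        f"--runner={runner}",
        "--environment_type=PROCESS",
        f"--environment_config={environment_config}",
    ]


# Hypothetical boot path; adjust to wherever the binary is on your nodes.
args = process_env_args("/opt/apache/beam/boot")
print(args)
```

The resulting list can be passed to `apache_beam.options.pipeline_options.PipelineOptions(args)`. Because the workers are started as plain processes, Docker is not involved on the nodes, which is presumably why this sidesteps the Docker permission error described below.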
I have been trying to run the Python word-count example on an AWS EMR cluster, and it does not work.
Things I have tried:
This results in implicitly running with --spark-master-url local[4], which defeats the purpose of running it in a cluster.
Still uses local master.
Could not use the method described in https://beam.apache.org/documentation/runners/spark/ under "Running on a pre-deployed Spark cluster", because under YARN the master is not exposed at a URL like localhost:7077.
Tried
as described in https://issues.apache.org/jira/browse/BEAM-8970
It can create a jar file, but when I submit the jar with spark-submit I get a Docker "permission denied" exception. Possibly related to https://issues.apache.org/jira/browse/BEAM-6020
So, is there no way to run Python Beam code on a YARN Spark cluster?
This would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.
Imported from Jira BEAM-11378. Original Jira may contain additional context.
Reported by: ratulray.