Cannot run Python PortableRunner -> SparkRunner on EMR cluster #20568

Open
damccorm opened this issue Jun 4, 2022 · 1 comment

Comments


damccorm commented Jun 4, 2022

I have been trying to run the Python word-count example on an AWS EMR cluster, and it does not work.

Things I have tried:

  • Running with 

python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner

This implicitly runs with --spark-master-url local[4], which defeats the purpose of running it on a cluster.

  • Tried

python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --spark-master-url=yarn

This still uses the local master.


  • Tried

python3 py_codes/word_count_beam.py --output word_count_output --runner=SparkRunner --output_executable_path=jars/beam_word_count.jar

as described in https://issues.apache.org/jira/browse/BEAM-8970
This creates a jar file, but when I submit the jar with spark-submit I get a Docker "permission denied" exception, possibly related to https://issues.apache.org/jira/browse/BEAM-6020
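
One thing that may be worth checking for the first two attempts (an assumption, not confirmed anywhere in this thread): if the Python SDK defines the option with underscores (--spark_master_url) and tolerates unknown flags, then the hyphenated spelling would be dropped silently and the default would apply. The sketch below uses only the standard-library argparse with a hypothetical stand-in parser to show that mechanism. It does not import Beam, and the flag name and default are illustrative.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a runner's option parser. The flag name and
    # default are assumptions for illustration, not copied from Beam's source.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--spark_master_url", default="local[4]")
    return parser


def parse(argv):
    # parse_known_args returns unrecognized flags instead of raising an error,
    # so a misspelled flag leaves the default value in place.
    known, unknown = build_parser().parse_known_args(argv)
    return known.spark_master_url, unknown


# Hyphenated spelling: not recognized, so the default is kept.
hyphen_url, hyphen_unknown = parse(["--spark-master-url=yarn"])
# Underscore spelling: recognized, so the value is applied.
underscore_url, underscore_unknown = parse(["--spark_master_url=yarn"])

print(hyphen_url, hyphen_unknown)
print(underscore_url, underscore_unknown)
```

If this is what is happening, switching to the underscore spelling would at least make the flag take effect. Whether a yarn master then works from the Python SDK is a separate question.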

So, is there no way to run Python Beam code on a YARN Spark cluster?
That would also mean there is no way to run TFX code (which uses Beam) on a YARN cluster.

Imported from Jira BEAM-11378. Original Jira may contain additional context.
Reported by: ratulray.

@kennknowles kennknowles changed the title Cannot run Python PortableRunner on EMR cluster Cannot run Python PortableRunner -> SparkRunner on EMR cluster Dec 2, 2022
@moradology

It appears to me that the PROCESS SDK environment avoids the problems encountered here. What ended up working for me is discussed in a downstream project: pangeo-forge/pangeo-forge-runner#133
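
For readers unfamiliar with the PROCESS environment, here is a minimal sketch of how the relevant pipeline flags might be assembled. It is not taken from the linked discussion. The boot executable path and the job endpoint are hypothetical placeholders, and the flag names are given as I understand them from Beam's documentation, so verify them against your SDK version.

```python
import json


def process_env_args(boot_command, job_endpoint="localhost:8099"):
    """Build pipeline flags that select the PROCESS SDK environment.

    boot_command: path to the SDK harness boot executable on each worker
                  (hypothetical here; it must exist on the cluster nodes).
    job_endpoint: address of a running job server (assumed value).
    """
    # The environment config is passed as a JSON string.
    env_config = json.dumps({"command": boot_command})
    return [
        "--runner=PortableRunner",
        f"--job_endpoint={job_endpoint}",
        "--environment_type=PROCESS",
        f"--environment_config={env_config}",
    ]


# Example with a placeholder boot path. The resulting list could be appended
# to the script's command line or handed to the pipeline's options object.
args = process_env_args("/opt/apache/beam/boot")
print(" ".join(args))
```

The point of PROCESS over the default Docker environment is that the SDK harness is started as a local process on each worker, which sidesteps the need for Docker access on the cluster nodes.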
