Remove dependence on GEOPYSPARK_JARS_PATH env var #688

Open
jpolchlo opened this issue Oct 17, 2018 · 1 comment

Comments

@jpolchlo
Collaborator

At present, GeoPySpark (GPS) relies on the GEOPYSPARK_JARS_PATH environment variable to tell it where to load its jar resources from. This is unnecessary, and it prevents loading the jars from Maven or any other repository. We should drop it in favor of the --jars or --packages switch to pyspark and let Spark manage the dependencies itself, according to the user's preferences. That would remove the need to maintain an S3 repository of jars and would strip some fiddly code out of the package init.

Connects #672
Connects #669

@jpolchlo
Collaborator Author

Worth mentioning: this change would not rule out using a fat jar (possibly still published on S3). It would simply give the user the flexibility to choose between a fat jar (--jars switch) and a published version (--packages switch).
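To make the two options concrete, here is a minimal sketch of how a caller might pick one switch or the other. The helper name `dependency_args`, the jar path, and the Maven coordinates are all hypothetical placeholders, not anything GeoPySpark ships; only the `--jars` and `--packages` switches come from the discussion above.

```python
from typing import List, Optional


def dependency_args(fat_jar: Optional[str] = None,
                    package: Optional[str] = None) -> List[str]:
    """Return the pyspark switches for exactly one dependency source.

    fat_jar: path or URL of an assembly jar, passed via --jars.
    package: Maven coordinates (group:artifact:version), passed via --packages.
    """
    # Require exactly one of the two options so the choice is explicit.
    if (fat_jar is None) == (package is None):
        raise ValueError("pass exactly one of fat_jar or package")
    if fat_jar is not None:
        return ["--jars", fat_jar]
    return ["--packages", package]


# Hypothetical values, for illustration only.
FAT_JAR = "/opt/jars/geopyspark-backend-assembly.jar"
PACKAGE = "com.example:geopyspark-backend_2.11:0.0.0"

fat_jar_cmd = ["pyspark"] + dependency_args(fat_jar=FAT_JAR)
package_cmd = ["pyspark"] + dependency_args(package=PACKAGE)
```

Either list could then be handed to something like `subprocess.run`, leaving dependency resolution to Spark rather than to an environment variable.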

[In the latter case, the jai_core Maven repo problems mean that jar has to be downloaded manually from a known good location using the --files switch, followed by --exclude-packages javax.media:jai_core, to make it work. But it does work.]
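A rough sketch of what that jai_core workaround could look like as a command line. The function names, the Maven coordinates, and the jar location are hypothetical placeholders; the switches and the `javax.media:jai_core` exclusion are the ones named above, and this is not a tested recipe.

```python
import shlex
from typing import List


def packages_args_without_jai(package: str, jai_core_jar: str) -> List[str]:
    """Build pyspark switches that resolve `package` but skip jai_core.

    package: Maven coordinates of the published artifact (placeholder here).
    jai_core_jar: known good location of the jai_core jar (placeholder here).
    """
    return [
        "--packages", package,
        # Supply the jai_core jar from a known good location.
        "--files", jai_core_jar,
        # Keep Spark from trying to resolve jai_core from Maven.
        "--exclude-packages", "javax.media:jai_core",
    ]


def to_command(args: List[str]) -> str:
    """Join the switches into a shell-safe pyspark command line."""
    return " ".join(shlex.quote(a) for a in ["pyspark"] + args)


# Hypothetical values, for illustration only.
args = packages_args_without_jai(
    "com.example:geopyspark-backend_2.11:0.0.0",
    "/opt/jars/jai_core.jar",
)
command = to_command(args)
```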
