Remove dependence on GEOPYSPARK_JARS_PATH env var #688

jpolchlo · 2018-10-17T15:42:43Z

At present, GeoPySpark (GPS) relies on the GEOPYSPARK_JARS_PATH environment variable to tell it where to load the jar resources from. This is unnecessary, and it prevents loading jar resources from Maven or some other repo. It should be abandoned in favor of using either the --jars or the --packages switch to pyspark, letting Spark manage the dependencies on its own according to the user's preferences. This would remove the need to manage an S3 repository of jars and would remove some fiddly code from the package init.

Connects #672
Connects #669


jpolchlo · 2018-10-17T15:46:37Z

Worth mentioning that this change would not prevent the use of a fat jar (possibly still published on S3); it would simply give the user the flexibility to choose either a fat jar (--jars switch) or a published version (--packages switch).

[In the latter case, the jai_core maven repo problems would require manually downloading that jar from a known good location using a --files switch, followed by an --exclude-packages javax.media:jai_core to make it work. But it does work.]
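For illustration, the two options could be expressed from Python roughly as follows. This is only a sketch: `build_submit_args` is a hypothetical helper, not part of GeoPySpark, and the jar path and Maven coordinates are placeholders rather than real published artifacts. It assembles a string in the form pyspark reads from the `PYSPARK_SUBMIT_ARGS` environment variable, using the switches named above.

```python
import shlex


def build_submit_args(jars=None, packages=None, exclude_packages=None, files=None):
    """Assemble a PYSPARK_SUBMIT_ARGS-style string (hypothetical helper)."""
    args = []
    for flag, values in (
        ("--jars", jars),
        ("--packages", packages),
        ("--exclude-packages", exclude_packages),
        ("--files", files),
    ):
        if values:
            # Each switch takes a single comma-separated list.
            args += [flag, ",".join(values)]
    # pyspark expects the argument string to end with "pyspark-shell".
    args.append("pyspark-shell")
    return " ".join(shlex.quote(a) for a in args)


# Option 1: a fat jar chosen by the user (placeholder path).
fat_jar_args = build_submit_args(jars=["/path/to/geopyspark-assembly.jar"])

# Option 2: a published version resolved by Spark (placeholder coordinates),
# excluding jai_core as described in the comment above.
published_args = build_submit_args(
    packages=["org.example:geopyspark-backend_2.11:0.0.0"],
    exclude_packages=["javax.media:jai_core"],
)

print(fat_jar_args)
print(published_args)
```

Either string would be assigned to `os.environ["PYSPARK_SUBMIT_ARGS"]` before the SparkContext is created, or the same switches could be passed directly on the `pyspark` command line.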
