At present, GPS relies on an environment variable to tell it how to load the jar resources. This is unnecessary and prevents loading jar resources from Maven or some other repository. This approach should be abandoned in favor of using either the --jars or --packages switch to pyspark, letting Spark manage the dependencies on its own according to the user's preferences. This would remove the need to manage an S3 repository of jars, and would remove some fiddly code from the package init.
Worth mentioning that this change would not prevent the use of a fat jar (possibly still published on S3); it would simply give the user the flexibility to choose between a fat jar (--jars switch) and a published version (--packages switch).
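For illustration, here is a minimal sketch of what the two routes could look like when building a session from Python via the configuration keys that back those CLI switches (spark.jars and spark.jars.packages); the jar path and Maven coordinate below are placeholders, not real published artifacts:

```python
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName("example")

# Either point Spark at a fat jar the user already has (equivalent of --jars) ...
builder = builder.config("spark.jars", "/path/to/assembly-fat.jar")  # placeholder path
# ... or let Spark resolve a published artifact (equivalent of --packages):
# builder = builder.config("spark.jars.packages", "org.example:backend:0.1.0")  # placeholder coordinate

spark = builder.getOrCreate()
```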
[In the latter case (the --packages route), the jai_core Maven repo problems would require manually downloading that jar from a known-good location and passing it with a --files switch, followed by an --exclude-packages javax.media:jai_core to make resolution succeed. But it does work.]
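A sketch of that workaround in terms of the equivalent Spark configuration keys (spark.jars.excludes maps to --exclude-packages, spark.files to --files), assuming the jai_core jar has already been downloaded from a known-good location; the artifact coordinate and local path are placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("example")
    # Resolve the published artifact ...
    .config("spark.jars.packages", "org.example:backend:0.1.0")  # placeholder coordinate
    # ... but skip the broken jai_core coordinate during dependency resolution
    .config("spark.jars.excludes", "javax.media:jai_core")
    # and ship the manually downloaded jar alongside the job instead
    .config("spark.files", "/path/to/jai_core-1.1.3.jar")  # placeholder path
    .getOrCreate()
)
```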
Connects #672
Connects #669