I was wondering how Maggy knows how to reach Spark.

Question 1: How does Maggy contact Spark's driver to make the RPC calls on Spark infrastructure other than Hopsworks? Does having a resource manager such as YARN on top of the Spark cluster affect how Maggy should make its RPC requests to the Spark driver, or should it work as normal?
Question 2: If I opt for "you can deploy an entire Hopsworks instance to your own AWS account", as also explained here (https://hopsworks.readthedocs.io/en/stable/getting_started/installation_guide/platforms/aws-image.html), a t2.2xlarge instance type with 8 vCPUs and 32 GB RAM is a single host and not a cluster. Do the 8 available vCPUs equate to the executors that Spark will use? In other words, if I run my trials and want to compare results across different numbers of executors, will I have a maximum of 8 executors unless I move to a larger instance type?
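To make Question 2 concrete, this is roughly the kind of sizing I have in mind (a minimal sketch using standard Spark properties; the app name and the numbers are just placeholders for a single 8-vCPU / 32 GB host):

```python
from pyspark.sql import SparkSession

# Placeholder sizing for one 8-vCPU / 32 GB host: with 2 cores per executor,
# at most ~4 executors can actually run concurrently on this machine,
# regardless of how many executors are requested.
spark = SparkSession.builder \
    .appName('maggy-sizing-question') \
    .config('spark.executor.instances', '4') \
    .config('spark.executor.cores', '2') \
    .config('spark.executor.memory', '6g') \
    .getOrCreate()
```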
The reason behind Question 1 is that in this example (https://github.com/logicalclocks/maggy/blob/master/examples/maggy-ablation-titanic-example.ipynb) a Spark session is created, but I cannot explicitly see the point where Maggy hands over/submits jobs to Spark. For instance, in some industrial setups one would interact with Spark by creating a Spark session and then submitting the job like so:
Creating a Spark session:
```python
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName('spark-ipython') \
    .config('spark.shuffle.service.enabled', 'true') \
    .config('spark.executor.memory', '2844M') \
    .config('spark.dynamicAllocation.enabled', 'true') \
    .config('spark.dynamicAllocation.minExecutors', '0') \
    .config('spark.dynamicAllocation.maxExecutors', '100') \
    .getOrCreate()
```
Submitting the job to the Spark cluster:
```
spark-submit --master yarn --deploy-mode cluster \
  --archives hdfs:///somelocation/Python.zip#Python \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./Python/bin/python3 \
  --conf spark.yarn.appMasterEnv.PYSPARK_DRIVER_PYTHON=./Python/bin/python3 \
  main.py
```
whereby main.py contains the Maggy code.

Example of main.py file:
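Something roughly like the following (a minimal sketch based on the hyperparameter-optimization example in the Maggy README, not my actual file; the exact keyword names can differ between Maggy versions, and the training body is a placeholder):

```python
from maggy import experiment, Searchspace

# Define the search space of hyperparameters to be tuned.
sp = Searchspace(kernel=('INTEGER', [2, 8]), pool=('INTEGER', [2, 8]))

def training_function(kernel, pool, reporter):
    # Placeholder: build and train a model with the given hyperparameters,
    # log progress via the reporter, and return the final metric.
    accuracy = 0.0
    reporter.log('finished one trial')
    return accuracy

# Maggy runs the trials as a Spark job; this is the hand-over point
# to the Spark driver/executors that I am asking about in Question 1.
result = experiment.lagom(training_function,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=15,
                          name='maggy-demo')
```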