Adrien Giget Denis Stojiljkovic Ethan Machavoine Jonathan Poelger Tanguy Malandain
To launch the project, we followed these steps:
- Setup Master Process:
  - Edit the `spark-env.sh` file in the `conf` directory of your Spark installation.
  - Set `SPARK_MASTER_HOST` to the hostname or IP address of the master node.
  - Start the master node with `./start-master.sh`, located in the `sbin` directory.
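The master-side setup above can be sketched as follows; the IP address is a placeholder for your own master host:

```shell
# conf/spark-env.sh — address the master binds to
# (192.168.1.10 is a placeholder; use your master node's hostname or IP)
SPARK_MASTER_HOST=192.168.1.10

# Then, from the Spark installation root, start the master daemon:
#   ./sbin/start-master.sh
# The master web UI is served on port 8080 by default;
# workers connect to the master on port 7077.
```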
- Setup Worker Nodes:
  - Point each worker node to the master by setting the `SPARK_MASTER` environment variable or using the `--master` argument with `./start-worker.sh`.
  - Start the worker node using `./start-worker.sh spark://MASTER:7077`. Replace `MASTER` with the master node's hostname or IP address.
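The worker-side commands can be sketched like this; the master URL is a placeholder built from the master's hostname or IP:

```shell
# On each worker node, from the Spark installation root,
# start a worker and point it at the master
# (spark://192.168.1.10:7077 is a placeholder master URL).
./sbin/start-worker.sh spark://192.168.1.10:7077

# Optionally cap the resources this worker offers to the cluster:
#   ./sbin/start-worker.sh spark://192.168.1.10:7077 --cores 4 --memory 8g
```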
- Application Submission:
  - Use the `spark-submit` command to run your PySpark application.
  - Example: `spark-submit --master spark://MASTER:7077 path_to_your_script.py`
- Versions:
- Spark: 3.5.0
- Python: 3.10
- Java: 8