This guide provides step-by-step instructions to set up Apache Spark on AWS EC2 with one master node and one worker node.
Update the system and install Java:
sudo yum update -y
sudo yum install -y java-11-amazon-corretto
Download Spark:
wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
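Before extracting, it is worth verifying the archive's integrity; Apache publishes a SHA-512 digest alongside every release artifact. A minimal sketch (the digest is compared by eye, since the published file's exact format varies between releases):

```shell
# Compute the SHA-512 digest of the downloaded archive, if it is present.
if [ -f spark-3.5.2-bin-hadoop3.tgz ]; then
  sha512sum spark-3.5.2-bin-hadoop3.tgz
fi
# Compare the printed digest against the published value at:
# https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz.sha512
```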
Extract the downloaded file:
tar xvf spark-3.5.2-bin-hadoop3.tgz
Move Spark to the /opt directory:
sudo mv spark-3.5.2-bin-hadoop3 /opt/spark
Set the necessary environment variables:
echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export PATH=\$PATH:\$SPARK_HOME/bin" >> ~/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/java-11-amazon-corretto.x86_64" >> ~/.bashrc
source ~/.bashrc
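To confirm the variables took effect in the current shell, a small helper like the following can be pasted into the session (the check_spark_env function is hypothetical, not part of Spark; it simply reports which required variables, if any, are still unset):

```shell
# Print "ok" when SPARK_HOME and JAVA_HOME are both set; otherwise list
# the missing variable names and return a non-zero status.
check_spark_env() {
  missing=""
  for var in SPARK_HOME JAVA_HOME; do
    eval "val=\$$var"
    if [ -z "$val" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "ok"
}

# Usage: run `check_spark_env` after `source ~/.bashrc`; expect "ok".
```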
Navigate to Spark configuration directory:
cd $SPARK_HOME/conf
Copy the template for the Spark environment file:
cp spark-env.sh.template spark-env.sh
Edit the spark-env.sh file:
nano spark-env.sh
Add the following line:
export SPARK_MASTER_HOST='<Master-Node-Private-IP>'
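Rather than typing the address by hand, the instance can look up its own private IP from the EC2 instance metadata service and append the line itself. A sketch assuming IMDSv2, which is the default on current Amazon Linux AMIs:

```shell
# Request a short-lived IMDSv2 session token.
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
# Read this instance's private IPv4 address from the metadata service.
PRIVATE_IP=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/local-ipv4")
# Append the master host setting to spark-env.sh.
echo "export SPARK_MASTER_HOST='$PRIVATE_IP'" >> "$SPARK_HOME/conf/spark-env.sh"
```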
Start the Spark master node:
$SPARK_HOME/sbin/start-master.sh
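To confirm the master actually came up, check that its two ports are listening: 7077 (the RPC port workers connect to) and 8080 (the web UI). A small sketch using bash's built-in /dev/tcp redirection (the port_open helper is hypothetical, not part of Spark):

```shell
# Return success if a TCP connection to $1:$2 can be opened, failure otherwise.
port_open() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

port_open localhost 7077 && echo "master RPC port (7077) is up" \
  || echo "master RPC port (7077) is not reachable"
port_open localhost 8080 && echo "web UI (8080) is up" \
  || echo "web UI (8080) is not reachable"
```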
Install the AWS CLI to manage datasets:
sudo yum install awscli -y
Create a directory to store your data:
mkdir -p /home/ec2-user/data
Download the dataset from Amazon S3:
aws s3 cp s3://<your-bucket>/<your-dataset> /home/ec2-user/data
For example: aws s3 cp s3://amazon-bucket23/amazon_reviews.csv /home/ec2-user/data
Start the interactive Spark shell, connecting it to the master (without the --master flag, spark-shell runs in local mode and the cluster is not used):
$SPARK_HOME/bin/spark-shell --master spark://<Master-Node-Private-IP>:7077
Replace <Master-Node-Private-IP> with the private IP address of your master EC2 instance.
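Once the shell is up, a quick smoke test is to read the dataset and count its rows. The snippet below pipes a short Scala fragment into spark-shell non-interactively (the CSV path is the example used above; substitute your own dataset, and replace the master placeholder with the real private IP):

```shell
$SPARK_HOME/bin/spark-shell --master spark://<Master-Node-Private-IP>:7077 <<'EOF'
// Read the example CSV (header row assumed) and print its row count.
val df = spark.read.option("header", "true").csv("/home/ec2-user/data/amazon_reviews.csv")
println(s"row count: ${df.count()}")
EOF
```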
Use the Spark Web UI to monitor your cluster.
http://<Master-Node-IP>:8080
To complete the Spark cluster, set up the worker node as follows:
Update the system and install Java:
sudo yum update -y
sudo yum install -y java-11-amazon-corretto
Download Spark:
wget https://archive.apache.org/dist/spark/spark-3.5.2/spark-3.5.2-bin-hadoop3.tgz
Extract the downloaded file:
tar xvf spark-3.5.2-bin-hadoop3.tgz
Move Spark to the /opt directory:
sudo mv spark-3.5.2-bin-hadoop3 /opt/spark
Configure the environment variables for Spark and Java:
echo "export SPARK_HOME=/opt/spark" >> ~/.bashrc
echo "export PATH=\$PATH:\$SPARK_HOME/bin" >> ~/.bashrc
echo "export JAVA_HOME=/usr/lib/jvm/java-11-amazon-corretto.x86_64" >> ~/.bashrc
source ~/.bashrc
Start the worker node:
$SPARK_HOME/sbin/start-worker.sh spark://<Master-Node-Private-IP>:7077
Replace <Master-Node-Private-IP> with the private IP address of the Spark master node. (The script is named start-worker.sh in Spark 3.x; older releases called it start-slave.sh.)
Verify that the worker node is successfully connected to the master by accessing the Spark Web UI at:
http://<Master-Node-IP>:8080
The worker node's status should appear as ALIVE on the Spark Web UI. This indicates that the master and worker nodes are successfully communicating.
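This check can also be scripted: the standalone master's web UI serves a JSON status document at /json that lists each registered worker and its state. A sketch (the alive_workers helper is hypothetical, and the pattern is deliberately loose because the JSON spacing can vary between Spark versions):

```shell
# Count lines of the master's JSON status that report an ALIVE state.
alive_workers() {
  grep -c '"state"[[:space:]]*:[[:space:]]*"ALIVE"'
}

# Usage, from any machine that can reach the master's web UI:
#   curl -s "http://<Master-Node-IP>:8080/json" | alive_workers
```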