Modernize testing cluster docker images #30

Merged (4 commits) on Dec 17, 2024
2 changes: 1 addition & 1 deletion cluster/.gitignore
@@ -1,2 +1,2 @@
+/code
 /data
-/jars
17 changes: 8 additions & 9 deletions cluster/Dockerfile
@@ -1,28 +1,27 @@
-FROM java:8-jre-alpine
+FROM eclipse-temurin:11-jdk
 
-RUN apk update
-RUN apk add ca-certificates wget bash procps coreutils
+RUN apt update
+RUN apt install -yy ca-certificates wget bash procps coreutils python3
 RUN update-ca-certificates
 
 RUN mkdir -p /opt
 WORKDIR /opt
 
 ARG HADOOP_VERSION
-RUN wget http://apache.mirrors.lucidnetworks.net/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \
+RUN wget https://dlcdn.apache.org/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz && \
     tar -xzvf hadoop-${HADOOP_VERSION}.tar.gz && \
     rm hadoop-${HADOOP_VERSION}.tar.gz && \
     mv hadoop-${HADOOP_VERSION} hadoop
 
 ARG SPARK_VERSION
-ARG SPARK_VARIANT
+ARG SPARK_VARIANT=without-hadoop
 RUN wget https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-${SPARK_VARIANT}.tgz && \
     tar -xzvf spark-${SPARK_VERSION}-bin-${SPARK_VARIANT}.tgz && \
     rm spark-${SPARK_VERSION}-bin-${SPARK_VARIANT}.tgz && \
     mv spark-${SPARK_VERSION}-bin-${SPARK_VARIANT} spark
 
-ENV SPARK_HOME /opt/spark
+ENV HADOOP_HOME=/opt/hadoop
+ENV SPARK_HOME=/opt/spark
+ADD spark-env.sh /opt/spark/conf/spark-env.sh
 
 RUN mkdir -p /tmp/spark-events
 
-ADD entry.sh /opt
-ENTRYPOINT ["/opt/entry.sh"]
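For a quick sanity check of the new base image, a standalone build along these lines should work; docker compose normally passes the same build args, and the image tag here is made up for illustration:

```shell
# Hypothetical manual build of the cluster image; SPARK_VARIANT now defaults
# to "without-hadoop" so it can be omitted.
docker build \
  --build-arg HADOOP_VERSION=3.3.5 \
  --build-arg SPARK_VERSION=3.5.1 \
  --tag sparkplug-cluster \
  cluster/
```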
54 changes: 22 additions & 32 deletions cluster/README.md
@@ -10,58 +10,48 @@ issues that do not occur in local development contexts.
 
 Initialize the cluster, containing a master and one worker:
 
-```
-docker-compose -f docker-compose.yml up -d master worker-1
+```shell
+docker compose up -d
 ```
 
 You can submit an application with the submit script:
 
-```
-# Launch the containers
-$ docker-compose up -d
-
-# Copy uberjar to `jars` dir, your exact steps may vary
-$ lein uberjar
-$ cp $PROJECT/target/uberjar/my-app.jar docker/jars/
-
-$ ./submit.sh my-app.jar
-```
+```shell
+cp $PROJECT/target/uberjar/my-app.jar cluster/code/
+./submit.sh my-app.jar
+```
 
-You can also submit an application using the Spark master's REST API:
+You can also submit an application using the Spark master's REST API. First,
+create a JSON file with the request body:
 
-```
-# Place a JSON request body in a file
-$ cat request.json
+```json
 {
   "action": "CreateSubmissionRequest",
   "appArgs": ["file:///data/hamlet.txt"],
-  "appResource": "file:///mnt/jars/spark-word-count.jar",
-  "clientSparkVersion": "2.4.4",
+  "appResource": "file:///mnt/code/my-app.jar",
+  "clientSparkVersion": "3.5.1",
   "environmentVariables": {"SPARK_ENV_LOADED": "1"},
-  "mainClass": "spark_word_count.main",
+  "mainClass": "my_app.main",
   "sparkProperties":
   {
-    "spark.jars": "file:///mnt/jars/spark-word-count.jar",
-    "spark.executor.cores": 1,
-    "spark.executor.count": 1,
-    "spark.executor.memory": "1G",
-    "spark.app.name": "my-app",
-    "spark.submit.deployMode": "cluster",
+    "spark.jars": "file:///mnt/code/my-app.jar",
     "spark.driver.cores": 1,
     "spark.driver.memory": "1G",
     "spark.driver.supervise": "false",
+    "spark.app.name": "sparkplug",
+    "spark.submit.deployMode": "cluster",
+    "spark.executor.cores": 1,
+    "spark.executor.count": 1,
+    "spark.executor.memory": "1G",
     "spark.logConf": "true"
   }
 }
+```
 
-$ curl -X POST --data @request.json http://localhost:6066/v1/submissions/create
-{
-  "action" : "CreateSubmissionResponse",
-  "message" : "Driver successfully submitted as driver-20200324235704-0000",
-  "serverSparkVersion" : "2.4.4",
-  "submissionId" : "driver-20200324235704-0000",
-  "success" : true
-}
+Then submit it to the scheduling HTTP endpoint:
+
+```shell
+curl http://localhost:6066/v1/submissions/create --data @request.json
+```
 
 ## Endpoints
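The create call returns a submission id; Spark's standalone REST API also exposes a status endpoint that can be polled for the driver's state (the driver id below is a placeholder):

```shell
# Query the state of a previously submitted driver; substitute the id
# returned by the create request.
curl http://localhost:6066/v1/submissions/status/driver-20241217000000-0000
```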
Empty file added cluster/code/.keep
23 changes: 10 additions & 13 deletions cluster/docker-compose.yml
@@ -1,14 +1,12 @@
-version: "3"
 services:
   master:
     build:
       context: .
       dockerfile: Dockerfile
       args:
-        HADOOP_VERSION: 2.9.2
-        SPARK_VERSION: 2.4.4
-        SPARK_VARIANT: without-hadoop-scala-2.12
-    command: sbin/start-master.sh
+        HADOOP_VERSION: 3.3.5
+        SPARK_VERSION: 3.5.1
+    command: /opt/spark/sbin/start-master.sh
     restart: on-failure
     hostname: master
     environment:
@@ -30,17 +28,16 @@ services:
       - 7077:7077
       - 8080:8080
     volumes:
-      - ./jars:/mnt/jars
+      - ./code:/mnt/code
 
   worker-1:
     build:
       context: .
       dockerfile: Dockerfile
       args:
-        HADOOP_VERSION: 2.9.2
-        SPARK_VERSION: 2.4.4
-        SPARK_VARIANT: without-hadoop-scala-2.12
-    command: sbin/start-slave.sh spark://master:7077
+        HADOOP_VERSION: 3.3.5
+        SPARK_VERSION: 3.5.1
+    command: /opt/spark/sbin/start-worker.sh spark://master:7077
     restart: on-failure
     hostname: worker-1
     environment:
@@ -66,11 +63,11 @@ services:
       - 8081:8081
       - 8881:8881
     volumes:
-      - ./jars:/mnt/jars
+      - ./code:/mnt/code
      - ./data:/data
 
   repl:
-    image: java:8-jre-alpine
+    image: eclipse-temurin:11-jdk
     command: java -jar /sparkplug-repl.jar
     restart: on-failure
     hostname: repl
@@ -81,7 +78,7 @@ services:
       - 4050:4040
       - 8765:8765
     volumes:
-      - ./jars/sparkplug-repl.jar:/sparkplug-repl.jar
+      - ./code/sparkplug-repl.jar:/sparkplug-repl.jar
       - ./data:/data
 
 networks:
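A quick smoke test of the updated compose file, using the service names defined above:

```shell
# Start the master and one worker, then check that the master web UI answers
# on the published port.
docker compose up -d master worker-1
curl -sf http://localhost:8080 >/dev/null && echo "master UI is up"
```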
13 changes: 0 additions & 13 deletions cluster/entry.sh

This file was deleted.

4 changes: 4 additions & 0 deletions cluster/spark-env.sh
@@ -0,0 +1,4 @@
+# Spark environment customizations
+
+export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
+export SPARK_NO_DAEMONIZE=1
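Because the Spark download is the `without-hadoop` variant, `SPARK_DIST_CLASSPATH` is what points it at the Hadoop jars, and `SPARK_NO_DAEMONIZE` keeps the start scripts in the foreground so the containers stay alive. A rough way to confirm the classpath resolves inside a running container:

```shell
# Print the Hadoop classpath that SPARK_DIST_CLASSPATH evaluates to in the
# master container.
docker compose exec master /opt/hadoop/bin/hadoop classpath
```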
16 changes: 8 additions & 8 deletions cluster/submit.sh
@@ -1,18 +1,18 @@
 #!/bin/bash
 
-APP_JAR="$1"
+APP_DRIVER="$1"
 
-if [[ -z $APP_JAR ]]; then
-    echo "No application jar file provided!" >&2
+if [[ -z $APP_DRIVER ]]; then
+    echo "No application driver code provided!" >&2
     exit 1
 fi
 
-if [[ ! -f jars/$APP_JAR ]]; then
-    echo "Couldn't find jars/$APP_JAR - did you copy it in place?" >&2
+if [[ ! -f code/$APP_DRIVER ]]; then
+    echo "Couldn't find code/$APP_DRIVER - did you copy it in place?" >&2
     exit 2
 fi
 
-docker-compose exec master \
-    bin/spark-submit \
+docker compose exec master \
+    /opt/spark/bin/spark-submit \
     --master spark://master:7077 \
-    /mnt/jars/$APP_JAR
+    /mnt/code/$APP_DRIVER
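Usage is unchanged apart from the `jars/` to `code/` rename; roughly, from the `cluster/` directory (the uberjar path is a placeholder):

```shell
# Copy an application uberjar into the mounted code directory and submit it.
cp ../target/uberjar/my-app.jar code/
./submit.sh my-app.jar
```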
6 changes: 3 additions & 3 deletions sparkplug-repl/README.md
@@ -9,9 +9,9 @@ connected to a Spark cluster.
 
 First, build the REPL uberjar and copy it into the Docker cluster:
 
-```
-$ lein uberjar
-$ cp target/uberjar/sparkplug-repl.jar ../cluster/jars
+```shell
+lein uberjar
+cp target/uberjar/sparkplug-repl.jar ../cluster/code
 ```
 
 Next, start up the REPL container in another terminal:
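The "start up the REPL container" step presumably maps to the `repl` service in `cluster/docker-compose.yml`, something like:

```shell
# From the cluster directory, start the REPL service and follow its output.
docker compose up -d repl
docker compose logs -f repl
```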
2 changes: 1 addition & 1 deletion sparkplug-repl/project.clj
@@ -17,7 +17,7 @@
 
  :profiles
  {:default
-  [:base :system :user :provided :spark-3.1 :dev]
+  [:base :system :user :provided :spark-3.5 :dev]
 
   :repl
   {:repl-options
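Since `:spark-3.5` is part of the `:default` composite, a plain `lein uberjar` should already build against Spark 3.5; a sketch of activating it explicitly, assuming the profile is defined elsewhere in the project:

```shell
# Build the REPL uberjar with the Spark 3.5 profile explicitly merged in.
lein with-profile +spark-3.5 uberjar
```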