Commit c651127

mccheah authored and foxish committed
Move docker image management and test entrypoint to Maven (#31)
* Use an nginx server for remote jars tests.
* Moves all integration test setup logic to Maven and scripts. The Kubernetes integration tests now always expect an image to be pre-built, so we no longer build images with Scala code. Maven's pre-integration-test invokes a single script to bootstrap the environment with the built images, etc. In the transition we try to keep as much of the same semantics as possible.
* Update documentation
* Remove unnecessary .gitignore entries
* Use $IMAGE_TAG instead of $TAG
* Don't write image tag file twice
* Remove using nginx file server
* Remove some lines
* Split building Spark for dev environment from build reactor
* Small docs fix
* Docs formatting fix
* Spark TGZ can be empty instead of N/A, throw an error if not provided.
* Remove extraneous --skip-building-docker-images flag.
* Switch back to using the N/A placeholder
* Remove extraneous code
* Remove maven args because they don't work
* Fix scripts
* Don't get Maven if it's already there
* Put quotes everywhere
* Minor formatting
* Hard set Minikube binary location.
* Run Minikube from bash -c
1 parent f80d1d5 commit c651127

22 files changed, +358 -549 lines changed

.gitignore

Lines changed: 7 additions & 2 deletions
@@ -1,6 +1,11 @@
 .idea/
-spark/
-integration-test/target/
+target/
+build/*.jar
+build/apache-maven*
+build/scala*
+build/zinc*
+build/run-mvn
 *.class
 *.log
 *.iml
+*.swp

README.md

Lines changed: 64 additions & 95 deletions
@@ -8,98 +8,67 @@ title: Spark on Kubernetes Integration Tests
 Note that the integration test framework is currently being heavily revised and
 is subject to change. Note that currently the integration tests only run with Java 8.
 
-As shorthand to run the tests against any given cluster, you can use the `e2e/runner.sh` script.
-The script assumes that you have a functioning Kubernetes cluster (1.6+) with kubectl
-configured to access it. The master URL of the currently configured cluster on your
-machine can be discovered as follows:
-
-```
-$ kubectl cluster-info
-
-Kubernetes master is running at https://xyz
-```
-
-If you want to use a local [minikube](https://github.com/kubernetes/minikube) cluster,
-the minimum tested version is 0.23.0, with the kube-dns addon enabled
-and the recommended configuration is 3 CPUs and 4G of memory. There is also a wrapper
-script for running on minikube, `e2e/e2e-minikube.sh` for testing the master branch
-of the apache/spark repository in specific.
-
-```
-$ minikube start --memory 4000 --cpus 3
-```
-
-If you're using a non-local cluster, you must provide an image repository
-which you have write access to, using the `-i` option, in order to store docker images
-generated during the test.
-
-Example usages of the script:
-
-```
-$ ./e2e/runner.sh -m https://xyz -i docker.io/foxish -d cloud
-$ ./e2e/runner.sh -m https://xyz -i test -d minikube
-$ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -d minikube
-$ ./e2e/runner.sh -m https://xyz -i test -r https://github.com/my-spark/spark -b my-branch -d minikube
-```
-
-# Detailed Documentation
-
-## Running the tests using maven
-
-Integration tests firstly require installing [Minikube](https://kubernetes.io/docs/getting-started-guides/minikube/) on
-your machine, and for the `Minikube` binary to be on your `PATH`.. Refer to the Minikube documentation for instructions
-on how to install it. It is recommended to allocate at least 8 CPUs and 8GB of memory to the Minikube cluster.
-
-Running the integration tests requires a Spark distribution package tarball that
-contains Spark jars, submission clients, etc. You can download a tarball from
-http://spark.apache.org/downloads.html. Or, you can create a distribution from
-source code using `make-distribution.sh`. For example:
-
-```
-$ git clone git@github.com:apache/spark.git
-$ cd spark
-$ ./dev/make-distribution.sh --tgz \
--Phadoop-2.7 -Pkubernetes -Pkinesis-asl -Phive -Phive-thriftserver
-```
-
-The above command will create a tarball like spark-2.3.0-SNAPSHOT-bin.tgz in the
-top-level dir. For more details, see the related section in
-[building-spark.md](https://github.com/apache/spark/blob/master/docs/building-spark.md#building-a-runnable-distribution)
-
-
-Once you prepare the tarball, the integration tests can be executed with Maven or
-your IDE. Note that when running tests from an IDE, the `pre-integration-test`
-phase must be run every time the Spark main code changes. When running tests
-from the command line, the `pre-integration-test` phase should automatically be
-invoked if the `integration-test` phase is run.
-
-With Maven, the integration test can be run using the following command:
-
-```
-$ mvn clean integration-test \
--Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz
-```
-
-## Running against an arbitrary cluster
-
-In order to run against any cluster, use the following:
-```sh
-$ mvn clean integration-test \
--Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
--DextraScalaTestArgs="-Dspark.kubernetes.test.master=k8s://https://<master>
-
-## Reuse the previous Docker images
-
-The integration tests build a number of Docker images, which takes some time.
-By default, the images are built every time the tests run. You may want to skip
-re-building those images during development, if the distribution package did not
-change since the last run. You can pass the property
-`spark.kubernetes.test.imageDockerTag` to the test process and specify the Docker
-image tag that is appropriate.
-Here is an example:
-
-```
-$ mvn clean integration-test \
--Dspark-distro-tgz=spark/spark-2.3.0-SNAPSHOT-bin.tgz \
--Dspark.kubernetes.test.imageDockerTag=latest
-```
+The simplest way to run the integration tests is to install and run Minikube, then run the following:
+
+dev/dev-run-integration-tests.sh
+
+The minimum tested version of Minikube is 0.23.0. The kube-dns addon must be enabled. Minikube should
+run with a minimum of 3 CPUs and 4G of memory:
+
+minikube start --cpus 3 --memory 4096
+
+You can download Minikube [here](https://github.com/kubernetes/minikube/releases).
+
+# Integration test customization
+
+Configuration of the integration test runtime is done through passing different arguments to the test script. The main useful options are outlined below.
+
+## Use a non-local cluster
+
+To use your own cluster running in the cloud, set the following:
+
+* `--deploy-mode cloud` to indicate that the test is connecting to a remote cluster instead of Minikube,
+* `--spark-master <master-url>` - set `<master-url>` to the externally accessible Kubernetes cluster URL,
+* `--image-repo <repo>` - set `<repo>` to a write-accessible Docker image repository that provides the images for your cluster. The framework assumes your local Docker client can push to this repository.
+
+Therefore the command looks like this:
+
+dev/dev-run-integration-tests.sh \
+--deploy-mode cloud \
+--spark-master https://example.com:8443/apiserver \
+--image-repo docker.example.com/spark-images
+
+## Re-using Docker Images
+
+By default, the test framework will build new Docker images on every test execution. A unique image tag is generated,
+and it is written to file at `target/imageTag.txt`. To reuse the images built in a previous run, or to use a Docker image tag
+that you have built by other means already, pass the tag to the test script:
+
+dev/dev-run-integration-tests.sh --image-tag <tag>
+
+where if you still want to use images that were built before by the test framework:
+
+dev/dev-run-integration-tests.sh --image-tag $(cat target/imageTag.txt)
+
+## Customizing the Spark Source Code to Test
+
+By default, the test framework will test the master branch of Spark from [here](https://github.com/apache/spark). You
+can specify the following options to test against different source versions of Spark:
+
+* `--spark-repo <repo>` - set `<repo>` to the git or http URI of the Spark git repository to clone
+* `--spark-branch <branch>` - set `<branch>` to the branch of the repository to build.
+
+
+An example:
+
+dev/dev-run-integration-tests.sh \
+--spark-repo https://github.com/apache-spark-on-k8s/spark \
+--spark-branch new-feature
+
+Additionally, you can use a pre-built Spark distribution. In this case, the repository is not cloned at all, and no
+source code has to be compiled.
+
+* `--spark-tgz <path-to-tgz>` - set `<path-to-tgz>` to point to a tarball containing the Spark distribution to test.
+
+When the tests are cloning a repository and building it, the Spark distribution is placed in `target/spark/spark-<VERSION>.tgz`.
+Reuse this tarball to save a significant amount of time if you are iterating on the development of these integration tests.
Lines changed: 10 additions & 17 deletions
@@ -1,5 +1,6 @@
-#!/bin/bash
+#!/usr/bin/env bash
 
+#
 # Licensed to the Apache Software Foundation (ASF) under one or more
 # contributor license agreements. See the NOTICE file distributed with
 # this work for additional information regarding copyright ownership.
@@ -14,23 +15,15 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
+#
 
-### This script can be used to run integration tests locally on minikube.
-### Requirements: minikube v0.23+ with the DNS addon enabled, and kubectl configured to point to it.
+BUILD_DIR=$(dirname $0)
 
-set -ex
+MVN_RUNNER=$BUILD_DIR/run-mvn
 
-### Basic Validation ###
-if [ ! -d "integration-test" ]; then
-echo "This script must be invoked from the top-level directory of the integration-tests repository"
-usage
-exit 1
+if [ ! -f $MVN_RUNNER ];
+then
+curl -s --progress-bar https://raw.githubusercontent.com/apache/spark/master/build/mvn > $MVN_RUNNER
+chmod +x $MVN_RUNNER
 fi
-
-# Set up config.
-master=$(kubectl cluster-info | head -n 1 | grep -oE "https?://[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(:[0-9]+)?")
-repo="https://github.com/apache/spark"
-image_repo=test
-
-# Run tests in minikube mode.
-./e2e/runner.sh -m $master -r $repo -i $image_repo -d minikube
+source $MVN_RUNNER
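The hunk above replaces the old Minikube wrapper with a bootstrap step that downloads Spark's `build/mvn` into `$BUILD_DIR/run-mvn` only when that file is missing, and then sources it. One caveat of `curl ... > $MVN_RUNNER` is that the shell creates the output file before curl runs, so a failed download leaves an empty file behind; and without `-f`, curl writes an HTTP error page into the file and still exits successfully. In both cases the `[ ! -f $MVN_RUNNER ]` check skips the download on the next run. A minimal sketch of a download-if-missing helper that avoids this, assuming `curl` is on the `PATH` (`ensure_download` is a hypothetical name, not part of the commit):

```shell
#!/usr/bin/env bash
# Sketch: fetch URL to DEST only when DEST does not exist yet, without leaving
# an empty or partial DEST behind if the download fails.
ensure_download() {
  local url="$1" dest="$2"

  # Already present: nothing to do.
  if [[ -f "$dest" ]]; then
    return 0
  fi

  mkdir -p "$(dirname "$dest")" || return 1

  # Download into a temporary file next to DEST, then move it into place.
  local tmp
  tmp="$(mktemp "$dest.XXXXXX")" || return 1
  # -f: fail on HTTP errors instead of saving the error page;
  # -sS: no progress bar, but still report errors; -L: follow redirects.
  if curl -fsSL "$url" -o "$tmp"; then
    chmod +x "$tmp"
    mv "$tmp" "$dest"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

In the bootstrap script this could be wired in as `ensure_download https://raw.githubusercontent.com/apache/spark/master/build/mvn "$MVN_RUNNER" || exit 1` before `source "$MVN_RUNNER"`. Creating the temporary file in the destination directory keeps the final `mv` a rename within one filesystem.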

dev/dev-run-integration-tests.sh

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+TEST_ROOT_DIR=$(git rev-parse --show-toplevel)
+BRANCH="master"
+SPARK_REPO="https://github.com/apache/spark"
+SPARK_REPO_LOCAL_DIR="$TEST_ROOT_DIR/target/spark"
+DEPLOY_MODE="minikube"
+IMAGE_REPO="docker.io/kubespark"
+SPARK_TGZ="N/A"
+IMAGE_TAG="N/A"
+SPARK_MASTER=
+
+# Parse arguments
+while (( "$#" )); do
+case $1 in
+--spark-branch)
+BRANCH="$2"
+shift
+;;
+--spark-repo)
+SPARK_REPO="$2"
+shift
+;;
+--image-repo)
+IMAGE_REPO="$2"
+shift
+;;
+--image-tag)
+IMAGE_TAG="$2"
+shift
+;;
+--deploy-mode)
+DEPLOY_MODE="$2"
+shift
+;;
+--spark-tgz)
+SPARK_TGZ="$2"
+shift
+;;
+*)
+break
+;;
+esac
+shift
+done
+
+if [[ $SPARK_TGZ == "N/A" ]];
+then
+echo "Cloning $SPARK_REPO into $SPARK_REPO_LOCAL_DIR and checking out $BRANCH."
+
+# clone spark distribution if needed.
+if [ -d "$SPARK_REPO_LOCAL_DIR" ];
+then
+(cd $SPARK_REPO_LOCAL_DIR && git fetch origin $branch);
+else
+mkdir -p $SPARK_REPO_LOCAL_DIR;
+git clone -b $BRANCH --single-branch $SPARK_REPO $SPARK_REPO_LOCAL_DIR;
+fi
+cd $SPARK_REPO_LOCAL_DIR
+git checkout -B $BRANCH origin/$branch
+./dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes -DskipTests;
+SPARK_TGZ=$(find $SPARK_REPO_LOCAL_DIR -name spark-*.tgz)
+echo "Built Spark TGZ at $SPARK_TGZ".
+cd -
+fi
+
+cd $TEST_ROOT_DIR
+
+if [ -z $SPARK_MASTER ];
+then
+build/mvn integration-test \
+-Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
+-Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
+-Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
+-Dspark.kubernetes.test.deployMode=$DEPLOY_MODE;
+else
+build/mvn integration-test \
+-Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ \
+-Dspark.kubernetes.test.imageTag=$IMAGE_TAG \
+-Dspark.kubernetes.test.imageRepo=$IMAGE_REPO \
+-Dspark.kubernetes.test.deployMode=$DEPLOY_MODE \
+-Dspark.kubernetes.test.master=$SPARK_MASTER;
+fi
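The new entrypoint parses options in a `while`/`case` loop and then calls `build/mvn integration-test` with `spark.kubernetes.test.*` system properties. Two details of the committed version are worth keeping in mind. First, the loop has no `--spark-master` case even though the README documents that flag, so `SPARK_MASTER` always stays empty and only the first `build/mvn` branch ever runs. Second, bash variable names are case-sensitive, so `git fetch origin $branch` and `origin/$branch` expand the unset lowercase `branch` rather than `BRANCH`. A minimal sketch addressing the first point, which also avoids duplicating the Maven command by collecting the properties in an array, assuming the same property names as above (`parse_args`, `build_mvn_args`, and `MVN_ARGS` are hypothetical names, not part of the commit):

```shell
#!/usr/bin/env bash
# Sketch: option parsing and Maven argument assembly for the test entrypoint.
# Defaults mirror those in dev/dev-run-integration-tests.sh.
BRANCH="master"
SPARK_REPO="https://github.com/apache/spark"
DEPLOY_MODE="minikube"
IMAGE_REPO="docker.io/kubespark"
SPARK_TGZ="N/A"
IMAGE_TAG="N/A"
SPARK_MASTER=""

# Parses "--flag value" pairs into the globals above. Unlike the committed
# loop, it also accepts --spark-master, which the README documents.
# Parsing stops at the first unrecognized argument, as in the original.
parse_args() {
  while (( $# )); do
    case "$1" in
      --spark-branch) BRANCH="$2"; shift ;;
      --spark-repo)   SPARK_REPO="$2"; shift ;;
      --image-repo)   IMAGE_REPO="$2"; shift ;;
      --image-tag)    IMAGE_TAG="$2"; shift ;;
      --deploy-mode)  DEPLOY_MODE="$2"; shift ;;
      --spark-tgz)    SPARK_TGZ="$2"; shift ;;
      --spark-master) SPARK_MASTER="$2"; shift ;;
      *) break ;;
    esac
    shift
  done
}

# Collects the Maven goal and -D properties in the global array MVN_ARGS.
# Each element is quoted, so values survive expansion as "${MVN_ARGS[@]}".
build_mvn_args() {
  MVN_ARGS=(
    integration-test
    "-Dspark.kubernetes.test.sparkTgz=$SPARK_TGZ"
    "-Dspark.kubernetes.test.imageTag=$IMAGE_TAG"
    "-Dspark.kubernetes.test.imageRepo=$IMAGE_REPO"
    "-Dspark.kubernetes.test.deployMode=$DEPLOY_MODE"
  )
  # Add the master URL only when one was supplied.
  if [[ -n "$SPARK_MASTER" ]]; then
    MVN_ARGS+=("-Dspark.kubernetes.test.master=$SPARK_MASTER")
  fi
}
```

With these helpers, the two `build/mvn` branches collapse into `parse_args "$@"; build_mvn_args; build/mvn "${MVN_ARGS[@]}"`, and adding a property later means editing one array instead of two copies of the command.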

e2e/e2e-prow.sh

Lines changed: 0 additions & 39 deletions
This file was deleted.
