GitHub Actions is used for the CI/CD tests. The workflows can be found in .github/workflows. Currently, there are two workflows:
- IntegrationTest.yml: builds the spark connector binary and runs the integration test. This test runs with a ONE-node local spark cluster. It is the main test suite.
- ClusterTest.yml: builds the spark connector binary, sets up a spark cluster with docker containers, and submits the test job to the cluster for execution. This test runs with a real spark cluster. It complements IntegrationTest.yml.
To run the integration and cluster tests, Snowflake login credentials are needed. The credentials are stored in an encrypted json file: snowflake.travis.json.gpg. The decryption passphrase is stored in a GitHub secret named SNOWFLAKE_TEST_CONFIG_SECRET. The workflow decrypts the json file for testing. For more details refer to Creating and storing encrypted secrets.
An example to retrieve the secret:
- name: Decrypt snowflake.json for testing
  run: ./.github/scripts/decrypt_secret.sh snowflake.travis.json snowflake.travis.json.gpg
  env:
    SNOWFLAKE_TEST_CONFIG_SECRET: ${{ secrets.SNOWFLAKE_TEST_CONFIG_SECRET }}
The test workflow file is .github/workflows/IntegrationTest.yml. Most of the development tests are included in this workflow. Its steps are:
- Decrypt snowflake.travis.json
- Run IT test with script: .github/scripts/run-tests-github.sh
- Update test coverage report
The test workflow file is .github/workflows/ClusterTest.yml. Compared with IntegrationTest, this workflow is more complex. It consists of the steps below:
- Build docker image
- Start spark cluster and run test
- Check test result
- Clean up
The docker image build step creates a spark base image with apache spark, the spark connector binary, the test binaries and dependent libraries. Related files:
- .github/docker/Dockerfile is the docker file. It takes many build arguments in order to support multiple versions.
- .github/docker/build_image.sh is the script to create the image.
High-level steps to build the docker image:
- Build spark connector binary
- Build the scala test jar file and archive the jar and the Python test files as a *.tar.gz file.
- Build docker image with required arguments including spark download URL, JDBC download URL, spark connector binary, decrypt script, encrypted test json file, etc.
Example command in the repository root directory:
.github/docker/build_image.sh
docker-compose is used to start a spark cluster and run the test. There is one master node and two worker nodes in the cluster. The test driver also runs as a docker-compose service. The test driver script is set up when the docker image is built, via the RUN_TEST_SCRIPT argument. For ClusterTest, the script is ClusterTest/run_cluster_test.sh. For more details refer to .github/docker/docker-compose.yml.
Example command in the repository root directory:
docker-compose -f .github/docker/docker-compose.yml up [-d]
In the step above, docker-compose starts the services with "-d". This step waits for the test to finish and checks the result.
GitHub Actions provides the global environment variable GITHUB_RUN_ID, which is unique within a repository. The test check is based on it. NOTE: re-running the workflow doesn't change it. GITHUB_SHA is the commit ID.
A snowflake table, CLUSTER_TEST_RESULT_TABLE, is set up to store the result of each test case. The GITHUBRUNID column stores GITHUB_RUN_ID. The TESTSTATUS column stores the test status, which can be Fail/Success/Exception. The REASON column provides details.
Each test case must write one row into CLUSTER_TEST_RESULT_TABLE when it is done.
For each test run, the total test case count is known, so we check the finished test case count for this GITHUB_RUN_ID. The test run is regarded as done when the finished test case count equals the total test case count. In addition, a wait timeout is introduced to make sure the check does not wait forever.
After the test run is done, it is regarded as Success if the Success test case count equals the total test case count; otherwise, it fails.
Example command in the repository root directory:
.github/docker/check_result.sh
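The actual check lives in ClusterTestCheckResult.scala (under ClusterTest/src/main/scala). As a rough illustration of the wait-and-check logic described above, here is a minimal Scala sketch; `countFinished` and `countSucceeded` are hypothetical helpers standing in for queries against CLUSTER_TEST_RESULT_TABLE filtered by GITHUBRUNID, and the 10-second poll interval is an arbitrary choice, not the real implementation.

```scala
// Minimal sketch of the wait-and-check idea; NOT the actual ClusterTestCheckResult code.
// countFinished/countSucceeded are hypothetical helpers that would query
// CLUSTER_TEST_RESULT_TABLE by GITHUBRUNID.
object CheckResultSketch {
  def waitForTestRun(runId: String,
                     totalTestCaseCount: Int,
                     timeoutSeconds: Int,
                     countFinished: String => Int,
                     countSucceeded: String => Int): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutSeconds * 1000L
    // Wait until every test case has written its result row, or the timeout expires.
    while (countFinished(runId) < totalTestCaseCount &&
           System.currentTimeMillis() < deadline) {
      Thread.sleep(10000)
    }
    // The run is Success only if every test case reported Success.
    countSucceeded(runId) == totalTestCaseCount
  }
}
```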
The clean-up step kills and removes the docker containers, removes the docker image, and so on.
Example command in the repository root directory:
.github/docker/cleanup_docker.sh
The cluster test cases are developed in the directory ClusterTest. Both Python and Scala test cases exist; the Scala tests are the main ones.
There is only one test case: ClusterTest/src/main/python/ClusterTest.py
Two example commands in the repository root directory:
# run the job in local
$SPARK_HOME/bin/spark-submit \
--conf "spark.pyspark.python=python3" \
--conf "spark.pyspark.driver.python=python3" \
--jars $SPARK_WORKDIR/spark-snowflake_2.11-2.7.0-spark_2.4.jar,$SPARK_WORKDIR/snowflake-jdbc-3.12.2.jar \
ClusterTest/src/main/python/ClusterTest.py local
# run the job in remote cluster
$SPARK_HOME/bin/spark-submit \
--conf "spark.pyspark.python=python3" \
--conf "spark.pyspark.driver.python=python3" \
--jars $SPARK_WORKDIR/spark-snowflake_2.11-2.7.0-spark_2.4.jar,$SPARK_WORKDIR/snowflake-jdbc-3.12.2.jar \
--master spark://master:7077 --deploy-mode client \
$SPARK_WORKDIR/ClusterTest.py remote
The Scala test framework is designed so that more test cases can be added in the future. There is an sbt project in ClusterTest for the Scala tests. IntelliJ can import this project separately.
Example command to compile the test project
cd ClusterTest
sbt ++2.11.12 package
cd ..
The test driver class is net.snowflake.spark.snowflake.ClusterTest. The caller can run multiple test suite classes in one submission; the class names are separated by ';'.
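As an illustration of that ';'-separated argument format only (not the actual ClusterTest parsing code; `runSuite` is a hypothetical helper that would instantiate and run one suite by class name):

```scala
// Illustration of handling a ';'-separated list of test suite class names.
// runSuite is a hypothetical helper; the real logic lives in ClusterTest.scala.
object SuiteListSketch {
  def runAll(testClasses: String, runSuite: String => Unit): Unit = {
    testClasses
      .split(";")
      .map(_.trim)
      .filter(_.nonEmpty)   // tolerate a trailing ';'
      .foreach(runSuite)
  }
}
```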
Example command to run the scala test
# run the job in local
$SPARK_HOME/bin/spark-submit \
--jars $SPARK_WORKDIR/spark-snowflake_2.11-2.7.0-spark_2.4.jar,$SPARK_WORKDIR/snowflake-jdbc-3.12.2.jar \
--class net.snowflake.spark.snowflake.ClusterTest \
ClusterTest/target/scala-2.11/clustertest_2.11-1.0.jar \
local "net.snowflake.spark.snowflake.testsuite.BasicReadWriteSuite;"
# run the job in remote cluster
$SPARK_HOME/bin/spark-submit \
--jars $SPARK_WORKDIR/spark-snowflake_2.11-2.7.0-spark_2.4.jar,$SPARK_WORKDIR/snowflake-jdbc-3.12.2.jar \
--master spark://master:7077 --deploy-mode client \
--class net.snowflake.spark.snowflake.ClusterTest \
ClusterTest/target/scala-2.11/clustertest_2.11-1.0.jar \
remote "net.snowflake.spark.snowflake.testsuite.BasicReadWriteSuite;net.snowflake.spark.snowflake.testsuite.BasicReadWriteSuite;"
net.snowflake.spark.snowflake.testsuite.BasicReadWriteSuite is a good template for adding a new test case. Below are the suggested steps.
- Create a new class in the same directory, for example ClusterTest/src/main/scala/net/snowflake/spark/snowflake/testsuite/MyNewTestSuite.scala
- This class must inherit from trait ClusterTestSuiteBase. It needs to implement runImpl(SparkSession, ClusterTestResultBuilder); see the sketch after this list.
- Update ClusterTest/run_cluster_test.sh to run the new test. It can be submitted together with other test cases or separately.
- Update TOTAL_TEST_CASE_COUNT in .github/docker/check_result.sh to make sure it is correct.
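A minimal sketch of such a suite is shown below. The exact ClusterTestSuiteBase/ClusterTestResultBuilder members (import location, reporting calls) are assumptions based on the description above; use BasicReadWriteSuite as the authoritative template.

```scala
// ClusterTest/src/main/scala/net/snowflake/spark/snowflake/testsuite/MyNewTestSuite.scala
// Minimal sketch only: the import of ClusterTestResultBuilder and the reporting step
// are assumptions; copy the real calls from BasicReadWriteSuite.
package net.snowflake.spark.snowflake.testsuite

import net.snowflake.spark.snowflake.ClusterTestResultBuilder
import org.apache.spark.sql.SparkSession

class MyNewTestSuite extends ClusterTestSuiteBase {

  // The base trait invokes this with the shared SparkSession and a builder
  // used to report the test status (Success/Fail/Exception) and a reason.
  override def runImpl(sparkSession: SparkSession,
                       resultBuilder: ClusterTestResultBuilder): Unit = {
    // Trivial Spark action as the body of this sketch; a real suite would
    // read from / write to Snowflake here and record the outcome via resultBuilder.
    val rowCount = sparkSession.range(0, 1000).count()
    assert(rowCount == 1000, s"unexpected row count: $rowCount")
  }
}
```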
IntelliJ can be used to develop the test cases. You can use the settings below to debug a test case in IntelliJ.
- Program arguments: "local net.snowflake.spark.snowflake.testsuite.BasicReadWriteSuite"
- Environment Variables: "GITHUB_RUN_ID=debug_run_id;GITHUB_SHA=debug_commit_id"
Similarly, you can also debug net.snowflake.spark.snowflake.ClusterTestCheckResult in IntelliJ.
- Program arguments: "2 60"
- Environment Variables: "GITHUB_RUN_ID=debug_run_id;GITHUB_SHA=debug_commit_id"
Files in directory .github/workflows
- ClusterTest.yml: Workflow definition file for Cluster Test
- IntegrationTest.yml: Workflow definition file for Integration Test
Files in directory .github/docker
- Dockerfile: the docker file for spark base image
- build_image.sh: the script to create spark base image
- check_result.sh: the script to wait for the tests to finish and check the test result
- cleanup_docker.sh: the script to clean up the containers and the image.
- docker-compose.yml: the docker-compose file
- entrypoint.sh: the boot script run when a docker container starts
- logging.properties: the logging properties for JDBC; it is not supported yet.
- spark-env.sh: the env setup script for the spark master and workers. It is not used yet.
Files in directory .github/scripts
- decrypt_secret.sh: the script to decrypt snowflake.travis.json.gpg
- run-tests-github.sh: the script to run spark connector IT test.
Files in ClusterTest
- build.sbt: sbt build file for scala test
- run_cluster_test.sh: the script to run test
- src/main/python/*.py: the python test cases.
- src/main/scala/net/snowflake/spark/snowflake/ClusterTestCheckResult.scala: the class to check test results
- src/main/scala/net/snowflake/spark/snowflake/ClusterTest.scala: the driver class to run tests
- src/main/scala/net/snowflake/spark/snowflake/testsuite/ClusterTestSuiteBase.scala: the test case base class.
- src/main/scala/net/snowflake/spark/snowflake/testsuite/LowMemoryStress.scala: tests uploading a large partition with low available memory.
- Files in src/main/scala: the Scala test related files.
A test that uploads a large partition with a small amount of available memory is defined in the Cluster Test. The partition used in the test is 227 MB after compression and 330 MB before compression. By limiting the executor memory, several OOM errors can be reproduced.
The connector hits OOM while trying to write data into a buffer. The processed size column indicates how much data had been processed before the OOM.
Before multi-part upload (SNOW-187770) was implemented, OOM happened as follows:
| --executor-memory | processed size | OOM |
|---|---|---|
| 700m | 113 MB | write OOM |
| 750m | 227 MB (full size) | None |
After multi-part upload was implemented:
| --executor-memory | processed size | OOM |
|---|---|---|
| 600m | 227 MB (full size) | None |
To better understand why processing this 227 MB file requires at least around 600 MB of memory, please refer to this link: https://www.tutorialdocs.com/article/spark-memory-management.html
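As a rough back-of-the-envelope illustration (assuming the Spark 2.x unified memory defaults of roughly 300 MB reserved heap and spark.memory.fraction = 0.6), only part of the configured executor heap is actually usable by the task:

```scala
// Rough illustration only, assuming Spark 2.x unified-memory defaults
// (~300 MB reserved system memory, spark.memory.fraction = 0.6).
object ExecutorMemorySketch {
  val executorMemoryMb = 600                            // --executor-memory 600m
  val reservedMb       = 300                            // reserved system memory
  val usableMb         = executorMemoryMb - reservedMb  // 300 MB left after the reserve
  val unifiedMb        = (usableMb * 0.6).toInt         // ~180 MB for execution + storage
  val userMb           = usableMb - unifiedMb           // ~120 MB for user data structures
}
```

So a 600 MB executor leaves far less than 600 MB for buffering, which is why the required executor memory is noticeably larger than the 227 MB compressed partition size.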
A test hook was added to the partition uploading function. When the test hook flag is enabled, the hook fails twice and passes the third time.
Two integration tests were added, with the Spark retry count set to 2 and 3 respectively. The test that only retries twice throws an exception, while the test that retries three times passes without an exception.
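Below is a minimal sketch of the fail-twice-then-succeed idea behind that hook. The names UploadFailureHook and maybeInjectFailure are hypothetical, not the connector's actual hook API; the real hook sits in the partition upload path and is controlled by a test flag.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical illustration of a fail-twice-then-succeed test hook; not the
// connector's real code. The real hook is invoked from the partition upload path.
object UploadFailureHook {
  @volatile var enabled: Boolean = false
  private val attempts = new AtomicInteger(0)

  // Called from the upload path: throw on the first two attempts, succeed afterwards.
  def maybeInjectFailure(): Unit = {
    if (enabled && attempts.incrementAndGet() <= 2) {
      throw new RuntimeException("Injected upload failure for retry testing")
    }
  }
}
```

With the retry budget set to 2, the injected failures exhaust all attempts and the job fails; with a budget of 3, the final attempt succeeds, matching the two integration tests described above.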