We have a stand-alone example that you can run in the integration tests. The example is based on the mortgage dataset, which you can download here, and the code is in the com.nvidia.spark.rapids.tests.mortgage package.
Unit tests implemented using the ScalaTest framework reside in the tests directory. This is unconventional, but it lets us run the tests against the close-to-final shaded single-shim version of the plugin and simplifies how we collect code coverage.
The tests module depends on the aggregator module, which shades external dependencies and aggregates them, along with the internal submodules, into an artifact supporting a single Spark version.
The minimum required Maven phase to run unit tests is package. Alternatively, you may run mvn install and use mvn test for subsequent testing. However, to avoid dealing with stale jars in the local Maven repo cache, we recommend invoking mvn package -pl tests -am ... from the spark-rapids root directory. Add -f scala2.13 if you want to run the unit tests against Apache Spark dependencies based on Scala 2.13.
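For instance, a minimal sketch of that recommended invocation from the spark-rapids root, with and without the Scala 2.13 switch (it only recombines the flags described above; append any extra -D properties you need):
mvn package -pl tests -am
mvn package -f scala2.13 -pl tests -am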
To run targeted Scala tests, use
-DwildcardSuites=<comma separated list of packages or fully-qualified test suites>
Or, more conveniently, use a combination of
-Dsuffixes=<comma separated list of suffix regexes>
to restrict the test suites being run, which corresponds to the -q option of the ScalaTest runner, and
-Dtests=<comma separated list of keywords or test names>
to restrict the tests run within those suites, which corresponds to the -z or -t options of the ScalaTest runner.
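As a quick sketch of how these two properties combine (it simply reuses the suffix regex and keyword from the last example in this document), the following runs only suites whose names end in CastOpSuite and, within them, only tests whose names mention decimal:
mvn package -pl tests -am -Dsuffixes='.*CastOpSuite' -Dtests=decimal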
For more information about using ScalaTest with Maven, please refer to the ScalaTest documentation and the source code.
You can run the unit tests against different versions of Spark using different profiles. By default the tests run against Spark 3.1.1; to run against a specific version, use the buildver property:
-Dbuildver=311 (Spark 3.1.1)
-Dbuildver=350 (Spark 3.5.0)
etc.
Please refer to the tests project POM to see the list of supported test profiles.
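For example, a sketch that simply plugs one of the values above into the recommended invocation:
mvn package -pl tests -am -Dbuildver=350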
Apache Spark-specific configurations can be passed in by setting the SPARK_CONF environment variable.
Examples:
To run all tests against Apache Spark 3.2.1,
mvn package -pl tests -am -Dbuildver=321
To pass Apache Spark configs --conf spark.dynamicAllocation.enabled=false --conf spark.task.cpus=1, do something like:
SPARK_CONF="spark.dynamicAllocation.enabled=false,spark.task.cpus=1" mvn ...
To run all tests in ParquetWriterSuite in package com.nvidia.spark.rapids, issue:
mvn package -pl tests -am -DwildcardSuites="com.nvidia.spark.rapids.ParquetWriterSuite"
To run all AnsiCastOpSuite and CastOpSuite tests dealing with decimals using Apache Spark 3.3.0 on Scala 2.13 artifacts, issue:
mvn package -f scala2.13 -pl tests -am -Dbuildver=330 -Dsuffixes='.*CastOpSuite' -Dtests=decimal
Please refer to the integration-tests README for more information.