This library enables you to write the business logic of your Spark application without depending on
RDD
s and the SparkContext
.
Why you would want to do that? Because firing up a SparkContext
and running your unit tests in a local Spark cluster is really slow. kontextfrei frees you from
this hard dependency on a SparkContext
, ultimately leading to a much faster feedback cycle during
development.
Please visit the kontextfrei website to learn more about how to use this library.
For an example that showcases how the library can be used, please have a look at kontextfrei-example.
The library is split up into two modules:
kontextfrei-core
: You definitely need this to use the librarykontextfrei-scalatest
: Some optional goodies to make testing your application logic easier and remove some boilerplate; this comes with a transitive dependency on ScalaTest and ScalaCheck.
kontextfrei assumes that the Spark dependency is provided by your application, so have to explicitly add a dependency to Spark.
Currently, kontextfrei binary releases are built against Spark 1.4.1, 2.0.2, 2.1.3, 2.2.3 (each of them both for Scala 2.11 and 2.10), 2.3.2 (compiled for Scala 2.11), and 2.4.0 (compiled both for Scala 2.11 and 2.12).
Adding a dependency on the the current version of kontextfrei-core
and kontextfrei-scalatest
to your build.sbt
looks like this:
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-1.4.1" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-1.4.1" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-2.0.2" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-2.0.2" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.2" % "provided"
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-2.1.3" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-2.1.3" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.3" % "provided"
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-2.2.3" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-2.2.3" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3" % "provided"
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-2.3.2" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-2.3.2" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.2" % "provided"
resolvers += "dwestheide" at "https://dl.bintray.com/dwestheide/maven"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-core-spark-2.4.0" % "0.8.0"
libraryDependencies += "com.danielwestheide" %% "kontextfrei-scalatest-spark-2.4.0" % "0.8.0" % "test,it"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
As you can see, you need the specify the Spark version against which the library is supposed to be built as part of the artifact name.
This library is in an early stage and is not feature-complete: only a subset of the operations available on RDD
s is supported so far.
Shout out to Simon J. Scott for his valuable contributions to this project.
Any kind of contribution is welcome. Please see the Contribution Guidelines.