This is a full-fledged example of a working custom workload for SparkTC/spark-bench.
In this example, I created a custom data-generation workload that takes in a string and makes a dataset with however many rows and columns you want, where every cell is just that string.
From this configuration:
```hocon
{
  name = "custom"
  class = "com.example.WordGenerator"
  output = "console"
  rows = 10
  cols = 3
  word = "Cool stuff!!"
}
```
I get this result:
```
+------------+------------+------------+
|           0|           1|           2|
+------------+------------+------------+
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
|Cool stuff!!|Cool stuff!!|Cool stuff!!|
+------------+------------+------------+
```
This repo includes the source code, build.sbt file, and example configuration file for creating and using a custom workload.
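The source in this repo is the real reference, but as a condensed sketch: a spark-bench custom workload pairs a case class (the runtime logic) with a companion object that spark-bench uses to construct it from the parsed config map. The `Workload` and `WorkloadDefaults` names come from spark-bench's custom-workload API; treat the exact member names and signatures here as assumptions and check them against the spark-bench version you installed.

```scala
package com.example

import com.ibm.sparktc.sparkbench.workload.{Workload, WorkloadDefaults}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Companion object: spark-bench builds the workload from the parsed config
// map, so rows/cols/word come straight from the .conf block shown above.
object WordGenerator extends WorkloadDefaults {
  val name = "custom"
  def apply(m: Map[String, Any]): WordGenerator = WordGenerator(
    input = None,
    output = Some(m("output").asInstanceOf[String]),
    rows = m("rows").asInstanceOf[Int],
    cols = m("cols").asInstanceOf[Int],
    word = m("word").asInstanceOf[String]
  )
}

case class WordGenerator(
    input: Option[String],
    output: Option[String],
    rows: Int,
    cols: Int,
    word: String
) extends Workload {

  // Build a rows x cols DataFrame where every cell is `word`; columns are
  // named "0", "1", "2", ... to match the console output above.
  override def doWorkload(df: Option[DataFrame] = None, spark: SparkSession): DataFrame = {
    val schema = StructType((0 until cols).map(i => StructField(i.toString, StringType)))
    val data = spark.sparkContext
      .parallelize(Seq.fill(rows)(Row.fromSeq(Seq.fill(cols)(word))))
    spark.createDataFrame(data, schema)
  }
}
```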
To build and run it:

- Install spark-bench and sbt.
- Clone this repo and `cd` into the directory.
- Change the value of `sparkBenchPath` in build.sbt to reflect your environment (see the build.sbt sketch below).
- Run `sbt package`. This will create a jar in `target/scala-2.11/`. DO NOT use `sbt assembly` for custom workloads!
- Move this new jar into the `lib` folder of your spark-bench installation.
- Change the value of `driver-class-path` in custom-workload-example.conf to reflect your environment (see the config sketch below).
- Run `./path/to/your/spark-bench/installation/bin/spark-bench.sh ./bin/custom-workload-example.conf`.
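For reference, here is a sketch of the shape build.sbt can take; `sparkBenchPath` is the value you edit in the third step. This is illustrative rather than the repo's actual file: the placeholder path, the jar glob, and the Spark version are assumptions.

```scala
name := "spark-bench-custom-workload-example"
scalaVersion := "2.11.8" // matches the target/scala-2.11/ output path above

// Edit this to point at your spark-bench installation (placeholder path).
val sparkBenchPath = "/path/to/your/spark-bench/installation"

// Compile against the spark-bench jars shipped in the installation's lib folder.
unmanagedJars in Compile ++= (file(sparkBenchPath) / "lib" ** "*.jar").classpath

// Spark itself is provided at run time by spark-submit, so don't bundle it.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.1.1" % "provided"
)
```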
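And a sketch of how custom-workload-example.conf might hang together: the workload block from the top of this README sits inside spark-bench's usual suite structure, and `driver-class-path` points Spark at the lib folder where you dropped the jar. The paths are placeholders and the surrounding structure is an assumption; the file shipped in this repo's bin/ folder is the real thing.

```hocon
spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "local[*]" // placeholder: use your own master
      // Point this at the lib folder holding spark-bench plus your new jar
      driver-class-path = "/path/to/your/spark-bench/installation/lib/*"
    }
    workload-suites = [{
      descr = "Custom workload example"
      benchmark-output = "console"
      workloads = [{
        name = "custom"
        class = "com.example.WordGenerator"
        output = "console"
        rows = 10
        cols = 3
        word = "Cool stuff!!"
      }]
    }]
  }]
}
```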