Sparkle is a framework, largely inspired by Spring Boot, that helps Data Engineers write Spark jobs. It uses only the libraries provided by Spark.
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Job, SparkleApplication and the read/csvh/overwrite/parquet helpers come from the Sparkle framework
public class SparkleSample extends Job {

    public static void main(String[] args) {
        SparkleApplication.run(SparkleSample.class, args);
    }

    public void run() {
        // Read the CSV source (passed with -s) and expose it as the "train" view
        read(getSource())
            .as(csvh())
            .createOrReplaceTempView("train");

        // Attach a message that shows up in the Spark UI
        description("query", "show \uD83E\uDD84");

        Dataset<Row> ds = spark().sql("SELECT sum(age) AS sum_age, avg(age) AS avg_age FROM train WHERE is_not_blank(Cabin) GROUP BY sex");

        // Write the result as Parquet to the destination (passed with -d)
        overwrite(ds).as(parquet(getDestination()));
    }
}
```
Run the job as a plain Java process, or submit it to a cluster with spark-submit:

```bash
java -cp sparkle-sample.jar SparkleSample -s /tmp/titanic/train.csv -d /tmp/destination.parquet --debug
spark-submit --class SparkleSample sparkle-sample.jar -s /tmp/titanic/train.csv -d /tmp/destination.parquet --debug
```
Main features:

- Simple CLI configuration
- Automatically register UDFs (a plain-Spark sketch of what this automates follows this list)
- Add listeners to monitor your job (see the listener sketch below)
- Add messages to the Spark UI (see the job-description sketch below)
- Write once, run everywhere (standalone, cluster, ...)
- No more winutils errors on Windows
- Testing: improved error messages
- Package a fat JAR (not yet implemented)
- Create and chain sub-jobs (not yet implemented)
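
For context on the UDF feature: the sample query above calls `is_not_blank(Cabin)`, which in plain Spark you would have to register yourself before running the SQL. The snippet below is a minimal sketch of that manual step; the class name and the `is_not_blank` implementation are illustrative assumptions, not Sparkle code. Sparkle's automatic registration is meant to make this boilerplate unnecessary.

```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class ManualUdfRegistration {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("manual-udf")
                .master("local[*]")
                .getOrCreate();

        // Manual registration that Sparkle is meant to automate:
        // expose a Java lambda to Spark SQL under the name is_not_blank.
        spark.udf().register(
                "is_not_blank",
                (UDF1<String, Boolean>) value -> value != null && !value.trim().isEmpty(),
                DataTypes.BooleanType);

        // The function is now callable from SQL, e.g. WHERE is_not_blank(Cabin)
        spark.sql("SELECT is_not_blank('') AS blank, is_not_blank('C123') AS cabin").show();

        spark.stop();
    }
}
```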
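Similarly, the monitoring feature corresponds to what plain Spark exposes through `SparkListener`. The sketch below is a hand-rolled listener that logs job start and end events; it illustrates the underlying Spark API, not Sparkle's own listener.

```java
import org.apache.spark.scheduler.SparkListener;
import org.apache.spark.scheduler.SparkListenerJobEnd;
import org.apache.spark.scheduler.SparkListenerJobStart;
import org.apache.spark.sql.SparkSession;

public class JobMonitoringListener extends SparkListener {

    @Override
    public void onJobStart(SparkListenerJobStart jobStart) {
        System.out.println("Job " + jobStart.jobId() + " started with "
                + jobStart.stageInfos().size() + " stage(s)");
    }

    @Override
    public void onJobEnd(SparkListenerJobEnd jobEnd) {
        System.out.println("Job " + jobEnd.jobId() + " finished: " + jobEnd.jobResult());
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("listener-demo")
                .master("local[*]")
                .getOrCreate();

        // Register the listener so the job triggered below is reported
        spark.sparkContext().addSparkListener(new JobMonitoringListener());

        spark.range(1000).count(); // runs a Spark job, which the listener logs
        spark.stop();
    }
}
```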
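Finally, how Sparkle's `description(...)` call surfaces text in the Spark UI is not documented here; in plain Spark the usual mechanism is `SparkContext.setJobDescription`, sketched below under that assumption.

```java
import org.apache.spark.sql.SparkSession;

public class SparkUiMessage {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ui-message-demo")
                .master("local[*]")
                .getOrCreate();

        // The description appears next to the job on the Spark UI's "Jobs" page
        spark.sparkContext().setJobDescription("aggregate ages by sex \uD83E\uDD84");
        spark.range(1000).count();

        spark.stop();
    }
}
```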