stephanetrou/sparkle

Sparkle

Sparkle is a framework, largely inspired by Spring Boot, that helps data engineers write Spark jobs. It uses only the libraries provided by Spark.

Example

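    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    // Sparkle's own imports (Job, SparkleApplication, csvh, parquet, ...) are omitted here.

    public class SparkleSample extends Job {

        public static void main(String[] args) {
            // Hand control to Sparkle, which parses the CLI arguments and runs the job
            SparkleApplication.run(SparkleSample.class, args);
        }

        public void run() {
            // Read the source given with -s and expose it as the temp view "train"
            read(getSource())
                .as(csvh())
                .createOrReplaceTempView("train");

            // Attach a description to the query
            description("query", "show \uD83E\uDD84");
            Dataset<Row> ds = spark().sql("SELECT sum(age) as sum_age, avg(age) as avg_age FROM train where is_not_blank(Cabin) group by sex");

            // Write the result as Parquet to the destination given with -d, replacing existing data
            overwrite(ds).as(parquet(getDestination()));
        }
    }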
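
The query above filters on `is_not_blank(Cabin)`, which does not appear to be a built-in Spark SQL function, so it is presumably one of the UDFs that Sparkle registers automatically. The sketch below shows what such a predicate could compute in plain Java. The class name `BlankCheckSketch` and the exact semantics (null-safe, whitespace-only counts as blank) are assumptions for illustration, not Sparkle's actual implementation.

```java
// Hypothetical stand-in for the logic behind an `is_not_blank` UDF.
// Assumption: null, empty and whitespace-only values all count as blank.
final class BlankCheckSketch {

    static boolean isNotBlank(String value) {
        if (value == null) {
            return false;
        }
        // A single non-whitespace character is enough to make the value non-blank
        for (int i = 0; i < value.length(); i++) {
            if (!Character.isWhitespace(value.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNotBlank("C85")); // prints "true"
        System.out.println(isNotBlank("   ")); // prints "false"
        System.out.println(isNotBlank(null));  // prints "false"
    }
}
```

Under these assumptions, rows whose `Cabin` column is missing or empty would be excluded before the aggregation by `sex`.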

Run from the Java command line

 java -cp sparkle-sample.jar SparkleSample -s /tmp/titanic/train.csv -d /tmp/destination.parquet --debug

Run on a cluster

 spark-submit --class SparkleSample sparkle-sample.jar -s /tmp/titanic/train.csv -d /tmp/destination.parquet --debug
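
Both commands pass the same three options: `-s`, `-d` and `--debug`. As a rough illustration of how such options can be mapped to values, here is a minimal plain-Java sketch. It assumes that `-s` is the source path and `-d` the destination path (inferred from `getSource()` and `getDestination()` in the example); the class `CliArgsSketch` and its map keys are hypothetical and are not Sparkle's actual parser.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Sparkle: parses the flags used in the commands above.
final class CliArgsSketch {

    // Returns a map with the keys "source", "destination" and "debug".
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("debug", "false"); // debug is off unless --debug is present
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-s":
                    // i++ passes the flag's index, then skips over its value
                    options.put("source", valueAfter(args, i++));
                    break;
                case "-d":
                    options.put("destination", valueAfter(args, i++));
                    break;
                case "--debug":
                    options.put("debug", "true");
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + args[i]);
            }
        }
        return options;
    }

    // Returns the argument following the flag at flagIndex, or fails if it is missing.
    private static String valueAfter(String[] args, int flagIndex) {
        if (flagIndex + 1 >= args.length) {
            throw new IllegalArgumentException("Missing value for " + args[flagIndex]);
        }
        return args[flagIndex + 1];
    }

    public static void main(String[] args) {
        System.out.println(parse(args));
    }
}
```

Because the parsing does not depend on how the JVM was launched, the same argument list works unchanged with `java` and with `spark-submit`.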

Features:

  • Simple CLI configuration
  • Automatic UDF registration
  • Listeners to monitor your job
  • Messages in the Spark UI
  • Write once, run everywhere (standalone, cluster, ...)
  • No more winutils errors on Windows
  • Testing: improved error messages
  • Fat JAR packaging (not yet)
  • Creating and chaining subjobs (not yet)
