Sparkling is a Clojure API for Apache Spark.
Check out our site for information about Gorillalabs Sparkling and a getting started guide.
Sparkling is available from Clojars. To use it with Leiningen, add the Sparkling dependency to the :dependencies vector of your project.clj.
See gorillalabs/sparkling-getting-started for an example project using Sparkling. The same project is used in the getting started guide.
- Use sparkling.core instead of sparkling.api for a parameter order similar to Clojure's. This makes currying with partial easier.
- Made it possible to use keywords as functions by serializing IFn instead of AFunction.
- Tested with Spark 1.1.0 and Spark 1.2.1.
- It's about twice as fast by getting rid of a reflection call (thanks to David Jacot for his take on this).
- Got rid of mapping/remapping inside the API functions. Removing it
  - shrank the execution plan (mine shrank to a third) and
  - (more importantly) allowed me to keep partitioner information.
- Added more -values functions (e.g. map-values), again to keep partitioner information.
- Additional sources for RDDs:
  - JdbcRDD: reading data from your JDBC source.
  - Hadoop-Avro-Reader: reading Avro files from HDFS.
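
The points above about parameter order, keywords as functions, and -values functions can be illustrated with a small sketch. It deliberately uses plain Clojure collections as stand-ins for RDDs so that it runs without a Spark context. The names `increment-all`, `keep-even`, and `map-values-local` are hypothetical helpers introduced for illustration only and are not part of Sparkling.

```clojure
;; Function-first, collection-last is the order used by clojure.core and,
;; per the notes above, by sparkling.core. It lets partial fix the function
;; and leave the collection open.
(def increment-all (partial map inc))
(def keep-even     (partial filter even?))

;; The curried steps compose directly with the thread-last macro.
(def result
  (->> [1 2 3 4 5]
       increment-all
       keep-even
       vec))

;; Keywords are functions of maps, so they can be passed wherever a
;; mapping function is expected.
(def names (mapv :name [{:name "ada"} {:name "grace"}]))

;; Hypothetical local stand-in for a -values function: it transforms only
;; the values of key/value pairs and leaves every key untouched.
(defn map-values-local [f pairs]
  (mapv (fn [[k v]] [k (f v)]) pairs))

(def scaled (map-values-local (partial * 10) [[:a 1] [:b 2]]))
```

Assuming sparkling.core is aliased as `spark`, the first step would presumably read `(partial spark/map inc)`, with an RDD taking the place of the vector. Because a -values function never changes the keys, the pairs stay in the partitions their keys were assigned to, which is why such functions can keep partitioner information.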
Thanks to The Climate Corporation and their open source clj-spark project, and to Yieldbot for yieldbot/flambo which served as the starting point for this project.
Copyright (C) 2014-2015 Dr. Christian Betz, and the Gorillalabs team.
Distributed under the Eclipse Public License, either version 1.0 or (at your option) any later version.