octopus

A framework for job distribution and data replication on large clusters of machines, built on Apache Spark and taking advantage of Spark's resilience and simple model for task distribution. An OctopusContext is obtained from an existing SparkContext:

import org.apache.spark.SparkContext
import OctopusContext._

val sc: SparkContext = ??? // an existing SparkContext
val oc = sc.getOctopusContext()
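If you do not already have a SparkContext at hand, here is a minimal sketch of the standard Spark setup (the app name and master URL below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("octopus-example")  // placeholder app name
  .setMaster("spark://host:7077") // placeholder master URL
val sc = new SparkContext(conf)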

Octopus' abstraction for data replication is the DataSet class, which represents data replicated on every machine in the cluster. Unlike Spark RDDs, DataSets do not partition their data across the cluster; they simply create a full copy on each machine. DataSets support lazy transformations (map, filter, flatMap, reduceByKey, groupBy, drop, take, ...), which are not executed until the data is needed by a job.

val data: DataSet[Int] = oc.deploy(1 to 1000)
val transformed: DataSet[(Int, Int)] = data.map(_ + 1).filter(_ % 2 == 0)
  .flatMap(i => 1 to i) // this increases the data size a lot; better done on the workers
  .map(i => (i, i + 1))
  .reduceByKey(_ * _) // transformations not computed yet - the data is not even deployed yet

Jobs are functions from the replicated data to a result; calling execute triggers the deployment and any pending transformations, then distributes the jobs across the cluster:

def job(i: Int)(data: Iterable[(Int, Int)]) = data.map(_._2 + i).sum

val jobs = (1 to 10).map(i => job(i) _).toSeq
val results: Seq[Int] = transformed.execute(jobs)

Octopus supports caching of the replicated data for later reuse.

val cached = data.cache() // the data will be stored when a job is next performed on it
cached.unpersist() // the data will be removed from the cache when any job is next performed
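As a sketch of why caching helps (assuming, per the comments above, that the cache is materialized the first time a job runs), a cached DataSet can serve several rounds of jobs without being redeployed:

val cachedData = transformed.cache()

// first round: deploys the data and stores it on each machine
val first: Seq[Int] = cachedData.execute(jobs)

// second round: reuses the cached copies instead of redeploying
val second: Seq[Int] = cachedData.execute(jobs)

cachedData.unpersist() // free the cached copies once done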

Jobs may also be launched without a DataSet by using the OctopusContext:

def job(i: Int)(): Int = (1 to 1000).map(_ + i).sum

val jobs = (1 to 1000).map(i => job(i) _).toList
val results: List[Int] = oc.runJobs(jobs)

It doesn't matter which kind of collection you provide for your jobs (List, Seq, Vector, ...) as long as it is a GenSeqLike; Octopus will return the results in the same collection type.
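For instance (a sketch assuming the runJobs signature shown above), a Vector of jobs yields a Vector of results:

val vectorJobs = (1 to 10).map(i => job(i) _).toVector
val vectorResults: Vector[Int] = oc.runJobs(vectorJobs) // same collection type back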

You can also launch jobs asynchronously! The results will be wrapped in a Future:

val results: Future[Seq[Int]] = transformed.submit(jobs)
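A minimal sketch of consuming that Future with the standard library (submit is from the example above; the rest is plain scala.concurrent):

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

val async: Future[Seq[Int]] = transformed.submit(jobs)

async.foreach(rs => println(s"job results: $rs")) // react when the jobs finish
val rs: Seq[Int] = Await.result(async, 10.minutes) // or block until they are done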
