Commit statistics and anomalous-day detection for public GitHub repositories (using Spark/Scala).
- Spark 2.2.0 or above
- sbt (the Scala build tool)
- git clone [this repository]
- sbt clean package
- spark-submit --files [commits.csv] [sbt-fullpath-output.jar]
Although this assignment could be solved more concisely, I chose the type-safe approach with Scala Datasets and case classes to demonstrate a type-safe workflow.
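A minimal sketch of that type-safe style: a case class gives the commit rows a compile-time schema, and aggregations run on `Dataset[Commit]` rather than an untyped DataFrame. The field names, app name, and file path below are illustrative assumptions, not the repository's actual schema.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical schema for a row of commits.csv; the real columns may differ.
case class Commit(repo: String, author: String, date: String, commits: Long)

object CommitStatsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("commit-stats")
      .getOrCreate()
    import spark.implicits._

    // .as[Commit] turns the untyped DataFrame into a typed Dataset,
    // so column access is checked at compile time.
    val commits = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("commits.csv")
      .as[Commit]

    // Example typed aggregation: total commits per day.
    val perDay = commits
      .groupByKey(_.date)
      .mapValues(_.commits)
      .reduceGroups(_ + _)

    perDay.show()
    spark.stop()
  }
}
```

The same aggregation could be written with untyped `groupBy("date")` calls, but a typo in a column name would then surface only at runtime.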
In a real cluster environment (standalone, Mesos, or YARN), the input file(s) should be placed on a distributed file system (HDFS, S3, etc.) to improve performance.
The raw data is available at: https://drive.google.com/open?id=1dsLhVGFA1n-_Yl5xd-NjhZyGB_MqIxwZ