Skip to content

dfeddema/sparkperftests

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Spark Mllib Performance Tests for OpenShift

This is based from the databricks spark-perf test harness

It runs on Openshift and uses pbench to collect telemetry data for.

It assumes you have already build your spark cluster in OpenShift.

peformance analyses of the entire cluster while each benchmark is running.

Pbench archives the performance data in elastic search

IMPORTANT: Pbench agents must be installed on all nodes in the spark cluster before running these tests

This test harness must be run by 'root' because pbench requires root privileges

The following machine learning algorithms are run with the config settings in config/config.py

  • Machine Learning
    • glm-regression: Generalized Linear Regression Model
    • glm-classification: Generalized Linear Classification Model
    • naive-bayes: Naive Bayes
    • als: Alternating Least Squares
    • kmeans: K-Means clustering
    • pearson: Pearson's Correlation
    • spearman: Spearman's Correlation

About

automated spark perf tests for openshift

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published