Embarrassingly Parallel Array Computing: EPAC is a machine learning workflow builder.
Given a dataset:
from sklearn import datasets
X, y = datasets.make_classification(n_samples=12,
                                    n_features=10,
                                    n_informative=2,
                                    random_state=1)
- You can build a large machine learning workflow:
Permutation (Perm) + Cross-validation (CV) of SVM(linear) and SVM(rbf)
-----------------------------------------------------------------------
           Perms                       Perm (Splitter)
         /   |   \
        0    1    2                    Samples
             |
             CV                        CV (Splitter)
           / | \
          0  1  2                      Folds
             |
          Methods                      Methods (Splitter)
         /        \
  SVM(linear)  SVM(rbf)                Classifiers (Estimator)
using very simple code:
from sklearn.svm import SVC
from epac import Perms, CV, Methods

perms_cv_svm = Perms(CV(Methods(*[SVC(kernel="linear"),
                                  SVC(kernel="rbf")]),
                        n_folds=3),
                     n_perms=3)
perms_cv_svm.run(X=X, y=y) # Top-down process: computing recognition rates, etc.
perms_cv_svm.reduce() # Bottom-up process: computing p-values, etc.
Then you can get results like:
ResultSet(
[{'key': SVC(kernel=linear),
  'y/test/score_f1': [ 0.5  0.5],
  'y/test/score_recall_mean/pval': [ 0.5],
  'y/test/score_recall/pval': [ 0.5  0.5],
  'y/test/score_accuracy/pval': [ 0.5],
  'y/test/score_f1/pval': [ 0.5  0.5],
  'y/test/score_precision/pval': [ 0.5  0.5],
  'y/test/score_precision': [ 0.5  0.5],
  'y/test/score_recall': [ 0.5  0.5],
  'y/test/score_accuracy': 0.5,
  'y/test/score_recall_mean': 0.5},
 {'key': SVC(kernel=rbf),
  'y/test/score_f1': [ 0.5  0.5],
  'y/test/score_recall_mean/pval': [ 1.],
  'y/test/score_recall/pval': [ 0.  1.],
  'y/test/score_accuracy/pval': [ 1.],
  'y/test/score_f1/pval': [ 1.  1.],
  'y/test/score_precision/pval': [ 1.  1.],
  'y/test/score_precision': [ 0.5  0.5],
  'y/test/score_recall': [ 0.5  0.5],
  'y/test/score_accuracy': 0.5,
  'y/test/score_recall_mean': 0.5}])
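Since reduce() prints its ResultSet as a list of dictionaries, you can also inspect scores programmatically. A minimal sketch, assuming the ResultSet is iterable and each item supports dict-style lookup with the keys shown above:

results = perms_cv_svm.reduce()
# Each item carries the scores and permutation p-values for one classifier.
for res in results:
    print(res['key'],
          res['y/test/score_accuracy'],
          res['y/test/score_accuracy/pval'])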
- Run the EPAC tree in parallel on a local multi-core machine, or even on a DRMS (distributed resource management system) cluster using soma-workflow.
from epac import LocalEngine
local_engine = LocalEngine(tree_root=perms_cv_svm, num_processes=2)
perms_cv_svm = local_engine.run(X=X, y=y)
perms_cv_svm.reduce()
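For the DRMS case, soma-workflow provides the scheduling layer. A minimal sketch, assuming epac exposes a SomaWorkflowEngine that mirrors the LocalEngine interface (check the tutorials for the exact constructor arguments):

# Assumption: SomaWorkflowEngine exists and offers the same
# run/reduce interface as LocalEngine shown above.
from epac import SomaWorkflowEngine
swf_engine = SomaWorkflowEngine(tree_root=perms_cv_svm,
                                num_processes=2)
perms_cv_svm = swf_engine.run(X=X, y=y)
perms_cv_svm.reduce()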
- Design your own machine learning algorithm as a plug-in node in an EPAC tree.
## 1) Design your classifier
## =========================
class MySVC:
    def __init__(self, C=1.0):
        self.C = C

    def transform(self, X, y):
        from sklearn.svm import SVC
        svc = SVC(C=self.C)
        svc.fit(X, y)
        # "transform" should return a dictionary
        return {"y/pred": svc.predict(X), "y": y}
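# Sanity check (illustration only, not in the original): the node can be
# called directly, and transform returns the dictionary defined above.
out = MySVC(C=1.0).transform(X=X, y=y)
print(out["y/pred"], out["y"])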
## 2) Design your reducer for recall rates
## =======================================
from epac.map_reduce.reducers import Reducer

class MyReducer(Reducer):
    def reduce(self, result):
        from sklearn.metrics import precision_recall_fscore_support
        pred_list = []
        # Iterate over the result of each classifier;
        # this is where you can design your own reducer!
        for res in result:
            precision, recall, f1_score, support = \
                precision_recall_fscore_support(res['y'], res['y/pred'])
            pred_list.append({res['key']: recall})
        return pred_list
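# Illustration only: the reducer consumes an iterable of result
# dictionaries shaped like MySVC's output. With perfect predictions
# the per-class recall is 1.0 for both classes.
fake_results = [{'key': 'MySVC(C=1.0)', 'y': y, 'y/pred': y}]
print(MyReducer().reduce(fake_results))
# -> [{'MySVC(C=1.0)': array([ 1.,  1.])}]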
## 3) Build a tree, and then compute results
## =========================================
from epac import Methods

my_svc1 = MySVC(C=1.0)
my_svc2 = MySVC(C=2.0)
two_svc = Methods(my_svc1, my_svc2)
two_svc.reducer = MyReducer()
#          Methods
#         /       \
# MySVC(C=1.0)   MySVC(C=2.0)

# Top-down process to call transform
two_svc.run(X=X, y=y)
# Bottom-up process to compute scores
two_svc.reduce()
You can get results like: [{'MySVC(C=1.0)': array([ 1.,  1.])}, {'MySVC(C=2.0)': array([ 1.,  1.])}]
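Because a plug-in node is an ordinary tree node, it should compose with the splitters from the first example. A minimal sketch, assuming a custom node implementing transform can sit under CV just like the sklearn SVC nodes (the tutorials document the exact node contract):

# Assumption: custom nodes nest under CV like the SVC nodes above.
from epac import CV
cv_two_svc = CV(Methods(MySVC(C=1.0), MySVC(C=2.0)), n_folds=3)
cv_two_svc.run(X=X, y=y)
cv_two_svc.reduce()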
Installation:  http://neurospin.github.io/pylearn-epac/installation.html
Tutorials:     http://neurospin.github.io/pylearn-epac/tutorials.html
Documentation: http://neurospin.github.io/pylearn-epac
Presentation:  Embarrassingly Parallel Array Computing