Embarrassingly Parallel Array Computing: EPAC is a machine learning workflow builder.
Given a dataset:
from sklearn import datasets
X, y = datasets.make_classification(n_samples=12,
                                    n_features=10,
                                    n_informative=2,
                                    random_state=1)
- You can build a large machine learning workflow:
Permutation (Perm) + Cross-validation (CV) of SVM(linear) and SVM(rbf)
-----------------------------------------------------------------------
           Perms                       Perm (Splitter)
         /   |   \
        0    1    2                    Samples
             |
             CV                        CV (Splitter)
           / | \
          0  1  2                      Folds
             |
          Methods                      Methods (Splitter)
         /        \
  SVM(linear)  SVM(rbf)                Classifiers (Estimator)
using very simple code:
from sklearn.svm import SVC
from epac import Perms, CV, Methods

perms_cv_svm = Perms(CV(Methods(*[SVC(kernel="linear"),
                                  SVC(kernel="rbf")]),
                        n_folds=3),
                     n_perms=3)
perms_cv_svm.run(X=X, y=y) # Top-down process: computing recognition rates, etc.
perms_cv_svm.reduce() # Bottom-up process: computing p-values, etc.
Then you can get results like:
ResultSet(
[{'key': SVC(kernel=linear),
  'y/test/score_f1': [ 0.5  0.5],
  'y/test/score_recall_mean/pval': [ 0.5],
  'y/test/score_recall/pval': [ 0.5  0.5],
  'y/test/score_accuracy/pval': [ 0.5],
  'y/test/score_f1/pval': [ 0.5  0.5],
  'y/test/score_precision/pval': [ 0.5  0.5],
  'y/test/score_precision': [ 0.5  0.5],
  'y/test/score_recall': [ 0.5  0.5],
  'y/test/score_accuracy': 0.5,
  'y/test/score_recall_mean': 0.5},
 {'key': SVC(kernel=rbf),
  'y/test/score_f1': [ 0.5  0.5],
  'y/test/score_recall_mean/pval': [ 1.],
  'y/test/score_recall/pval': [ 0.  1.],
  'y/test/score_accuracy/pval': [ 1.],
  'y/test/score_f1/pval': [ 1.  1.],
  'y/test/score_precision/pval': [ 1.  1.],
  'y/test/score_precision': [ 0.5  0.5],
  'y/test/score_recall': [ 0.5  0.5],
  'y/test/score_accuracy': 0.5,
  'y/test/score_recall_mean': 0.5}])
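Since reduce() prints its ResultSet as a list of dictionaries, you can also inspect scores programmatically. A minimal sketch, assuming the ResultSet is iterable and each item supports dict-style lookup with the keys shown above:

results = perms_cv_svm.reduce()
# Each item carries the scores and permutation p-values for one classifier.
for res in results:
    print(res['key'],
          res['y/test/score_accuracy'],
          res['y/test/score_accuracy/pval'])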
- Run the EPAC tree in parallel on a local multi-core machine, or even on a DRMS (distributed resource management system) cluster using soma-workflow.
from epac import LocalEngine
local_engine = LocalEngine(tree_root=perms_cv_svm, num_processes=2)
perms_cv_svm = local_engine.run(X=X, y=y)
perms_cv_svm.reduce()
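For the DRMS case, soma-workflow provides the scheduling layer. A minimal sketch, assuming epac exposes a SomaWorkflowEngine that mirrors the LocalEngine interface (check the tutorials for the exact constructor arguments):

# Assumption: SomaWorkflowEngine exists and offers the same
# run/reduce interface as LocalEngine shown above.
from epac import SomaWorkflowEngine
swf_engine = SomaWorkflowEngine(tree_root=perms_cv_svm,
                                num_processes=2)
perms_cv_svm = swf_engine.run(X=X, y=y)
perms_cv_svm.reduce()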
- Design your own machine learning algorithm as a plug-in node in an EPAC tree.
## 1) Design your classifier
## =========================
class MySVC:
    def __init__(self, C=1.0):
        self.C = C

    def transform(self, X, y):
        from sklearn.svm import SVC
        svc = SVC(C=self.C)
        svc.fit(X, y)
        # "transform" should return a dictionary
        return {"y/pred": svc.predict(X), "y": y}
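# Sanity check (illustration only, not in the original): the node can be
# called directly, and transform returns the dictionary defined above.
out = MySVC(C=1.0).transform(X=X, y=y)
print(out["y/pred"], out["y"])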
## 2) Design your reducer for recall rates
## =======================================
from epac.map_reduce.reducers import Reducer

class MyReducer(Reducer):
    def reduce(self, result):
        from sklearn.metrics import precision_recall_fscore_support
        pred_list = []
        # Iterate over the result of each classifier;
        # this is where you can design your own reducer!
        for res in result:
            precision, recall, f1_score, support = \
                precision_recall_fscore_support(res['y'], res['y/pred'])
            pred_list.append({res['key']: recall})
        return pred_list
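# Illustration only: the reducer consumes an iterable of result
# dictionaries shaped like MySVC's output. With perfect predictions
# the per-class recall is 1.0 for both classes.
fake_results = [{'key': 'MySVC(C=1.0)', 'y': y, 'y/pred': y}]
print(MyReducer().reduce(fake_results))
# -> [{'MySVC(C=1.0)': array([ 1.,  1.])}]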
## 3) Build a tree, and then compute results
## =========================================
from epac import Methods

my_svc1 = MySVC(C=1.0)
my_svc2 = MySVC(C=2.0)
two_svc = Methods(my_svc1, my_svc2)
two_svc.reducer = MyReducer()
#          Methods
#         /       \
# MySVC(C=1.0)   MySVC(C=2.0)

# Top-down process to call transform
two_svc.run(X=X, y=y)
# Bottom-up process to compute scores
two_svc.reduce()
You can get results like: [{'MySVC(C=1.0)': array([ 1.,  1.])}, {'MySVC(C=2.0)': array([ 1.,  1.])}]
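Because a plug-in node is an ordinary tree node, it should compose with the splitters from the first example. A minimal sketch, assuming a custom node implementing transform can sit under CV just like the sklearn SVC nodes (the tutorials document the exact node contract):

# Assumption: custom nodes nest under CV like the SVC nodes above.
from epac import CV
cv_two_svc = CV(Methods(MySVC(C=1.0), MySVC(C=2.0)), n_folds=3)
cv_two_svc.run(X=X, y=y)
cv_two_svc.reduce()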
Installation:  http://neurospin.github.io/pylearn-epac/installation.html
Tutorials:     http://neurospin.github.io/pylearn-epac/tutorials.html
Documentation: http://neurospin.github.io/pylearn-epac
Presentation:  Embarrassingly Parallel Array Computing