etejedor/root-spark

PyROOT Parallelization with Spark

Project that explores the Spark parallelization of ROOT analysis, in particular using the ROOT Python interface (PyROOT).

The parallelization strategy applies the map-reduce pattern to the processing of a ROOT TTree. In the map phase, each mapper reads and processes a sub-range of TTree entries and produces a partial result, while the reduce phase combines all the partial outputs into a final result (e.g. a set of filled histograms).
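Ignoring ROOT-specific details, the strategy above can be sketched in plain Python: the entry range is split into contiguous sub-ranges, each mapper produces a partial result for its sub-range, and the partials are merged pairwise. The `partition` helper below is an illustrative sketch, not code from the project:

```python
from functools import reduce

def partition(nentries, npartitions):
    # Split the entry range [0, nentries) into contiguous sub-ranges,
    # one per mapper, distributing any remainder over the first ranges.
    step, rest = divmod(nentries, npartitions)
    ranges, start = [], 0
    for i in range(npartitions):
        end = start + step + (1 if i < rest else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Example: 10 entries split across 3 mappers
print(partition(10, 3))  # → [(0, 4), (4, 7), (7, 10)]

# Map phase: each sub-range yields a partial result (here, just a count);
# reduce phase: partials are combined pairwise into the final result.
partials = [end - start for start, end in partition(10, 3)]
print(reduce(lambda a, b: a + b, partials))  # → 10
```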

The programming model is Python-based: the user creates a DistTree object from a list of files containing a TTree and the name of that TTree; the number of partitions (sub-ranges) of the TTree can also be specified. To start the parallel processing, the user invokes the ProcessAndMerge function on the DistTree, passing the mapper and reducer functions as parameters. The mapper receives a TTreeReader, a ROOT object that represents a sub-range of entries and can be iterated over.

This code snippet gives an example of how the DistTree class can be used:

# ROOT imports
import ROOT
from DistROOT import DistTree

# Build the DistTree
dTree = DistTree(filelist = ["myFile1", "myFile2"],
                 treename = "myTree",
                 npartitions = 8)

# Trigger the parallel processing
myHistos = dTree.ProcessAndMerge(fillHistos, mergeHistos)
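The fillHistos and mergeHistos functions above are user-defined. A possible sketch of such a pair is shown below; the branch name `px`, the histogram binning and the TTreeReaderValue usage are illustrative assumptions, not taken from the project:

```python
def fillHistos(reader):
    # Mapper: runs on a Spark executor, so ROOT is imported inside the
    # function. It iterates over the sub-range of entries represented by
    # the TTreeReader and returns a partial result (a list of histograms).
    import ROOT
    px = ROOT.TTreeReaderValue(ROOT.Double_t)(reader, "px")  # 'px' branch is an assumption
    h_px = ROOT.TH1F("px", "px distribution", 100, -5., 5.)
    while reader.Next():
        h_px.Fill(px.__deref__())
    return [h_px]

def mergeHistos(histos1, histos2):
    # Reducer: combine two partial results by summing histograms pairwise.
    for h1, h2 in zip(histos1, histos2):
        h1.Add(h2)
    return histos1
```

Importing ROOT inside the mapper (rather than at module level) matters in practice: the function is shipped to Spark workers, where the import must happen locally.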
