
Moving to ND #5

Open · 4 tasks
panManfredini opened this issue Jun 17, 2018 · 4 comments
@panManfredini (Contributor)
Solving many problems:

Modify the input classes so they can work with THnD. The classes to be modified are:

  • DataHandler
  • PdfComponent
  • pdfLikelihood

Add a smart parameter scan at the beginning:

  • XEPHYR is slow because it re-interpolates every time Minuit asks for a new parameter point. Build a pre-interpolation table at the beginning and always use that instead.
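The pre-interpolation idea above can be sketched as follows. This is not XEPHYR code: `PreInterpTable` and its grid are hypothetical names, a minimal illustration assuming the expensive interpolator can be tabulated once on a fine parameter grid at startup, so later minimizer queries become cheap table lookups.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy sketch: tabulate an expensive interpolation once on a uniform
// parameter grid, then answer minimizer queries from the table instead
// of re-interpolating on every call.
struct PreInterpTable {
    double pmin, pmax;          // parameter range covered by the table
    std::vector<double> values; // tabulated values, one per grid node

    // Build the table by calling the (expensive) interpolator once per node.
    template <class Interp>
    PreInterpTable(double lo, double hi, std::size_t n, Interp interp)
        : pmin(lo), pmax(hi), values(n) {
        for (std::size_t i = 0; i < n; ++i) {
            double p = lo + (hi - lo) * double(i) / double(n - 1);
            values[i] = interp(p);
        }
    }

    // Cheap lookup: linear interpolation between the tabulated nodes.
    double eval(double p) const {
        double t = (p - pmin) / (pmax - pmin) * double(values.size() - 1);
        std::size_t i = std::size_t(std::floor(t));
        if (i + 1 >= values.size()) return values.back();
        double f = t - double(i);
        return values[i] * (1.0 - f) + values[i + 1] * f;
    }
};
```

The trade-off is the usual one: a coarse grid gives fast but biased lookups, so the node count has to be chosen against the accuracy the fit needs.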
@jmosbacher (Collaborator)

Reviving this discussion.
Just switching the models to 3-dimensional models was pretty straightforward. The main modification was to the smart integration function, to get consistent results when integrating between different points within a bin. The rest was just adding another dimension to the for loops etc., and the core functionality seems to work. Updating the less commonly used functions that require projections etc. should also be done soon.
The question is whether at this point we should do a bit of extra work and go straight to N dimensions.
Does anyone have any thoughts on this?

@panManfredini (Contributor, Author)

panManfredini commented Jan 17, 2020

Interesting... Quoting the ROOT documentation for THn:

Multidimensional histogram.

Use a THn if you really, really have to store more than three dimensions, and if a large fraction of all bins are filled. Better alternatives are

- THnSparse if a fraction of all bins are filled
- TTree

The major problem of THn is the memory use caused by n-dimensional histogramming: a THnD with 8 dimensions and 100 bins per dimension needs more than 2.5GB of RAM!

This THnSparse seems nice; we could try to test it:

Efficiency

TH1 and TH2 are generally faster than THnSparse for one and two dimensional distributions. THnSparse becomes competitive for a sparsely filled TH3 with large numbers of bins per dimension. The tutorial sparsehist.C shows the turning point. On a AMD64 with 8GB memory, THnSparse "wins" starting with a TH3 with 30 bins per dimension. Using a THnSparse for a one-dimensional histogram is only reasonable if it has a huge number of bins.
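To make the THnSparse idea concrete, here is a toy sketch (not ROOT code, and `SparseHist` is a made-up name): only the bins that are actually filled occupy memory, via a hash map from the linearised n-dimensional bin index to the bin content, which is essentially why a sparse 8-dimensional histogram stays small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy sketch of sparse n-dimensional histogram storage: memory is used
// only for filled bins, keyed by the linearised bin coordinate.
struct SparseHist {
    std::vector<int> nbins;                           // bins per dimension
    std::unordered_map<std::uint64_t, double> content; // filled bins only

    explicit SparseHist(std::vector<int> shape) : nbins(std::move(shape)) {}

    // Linearise an n-dimensional bin coordinate into a single 64-bit key.
    std::uint64_t key(const std::vector<int>& bin) const {
        std::uint64_t k = 0;
        for (std::size_t d = 0; d < nbins.size(); ++d)
            k = k * std::uint64_t(nbins[d]) + std::uint64_t(bin[d]);
        return k;
    }

    void fill(const std::vector<int>& bin, double w = 1.0) {
        content[key(bin)] += w;
    }
    double get(const std::vector<int>& bin) const {
        auto it = content.find(key(bin));
        return it == content.end() ? 0.0 : it->second;
    }
    std::size_t filledBins() const { return content.size(); }
};
```

With 8 dimensions and 100 bins per dimension, a dense grid would carry 100^8 bins, while this structure stores only the entries that were filled.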

The other suggested option is TTree:

This seems like a lot of work, since we would need to reproduce much of the histogram functionality on top of TTree.

However:

3D already seems like a lot! An analysis that goes beyond that MUST really know what it is doing: it needs to model the background perfectly in all dimensions, the systematics are going to be crazy, etc. So I would advise any analysis against it... but maybe I'm not visionary enough.

Also, we could check out how ND histogramming is done in Python; maybe there is some smart memory management.

@jmosbacher (Collaborator)

jmosbacher commented Jan 17, 2020

3D seems already a lot!

Indeed, the only advantage I see in going to N dimensions is that we can put all the data in a single histogram instead of an array of histograms for the templates.

@jmosbacher (Collaborator)

jmosbacher commented Jan 17, 2020

Another trick we could use for efficiency is to load only the bins that have events into memory, using a THnSparse. Right now the entire grid of template pdfs is loaded into memory.
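The loading trick above could look something like this sketch (hypothetical `loadSparse` helper, not XEPHYR or ROOT code): scan each dense template once and keep only its non-empty bins in a sparse map, so empty bins never occupy memory.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy sketch: convert a dense template into a sparse map holding only
// the filled (non-zero) bins, keyed by the linearised bin index.
std::unordered_map<std::size_t, double>
loadSparse(const std::vector<double>& denseTemplate) {
    std::unordered_map<std::size_t, double> sparse;
    for (std::size_t i = 0; i < denseTemplate.size(); ++i)
        if (denseTemplate[i] != 0.0) // skip empty bins entirely
            sparse.emplace(i, denseTemplate[i]);
    return sparse;
}
```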

4 participants