
Moving to ND #5

Open · 4 tasks
panManfredini opened this issue Jun 17, 2018 · 4 comments
@panManfredini (Contributor)
Solving many problems:

Modify the input classes so they can work with THnD. The classes to be modified are:

  • DataHandler
  • PdfComponent
  • pdfLikelihood

Add a smart parameter scan at the beginning:

  • XEPHYR is slow because it re-interpolates every time Minuit asks for a new parameter point. Build a pre-interpolation table at the beginning and always use that instead.
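The pre-interpolation idea above can be sketched as follows. This is not XEPHYR code: `PreInterpTable` and its grid are hypothetical names, a minimal illustration assuming the expensive interpolator can be tabulated once on a fine parameter grid at startup, so later minimizer queries become cheap table lookups.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy sketch: tabulate an expensive interpolation once on a uniform
// parameter grid, then answer minimizer queries from the table instead
// of re-interpolating on every call.
struct PreInterpTable {
    double pmin, pmax;          // parameter range covered by the table
    std::vector<double> values; // tabulated values, one per grid node

    // Build the table by calling the (expensive) interpolator once per node.
    template <class Interp>
    PreInterpTable(double lo, double hi, std::size_t n, Interp interp)
        : pmin(lo), pmax(hi), values(n) {
        for (std::size_t i = 0; i < n; ++i) {
            double p = lo + (hi - lo) * double(i) / double(n - 1);
            values[i] = interp(p);
        }
    }

    // Cheap lookup: linear interpolation between the tabulated nodes.
    double eval(double p) const {
        double t = (p - pmin) / (pmax - pmin) * double(values.size() - 1);
        std::size_t i = std::size_t(std::floor(t));
        if (i + 1 >= values.size()) return values.back();
        double f = t - double(i);
        return values[i] * (1.0 - f) + values[i + 1] * f;
    }
};
```

The trade-off is the usual one: a coarse grid gives fast but biased lookups, so the node count has to be chosen against the accuracy the fit needs.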
@jmosbacher (Collaborator)

Reviving this discussion.
Just switching the models to 3-dimensional models was pretty straightforward. The main modification was to the smart integration function, to get consistent results when integrating between different points within a bin. The rest was just adding another dimension to the for loops etc., and the core functionality seems to work. Updating the less commonly used functions that require projections etc. should also be done soon.
The question is whether at this point we should do a bit of extra work and go straight to N dimensions.
Does anyone have any thoughts on this?

@panManfredini (Contributor, Author)

panManfredini commented Jan 17, 2020

Interesting... Quoting the ROOT documentation for THn:

Multidimensional histogram.

Use a THn if you really, really have to store more than three dimensions, and if a large fraction of all bins are filled. Better alternatives are

- THnSparse if a fraction of all bins are filled
- TTree

The major problem of THn is the memory use caused by n-dimensional histogramming: a THnD with 8 dimensions and 100 bins per dimension needs more than 2.5GB of RAM!

This THnSparse seems nice; we could try to test it:

Efficiency

TH1 and TH2 are generally faster than THnSparse for one and two dimensional distributions. THnSparse becomes competitive for a sparsely filled TH3 with large numbers of bins per dimension. The tutorial sparsehist.C shows the turning point. On a AMD64 with 8GB memory, THnSparse "wins" starting with a TH3 with 30 bins per dimension. Using a THnSparse for a one-dimensional histogram is only reasonable if it has a huge number of bins.
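To make the THnSparse idea concrete, here is a toy sketch (not ROOT code, and `SparseHist` is a made-up name): only the bins that are actually filled occupy memory, via a hash map from the linearised n-dimensional bin index to the bin content, which is essentially why a sparse 8-dimensional histogram stays small.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Toy sketch of sparse n-dimensional histogram storage: memory is used
// only for filled bins, keyed by the linearised bin coordinate.
struct SparseHist {
    std::vector<int> nbins;                           // bins per dimension
    std::unordered_map<std::uint64_t, double> content; // filled bins only

    explicit SparseHist(std::vector<int> shape) : nbins(std::move(shape)) {}

    // Linearise an n-dimensional bin coordinate into a single 64-bit key.
    std::uint64_t key(const std::vector<int>& bin) const {
        std::uint64_t k = 0;
        for (std::size_t d = 0; d < nbins.size(); ++d)
            k = k * std::uint64_t(nbins[d]) + std::uint64_t(bin[d]);
        return k;
    }

    void fill(const std::vector<int>& bin, double w = 1.0) {
        content[key(bin)] += w;
    }
    double get(const std::vector<int>& bin) const {
        auto it = content.find(key(bin));
        return it == content.end() ? 0.0 : it->second;
    }
    std::size_t filledBins() const { return content.size(); }
};
```

With 8 dimensions and 100 bins per dimension, a dense grid would carry 100^8 bins, while this structure stores only the entries that were filled.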

The other suggested option is TTree:

This seems like a lot of work, since we would need to reproduce much of the histogram functionality on top of TTree.

However:

3D already seems like a lot! An analysis that goes beyond that MUST really know what it is doing: it needs to model the background perfectly in all dimensions, the systematics are going to be crazy, etc. So I would advise any analysis against it... but maybe I'm not visionary enough.

Also, we could check out how ND histogramming is done in Python; maybe there is some smart memory management.

@jmosbacher (Collaborator)

jmosbacher commented Jan 17, 2020

3D seems already a lot!

Indeed, the only advantage I see in going to N dimensions is that we can put all the data in a single histogram instead of an array of histograms for the templates.

@jmosbacher (Collaborator)

jmosbacher commented Jan 17, 2020

Another trick we could use for efficiency is to load only the bins that have events into memory, using a THnSparse. Right now the entire grid of template pdfs is loaded into memory.
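The loading trick above could look something like this sketch (hypothetical `loadSparse` helper, not XEPHYR or ROOT code): scan each dense template once and keep only its non-empty bins in a sparse map, so empty bins never occupy memory.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Toy sketch: convert a dense template into a sparse map holding only
// the filled (non-zero) bins, keyed by the linearised bin index.
std::unordered_map<std::size_t, double>
loadSparse(const std::vector<double>& denseTemplate) {
    std::unordered_map<std::size_t, double> sparse;
    for (std::size_t i = 0; i < denseTemplate.size(); ++i)
        if (denseTemplate[i] != 0.0) // skip empty bins entirely
            sparse.emplace(i, denseTemplate[i]);
    return sparse;
}
```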

4 participants