Useful links
There are several tutorials which have been run over the last few years with instructions and examples for running the combine tool.
-
Tutorial Sessions
- 1st tutorial 17th Nov 2015.
- 2nd tutorial 30th Nov 2016.
- 3rd tutorial 29th Nov 2017
- 4th tutorial 31st Oct 2018 - Latest for `81x-root606` branch.
- Worked examples from Higgs analyses using combine
- Conventions to be used when preparing inputs for Higgs combinations
-
[CMS AN-2011/298](http://cms.cern.ch/iCMS/jsp/db_notes/noteInfo.jsp?cmsnoteid=CMS AN-2011/298) Procedure for the LHC Higgs boson search combination in summer 2011. This describes in more detail some of the methods used in Combine.
There is currently no document that can be cited for the combine tool itself; however, you can cite the following publications for the procedures we use:
- Summer 2011 public ATLAS-CMS note for any Frequentist limit setting procedures with toys or Bayesian limits, constructing likelihoods, descriptions of nuisance parameter options (like log-normal (`lnN`) or gamma (`gmN`)), and for definitions of test statistics.
- CCGV paper if you use any of the asymptotic approximations for limits/p-values (e.g. with `-M AsymptoticLimits` or `-M Significance`).
- If you use the Barlow-Beeston approach to MC stat (bin-by-bin) uncertainties, please cite their paper (Barlow-Beeston). You should also cite this note if you use the `autoMCStats` directive to produce a single parameter per bin.
- If you use `shape` uncertainties for template (`TH1` or `RooDataHist`) based datacards, you can cite this note from J. Conway.
- If you are extracting uncertainties from LH scans, i.e. using $$-2\Delta\log\mathcal{L}=1$$ etc. for the 1$$\sigma$$ intervals, you can cite either the ATLAS+CMS or CMS Higgs paper.
- There is also a long list of citation recommendations from the CMS Statistics Committee pages.
- Hypernews forum: hn-cms-higgs-combination https://hypernews.cern.ch/HyperNews/CMS/get/higgs-combination.html
- You can find much more statistics theory and recommendations on various statistical procedures in the CMS Statistics Committee Twiki Pages
-
Why does combine have trouble with bins that have zero expected contents?
- If you're computing only upper limits, and your zero-prediction bins are all empty in data, then you can simply set the background to a very small value instead of zero, since the computation is regular as the background goes to zero (e.g. a counting experiment with B<<1 will have essentially the same expected and observed limits as one with B=0). If you're computing anything else, e.g. p-values, or if your zero-prediction bins are not empty in data, you're out of luck, and you should find a way to get a reasonable background prediction there (and assign an uncertainty to it, as described in the question on adding an uncertainty to a zero quantity).
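To illustrate why a very small background behaves like zero background for upper limits, here is a minimal sketch of a single-bin CLs counting experiment without nuisance parameters. It is not how combine computes limits internally; the function names (`poisson_cdf`, `cls`, `upper_limit`) and the background value `1e-3` are illustrative assumptions.

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for a Poisson distribution with mean mu (mu may be 0)."""
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

def cls(s, b, n_obs):
    """CLs = CL_{s+b} / CL_b for a single-bin counting experiment."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(b, n_obs, cl=0.95):
    """Find the signal yield s at which CLs(s) = 1 - cl, by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cls(mid, b, n_obs) > 1.0 - cl:
            lo = mid   # not yet excluded: move the lower edge up
        else:
            hi = mid   # excluded: move the upper edge down
    return 0.5 * (lo + hi)

limit_b0 = upper_limit(b=0.0, n_obs=0)        # exactly zero background
limit_small_b = upper_limit(b=1e-3, n_obs=0)  # very small background
print(limit_b0, limit_small_b)
```

With zero observed events, CLs reduces to exp(-s) whatever the background, so both limits agree.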
-
How can an uncertainty be added to a zero quantity?
- You can put an uncertainty even on a zero event yield if you use a gamma distribution (`gmN`). That is in fact the more appropriate way of doing it if the prediction of zero comes from the limited size of the MC or data sample used to compute it.
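As a rough illustration of why a gamma distribution can carry an uncertainty on a zero yield: suppose the prediction is `alpha` times the number of events `N` seen in a control (or MC) sample. The Poisson likelihood for the sample mean given `N`, viewed as a function of that mean, is proportional to a gamma density with shape `N + 1`, which has a non-zero width even when `N = 0`. The sketch below samples that density; `alpha`, the seed and the sample size are hypothetical, and this is a generic illustration rather than combine's implementation.

```python
import random

def sample_yields(n_control, alpha, n_samples=200_000, seed=12345):
    """Draw plausible yields alpha * nu, where nu follows a gamma
    density with shape n_control + 1 and unit scale."""
    rng = random.Random(seed)
    return [alpha * rng.gammavariate(n_control + 1, 1.0) for _ in range(n_samples)]

alpha = 0.05  # hypothetical extrapolation factor
yields = sample_yields(n_control=0, alpha=alpha)

mean_yield = sum(yields) / len(yields)
upper_68 = sorted(yields)[int(0.68 * len(yields))]  # 68% upper quantile
print(f"mean = {mean_yield:.4f}, 68% upper quantile = {upper_68:.4f}")
```

Even though the nominal yield `alpha * N` is zero, the sampled yields are spread over a range of order `alpha`.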
-
Why does changing the observation in data affect my expected limit?
- The expected limit (if using either the default behaviour of `-M AsymptoticLimits` or the `LHC-limits` style limit setting with toys) uses the post-fit expectation of the background model to generate toys. This means that the model is first fit to the observed data before toy generation. See the sections on blind limits and toy generation to avoid this behaviour.
-
How can I deal with an interference term which involves a negative contribution?
- You will need to set up a specific PhysicsModel to deal with this; see this section for how to implement such a model, which can incorporate a negative contribution to the physics process.
-
How does combine work?
- That is not a question which can be answered without someone's head exploding so please try to formulate something specific.
-
What does fit status XYZ mean?
- Combine reports the fit status in some routines (for example in the `FitDiagnostics` method). These are typically the status of the last call from Minuit. For details on the meanings of these status codes see the Minuit2Minimizer documentation page.
-
Why does my fit not converge?
- There are several reasons why some fits may not converge. Often some indication can be obtained from the `RooFitResult` or status, which is printed when using the `--verbose X` (with X>2) option. Sometimes, however, the likelihood for your data can be very unusual. You can get a rough idea of what the likelihood looks like as a function of your parameters (POIs and nuisances) using `combineTool.py -M FastScan -w myworkspace.root` (use `--help` for options).
-
Why does the fit/fits take so long?
- The minimisation routines are common to many methods in combine. You can tune the fitting using the generic optimisation command line options described here. For example, setting the default minimizer strategy to 0 can greatly improve the speed, since this avoids running Hesse. In calculations such as `AsymptoticLimits`, Hesse is not needed and hence this can be done; however, for `FitDiagnostics` the uncertainties and correlations are part of the output, so using strategy 0 may not be particularly accurate.
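As a sketch of how the minimizer strategy might be set, the snippet below only assembles and prints two command lines; it does not run combine. `datacard.txt` is a hypothetical file name, and `--cminDefaultMinimizerStrategy` is assumed here to be the relevant generic minimizer option, so check `combine --help` for your version.

```python
import shlex

datacard = "datacard.txt"  # hypothetical datacard name

# Strategy 0 skips Hesse: usually acceptable when only the limit is needed.
fast_limit = ["combine", "-M", "AsymptoticLimits", datacard,
              "--cminDefaultMinimizerStrategy", "0"]

# Default strategy kept for FitDiagnostics, where uncertainties and
# correlations are part of the output.
full_fit = ["combine", "-M", "FitDiagnostics", datacard]

for cmd in (fast_limit, full_fit):
    print(shlex.join(cmd))
```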
-
Why are the results for my counting experiment so slow or unstable?
- There is a known issue with counting experiments with large numbers of events, which can cause unstable fits or even cause the fit to fail. You can avoid this by creating a "fake" shape datacard (see this section from the setting up the datacards page). The simplest way to do this is to run `combineCards.py -S mycountingcard.txt > myshapecard.txt`. You may still find that your parameter uncertainties are not correct when you have large numbers of events. This can often be fixed using the `--robustHesse` option. An example of this issue is detailed here.
-
Why do some of my nuisance parameters have uncertainties > 1?
- When running `-M FitDiagnostics` you may find that the post-fit uncertainties of the nuisances are > 1 (or larger than their pre-fit values). If this is the case, you should first check whether the same is true when adding the option `--minos all`, which will invoke minos to scan the likelihood as a function of these parameters to determine the crossing at $$-2\times\Delta\log\mathcal{L}=1$$ rather than relying on the estimate from Hesse. However, this is not guaranteed to succeed, in which case you can scan the likelihood yourself using `MultiDimFit` (see here) and specifying the option `--poi X`, where `X` is your nuisance parameter.
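The idea of finding the $$-2\times\Delta\log\mathcal{L}=1$$ crossing by scanning can be sketched with a toy one-parameter Poisson likelihood. This is not a combine workflow; the observed count `n_obs = 100` and the helper names `q` and `crossing` are illustrative assumptions.

```python
import math

def q(mu, n_obs):
    """-2 * Delta log L for a Poisson count n_obs with mean mu.
    The minimum (q = 0) is at mu = n_obs."""
    return 2.0 * ((mu - n_obs) - n_obs * math.log(mu / n_obs))

def crossing(n_obs, lo, hi, level=1.0):
    """Bisection for q(mu) = level on [lo, hi], assuming q - level
    has opposite signs at the two ends."""
    f_lo = q(lo, n_obs) - level
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        f_mid = q(mid, n_obs) - level
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # crossing lies between mid and hi
        else:
            hi = mid               # crossing lies between lo and mid
    return 0.5 * (lo + hi)

n_obs = 100
mu_down = crossing(n_obs, 0.5 * n_obs, n_obs)  # crossing below the minimum
mu_up = crossing(n_obs, n_obs, 2.0 * n_obs)    # crossing above the minimum
print(f"mu = {n_obs} -{n_obs - mu_down:.2f} +{mu_up - n_obs:.2f}")
```

The resulting interval is slightly asymmetric and close to the Gaussian expectation of $$\sqrt{100}=10$$ on each side, which is the kind of cross-check a scan gives against a symmetric Hesse estimate.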
-
How can I avoid using the data?
- For almost all methods, you can use toy data (or an Asimov dataset) in place of the real data for your results to be blind. You should be careful, however, as in some methods, such as `-M AsymptoticLimits` or `-M HybridNew --LHCmode LHC-limits` or any other method using the option `--toysFrequentist`, the data will be used to determine the most likely nuisance parameter values (to determine the so-called a-posteriori expectation). See the section on toy data generation for details on this.
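A sketch of two common ways to stay blind is given below. The snippet only assembles and prints the command lines; `datacard.txt` is a hypothetical file name, and the signal strength of 1 passed to `--expectSignal` is an arbitrary choice for illustration.

```python
import shlex

datacard = "datacard.txt"  # hypothetical datacard name

# Asymptotic expected limits computed without fitting the observed data.
blind_asymptotic = ["combine", "-M", "AsymptoticLimits", datacard,
                    "--run", "blind"]

# Fit to a pre-fit Asimov dataset (-t -1) instead of the real data.
asimov_fit = ["combine", "-M", "FitDiagnostics", datacard,
              "-t", "-1", "--expectSignal", "1"]

for cmd in (blind_asymptotic, asimov_fit):
    print(shlex.join(cmd))
```

Note that adding `--toysFrequentist` to the second command would bring the observed data back in, since the nuisance parameters would then be fit to data before generating the Asimov dataset.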
-
What if my nuisance parameters have correlations which are not 0 or 1?
- Combine is designed under the assumption that each source of nuisance parameter is uncorrelated with the other sources. If you have a case where some pair (or set) of nuisances have some known correlation structure, you can compute the eigenvectors of their correlation matrix and provide these diagonalised nuisances to combine. You can also model partial correlations, between different channels or data taking periods, of a given nuisance parameter using the `combineTool` as described in this page.
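A minimal sketch of the diagonalisation step for two nuisances with a known correlation `rho` (a hypothetical value of 0.6 is used). For a 2x2 correlation matrix the eigenvalues are `1 + rho` and `1 - rho`, with eigenvectors along (1, 1) and (1, -1). Each column of the resulting matrix gives the effect of one independent unit-Gaussian parameter on the two original nuisances; how these are then written into a datacard is not shown.

```python
import math

def diagonalise_pair(rho):
    """Return a 2x2 matrix A such that theta = A * phi, where phi are two
    independent unit-Gaussian parameters and theta has correlation rho."""
    eigenvalues = (1.0 + rho, 1.0 - rho)
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    eigenvectors = ((inv_sqrt2, inv_sqrt2), (inv_sqrt2, -inv_sqrt2))
    # A[i][k] = sqrt(lambda_k) * v_k[i]
    return [[math.sqrt(eigenvalues[k]) * eigenvectors[k][i] for k in range(2)]
            for i in range(2)]

def covariance(a):
    """Covariance of theta implied by A, i.e. A times A-transpose."""
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

rho = 0.6  # hypothetical correlation between the two nuisances
A = diagonalise_pair(rho)
C = covariance(A)  # should reproduce the original correlation matrix
print(C)
```

Multiplying the matrix by its transpose recovers the original correlation matrix, which is a quick check that the two uncorrelated parameters carry the same information as the correlated pair.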
-
My nuisances are (artificially) constrained and/or the impact plots show some strange behaviour, especially after including MC statistical uncertainties. What can I do?
- Depending on the details of the analysis, several solutions can be adopted to mitigate these effects. We advise running the validation tool first, to identify possible redundant shape uncertainties that can be safely eliminated or replaced with lnN ones. Any remaining artificial constraints should be studied. Possible mitigating strategies are to (a) smooth the templates or (b) rebin in order to reduce statistical fluctuations in the templates. A description of possible strategies and effects can be found in this talk by Margaret Eminizer.
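As one possible illustration of strategy (b), the sketch below merges adjacent template bins until each merged bin has a relative statistical uncertainty below a chosen threshold. This is a generic example, not the validation tool or a combine feature; the function name, the threshold of 0.3 and the example template are all hypothetical.

```python
import math

def rebin_until_precise(contents, errors, max_rel_err=0.3):
    """Merge adjacent bins, left to right, until each merged bin has
    relative statistical uncertainty <= max_rel_err.
    Returns a list of (content, error) pairs."""
    merged = []
    acc_c, acc_e2 = 0.0, 0.0
    for c, e in zip(contents, errors):
        acc_c += c
        acc_e2 += e * e  # uncertainties add in quadrature
        if acc_c > 0 and math.sqrt(acc_e2) / acc_c <= max_rel_err:
            merged.append((acc_c, math.sqrt(acc_e2)))
            acc_c, acc_e2 = 0.0, 0.0
    if acc_c > 0 or acc_e2 > 0:
        # Leftover low-statistics bins: fold them into the last merged bin.
        if merged:
            last_c, last_e = merged.pop()
            merged.append((last_c + acc_c, math.sqrt(last_e**2 + acc_e2)))
        else:
            merged.append((acc_c, math.sqrt(acc_e2)))
    return merged

# Hypothetical template with a sparsely populated tail.
contents = [50.0, 20.0, 4.0, 1.0, 0.5]
errors = [5.0, 4.0, 2.0, 1.0, 0.5]
rebinned = rebin_until_precise(contents, errors)
print(rebinned)
```

In a real analysis the same bin boundaries would have to be applied to every process and to the data, so the merging is usually driven by the total background template.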