Update bin-wise-stats.md #929

Open: 1 commit to be merged into base branch main

pfackeldey

Dear combine experts,

this PR updates the description of the autoMCstats algorithm. Two points are (likely) described more correctly now, both regarding the case where $n_{tot}^{eff}$ is below the threshold (the Poisson-constrained case). Can you confirm that the description of the algorithm is correct now?

Best, Peter

@pfackeldey
Author

Dear combine experts,

as far as I understand, the following code block implements the autoMCstats algorithm for the Poisson case: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/CMSHistErrorPropagator.cc#L363-L421

From my understanding this does not align with the description in the documentation: https://cms-analysis.github.io/HiggsAnalysis-CombinedLimit/latest/part2/bin-wise-stats/#description-of-the-algorithm

Can you clarify how the algorithm is implemented for the case where $n_{tot}^{eff} < \mathrm{threshold}$ (Poisson case)?

Best, Peter

@ajgilbert
Collaborator

Hi Peter, I think the description aligns with the code. Below the Poisson threshold for the sum of processes, we do Poisson when the individual process is also below the same threshold (this part), otherwise Gaussian (this part). There is also one subtle case not described in the docs: when the per-process error is larger than the bin contents, we cannot form a Poisson uncertainty even if we wanted to, so we put a Gaussian instead (this part).

@pfackeldey
Author

Hi @ajgilbert,

thank you very much for your fast reply.
I think I am still confused by the outer `if` condition: https://github.com/cms-analysis/HiggsAnalysis-CombinedLimit/blob/main/src/CMSHistErrorPropagator.cc#L350. Isn't this the condition that decides whether we are in the Poisson or the Gaussian case?

Oooh, I think I got it... If $n_{tot}^{eff} < \mathrm{threshold}$, we are in the Poisson case. But in the Poisson case one additionally checks whether the effective number of events of each individual process $i$ is also below this threshold: $n_{i}^{eff} = n_{i}^2 / e_{i}^2 < \mathrm{threshold}$. If yes: apply Poisson; if not: apply Gaussian.

Is my understanding now correct?

Best, Peter

@ajgilbert
Collaborator

Yes, exactly that :-) The reason is that Gaussian pdfs are faster to evaluate than Poissons, so we prefer to use them when we can.
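
The decision logic agreed on in this exchange can be sketched in a few lines. This is only an illustration under stated assumptions, not the code in `CMSHistErrorPropagator.cc`: the names `ProcessBin`, `n_eff` and `choose_constraints` are hypothetical, the total error is assumed to be the quadrature sum of the per-process errors, the case above the threshold is assumed to yield a single Gaussian parameter on the summed yield, and the handling of values exactly at the threshold is a guess.

```python
import math
from dataclasses import dataclass


@dataclass
class ProcessBin:
    """One process in one histogram bin (hypothetical container)."""
    name: str
    n: float  # bin content (sum of weights)
    e: float  # statistical uncertainty on the bin content


def n_eff(n: float, e: float) -> float:
    """Effective number of unweighted events, n^2 / e^2."""
    if e <= 0.0:
        # Zero-error bins are outside the scope of this sketch.
        raise ValueError("n_eff is undefined for e <= 0 in this sketch")
    return (n * n) / (e * e)


def choose_constraints(procs: list[ProcessBin], threshold: float) -> dict[str, str]:
    """Return the constraint type per parameter for a single bin."""
    n_tot = sum(p.n for p in procs)
    # Assumption: the total error is the quadrature sum of per-process errors.
    e_tot = math.sqrt(sum(p.e ** 2 for p in procs))

    # Outer condition: at or above the threshold, one Gaussian for the sum
    # (assumption for the case not spelled out in this thread).
    if n_eff(n_tot, e_tot) >= threshold:
        return {"total": "gaussian"}

    # Below the threshold: decide process by process.
    result = {}
    for p in procs:
        if p.e > p.n:
            # Error larger than the bin content: no Poisson can be formed.
            result[p.name] = "gaussian"
        elif n_eff(p.n, p.e) < threshold:
            result[p.name] = "poisson"
        else:
            # Gaussians are faster to evaluate, so they are preferred here.
            result[p.name] = "gaussian"
    return result


procs = [
    ProcessBin("A", n=8.0, e=2.0),   # n_eff = 16
    ProcessBin("B", n=3.0, e=1.5),   # n_eff = 4
    ProcessBin("C", n=0.5, e=3.0),   # error larger than content
]
print(choose_constraints(procs, threshold=10.0))
```

With these made-up numbers the total has $n_{tot}^{eff} \approx 8.7$, which is below the threshold of 10, so each process is classified on its own: A gets a Gaussian, B a Poisson, and C falls back to a Gaussian because its error exceeds its content.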

@pfackeldey
Author

Thank you so much for your explanation @ajgilbert !

I still think that one sentence needs a revision in the documentation (last point 7):

- The Poisson-constrained parameters are expressed as a yield multiplier with nominal value one: $n_{tot}\cdot v$.
+ The Poisson-constrained parameters are expressed as a yield multiplier with nominal value one: $n_{i} \cdot v$.

The Poisson parameters should act on each process individually, shouldn't they?
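
To make the distinction concrete, here is a minimal sketch of what a per-process multiplier would do to the bin total. It assumes one multiplier $v$ per Poisson-constrained process, with nominal value one, as proposed in the patch above; `scaled_total` and the numbers are hypothetical and not part of combine.

```python
def scaled_total(yields: dict[str, float], multipliers: dict[str, float]) -> float:
    """Bin total when each process i is scaled by its own multiplier v_i.

    Processes without an entry in `multipliers` stay at their nominal yield.
    """
    return sum(n * multipliers.get(name, 1.0) for name, n in yields.items())


yields = {"A": 8.0, "B": 3.0}

nominal = scaled_total(yields, {"B": 1.0})      # v = 1 leaves the total unchanged
shifted = scaled_total(yields, {"B": 1.5})      # only process B moves
whole_bin = sum(yields.values()) * 1.5          # what n_tot * v would give instead

print(nominal, shifted, whole_bin)
```

Scaling only B by 1.5 moves the total from 11.0 to 12.5, whereas applying the same multiplier to $n_{tot}$ would give 16.5, which is the difference the proposed wording is meant to capture.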

If you don't mind I would go ahead and update this PR with:

  • the small patch I just wrote above
  • the one missing piece, where "... the per-process error is larger than the bin contents, we cannot form a Poisson uncertainty even if we wanted to, so we put a Gaussian instead (this part)."
  • an attempt to make it a bit clearer that in the Poisson case the threshold is checked again for each process (my last comment)

Is this alright with you?
