-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of ATLAS_Z0J_8TEV PT-Y and PT-M in the new format #2169
Conversation
nnpdf_data/nnpdf_data/commondata/ATLAS_Z0J_8TEV/kinematics_legacy_PT-Y.yaml
Outdated
Show resolved
Hide resolved
Btw @achiefa, how are you generating (re-)generating the |
@Radonirinaunimi I'm not regenerating anything at the moment. I simply changed the name of the old kinematics to |
@@ -102,7 +107,7 @@ implemented_observables: | |||
description: Variable k3 | |||
label: k3 | |||
units: '' | |||
file: kinematics_PT-M.yaml | |||
file: kinematics_legacy_PT-M.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for reference, we don't need to keep the legacy kinematics, you can just remove these files.
The information should be the same as in the new ones, and the new ones have all the info we need (name of the variables mainly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. However, I'd say that the information is not the same. The "old" kinematics used the mid value of the bin (squared), while the new one that I'm going to implement will contain min and max value of the bin (non-squared). If you say that it doesn't make any difference, and we can get rid of the old kinematics, I will remove the latter and implement the new kinematics as mentioned above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can remove the old one. You can fill min, max and the midpoint will be filled automatically (and should be the same as before).
The reason only the midpoint was used before is that validphys could not deal with that extra information, but having the min-max is much better because then we can use these datasets to autogenerate runcards (for instance the NNLOJET runcards need this information)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thank you so much. This is indeed what I was trying to understand. I go on.
Apparently, the tables in hepdata do not provide a full breakdown of the systematic uncertainties. Thus, I don't know how to reproduce the breakdown of systematic uncertainties reported in the legacy file. Should I only use the diagonal entries in the rawdata tables? |
I fear this will be a recurring theme the more datasets are re-implemented, so decisions have to taken with @enocera on dataset-by-dataset basis (?). I would point out however that, usually (as is the case for this particular dataset), HepData includes the raw data in which the break-down of the systematics are given if one download the full thing with the resource files. (So to say that even the distinction between HepData and non-HepData is ambiguous). |
Yes, I see the break-down in the resource files. But it's not really clear to me how to use this information. |
Ok, I think I sort of understood how to use the resource files to retrieve the old break-down of the uncertainties. So, what do you want me to do? |
@achiefa My suggestion is to implement the full breakdown of uncertainties, whenever this can be retrieved from HepData. This is the case, even if this amounts to look into the "Resources". |
I've implemented the dataset with the full break-down of the systematic uncertainties. However, there are a few things that are worth mentioning:
One last thing - Is there a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @achiefa! I will need to look in detail into the mismatch in PT-M
, however, here are some quick comments.
Also, whenever possible (in case the datapoints and everything correspond to the old implementation, as is the case for PT-Y
), could you make sure that the covmats are the same?
from validphys.api import API
import numpy as np
inp1 = {"dataset_input": {"dataset": f"{new_implementation}"}, "theoryid": 40_000_000, "use_cuts": "internal"}
inp2 = {"dataset_input": {"dataset": f"{old_implementation}", "variant": "legacy"}, "theoryid": 40_000_000, "use_cuts": "internal"}
covmat1 = API.covmat_from_systematics(**inp1)
covmat2 = API.covmat_from_systematics(**inp2)
np.isclose(covmat1, covmat2)
This not only makes sure that the implementations are consistent but most importantly that the datasets could be loaded properly.
nnpdf_data/nnpdf_data/commondata/ATLAS_Z0J_8TEV/filter_utils.py
Outdated
Show resolved
Hide resolved
nnpdf_data/nnpdf_data/commondata/ATLAS_Z0J_8TEV/filter_utils.py
Outdated
Show resolved
Hide resolved
If the number of datapoints are different, then the re-implementation in the new format becomes a new dataset. Please refer around this comment for more context.
For this, let's simply use the rawdata, even for the binned/centra values. |
Nice! Could you also check that the t0 covmat is the same, to be completely sure?
This is tricky indeed :___ Which points have changed? If they are points that we don't want to include maybe it makes sense to remove them directly in the filter.py file (for instance, if they are points for pT < 30 GeV we can safely discard them and then later on if we need it add a atlas_zoj_lowpt dataset) |
I tried to use
|
The t0 needs also a |
Ok, the t0 matrices are the same as well.
That I didn't check. But I have a feeling that the new implementation adds a range in the mass of the lepton pair, but using the same bins in pT. I'll check that. |
I have investigated the discrepancy in the
The legacy version doesn't have the second-to-last kin region in I don't know if this tiny difference in the last kin region is a problem for the grids. If not, we could drop the second-to-last bin and use the same grids as in the legacy implementation. |
Dear @achiefa (cc @scarlehoff), I am a little confused about the description of this data set. There must indeed be 6 bins in the invariant mass of the final state, as @achiefa correctly mentions above. All of these are then single-differential in pT (and not in rapidity, as @achiefa seems to suggest above). The bin corresponding to the Z-mass peak ( |
Hi @enocera, I apologise. I meant to say pT and not rapidity, as you pointed out. Note that the invariant mass bins are in the kinematic region |
@achiefa as you can see from Hepdata |
Ok, now I understand. Thank you so much. |
6afef03
to
afd34b6
Compare
The FK table for the last range in the invariant mass is |
600e0d9
to
e082411
Compare
Yes, please, rebase on top of master so that you can use the nice bot that @Radonirinaunimi added to regenerate the data. And ensure that you are using the yaml prettifier so that things like 6.000000000001 are rounded |
I did, but I think there's a problem with the EICC rawdata (look at the bot report)
How can I do that? |
If the EIC data is breaking the tests (and were not broken before) it must be something that has been changed in this branch. Also the normal tests are also broken. RE the prettifier, look e.g. at the filter.py of the jet data, basically adding a function for the yaml parser. |
I fixed the test for the commondata, but |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to run the test locally to find out what broke the test.
Btw, culd you also remove the jupyter notebook for the filter?
It seems to be
The order of magnitude for the tolerance is fine though. |
Ok, I figured out the problem. It is due to the round of the digits in the HepData table:
Indeed, I tried to use the legacy variant for the central data, and the tests succeeded. The legacy data points can be gathered from the source files. What do you think I should do? @scarlehoff |
For the chi2 that's good. For the tests, just regenerate the regression test with the new data if that's the only difference. Btw, could you remove from the rawdata all tables that are not being used by the filter (if any) |
69690b8
to
3c0f9fb
Compare
Hi @scarlehoff, the PR is ready to be reviewed now. Note that all tests passed in the previous commit, but now they fail because of a Gateway error:
I suspect that is not up to me because, as I said, all tests passed last week, and the last commit just corrected a nan value in one of the uncertainties variants. |
When there is only one test failing, it might be due to an internet problem. Just resubmit the test (I just did). |
Ok, all tests passed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this. Just a final comment, please remove all the files from hepdata that are not explicitly used in the fitler, otherwise the size of the repository will grow a lot without need.
nnpdf_data/nnpdf_data/commondata/ATLAS_Z0J_8TEV/rawdata/hepdata/F2AV_README
Outdated
Show resolved
Hide resolved
Ok, green light from me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks
85a7fa4
to
e0e4183
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, one last minor comment before merging. Could you remove the kinematics_legacy? since those are not needed anymore
Thanks!
Yes, I can. Do you want me also to remove the data legacy? |
Yes, please, since it is not being loaded anymore. |
Should these commits be squashed into a single one? In particular I'm looking at f9bcdc9 and it would be nice not to have in the history in master. |
e2732cc
to
9fd163b
Compare
Good to go! |
First commit Kin and central data Dataset implemented Correct uncertainty definitions + pre-commit Add mid value of the bins Change name kinematics Minor adjustments Remove Table 39 from PT-M distribution Remove sqrts from kinematic_coverage Include m_ll in process options Add DY_PTRAP process Correct nnpdf31 process Add docstring, correct process_option for PT-Y, use m_ll2 Restoring EICC files Add yaml prettifier Remove Remove jupyter nb Add units to PT-M dist Add tolerance to test against legacy Remove unused tables Regenerate fits for tests in validphys Remove from config card for test Add variant with Monte Carlo uncertainties Correct naming error Remove last unused files Correct nan value for mc uncertainties Add eta to process Add module docstring Remove unused files Remove kinematics legacy Remove legacy data Remove tests against legacy data
9fd163b
to
7d79286
Compare
I've recommited the squash to remove the merge commits, let's wait for the tests then merge. Thanks for this! |
Report showing$(x,Q^2)$ map and theory-data comparisons for legacy implementation: legacy report (for reference)
The same report, but for the new implementation: new report