Add datatypes for storing generator (meta)data in a structured and defined way #310

Merged
merged 36 commits into from
Jun 18, 2024

Conversation

Contributor

@hegner hegner commented May 22, 2024

BEGINRELEASENOTES

  • Add GeneratorEventParameters and GeneratorPdfInfo datatypes to store generator-related data in a structured and well-defined way.
  • Add a GeneratorToolInfo struct and related utility functionality to store some high-level metadata on the generator in Frame parameters.

ENDRELEASENOTES

Pull request to collect all the changes required for #309

Contributor

@tmadlener tmadlener left a comment

I think overall this looks like a reasonable structure (at least to me). But @dirkzerwas should probably have a look to make sure we don't miss anything obvious yet.

Can you also add tests to read back the things that you currently write? At least the ToolInfo parts have enough additional logic wrt podio to warrant a test, I think.

hegner and others added 2 commits May 27, 2024 16:59
@dirkzerwas

Hi @hegner and @tmadlener,

looks very nice, a couple of thoughts:

  • just wondering if adding "Event" into GeneratorParameters might make sense to make clear it's the "per event" structure and not the general setting of the generator (like the N PYTHIA switches)
  • I may have missed it: HepMC has a vector of weightnames in the run_info part, suggest to foresee that in "run" (even though this is not filled by all generators consistently at the moment)
  • weights: do you want to leave the weights in the EventHeader? In principle one could argue that weights are linked with the cross section. @apricePhy told me that the use is not the same in all generators. So to be safe, one may want to keep the two together?

Cheers,
Dirk

hegner and others added 3 commits May 27, 2024 18:41
Co-authored-by: Andre Sailer <[email protected]>
Co-authored-by: Juan Miguel Carceller <[email protected]>
@hegner
Contributor Author

hegner commented May 27, 2024

  • just wondering if adding "Event" into GeneratorParameters might make sense to make clear it's the "per event" structure and not the general setting of the generator (like the N PYTHIA switches)

Yes. I think that's a fair point. I will do that rename.

  • I may have missed it: HepMC has a vector of weightnames in the run_info part, suggest to foresee that in "run" (even though this is not filled by all generators consistently at the moment)

For this we do not need a new data type at all. My idea was to just have an example of how to do it in a to-be-written doc page, possibly with a constant defined for the name of the weight-names parameter.

  • weights: do you want to leave the weights in the EventHeader? In principle one could argue that weights are linked with the cross section. @apricePhy told me that the use is not the same in all generators. So to be safe, one may want to keep the two together?

I left it in one piece for simplicity. No problem to split it in two.

@hegner
Contributor Author

hegner commented May 28, 2024

@dirkzerwas - I have a question for clarification. We have event weights stored in our standard edm4hep::EventHeader and I would not like to change its definition. So we could either leave the cross sections in the GeneratorEventParameters or make it an independent type. What do you think?

@dirkzerwas

@dirkzerwas - I have a question for clarification. We have event weights stored in our standard edm4hep::EventHeader and I would not like to change its definition. So we could either leave the cross sections in the GeneratorEventParameters or make it an independent type. What do you think?

If downstream algs already access the weights, I understand the difficulty, it was more esthetics than "need" :)

  • the comment came initially at a conversation with @tmadlener (when discussing weights in general a couple of weeks ago), that by adding "weights" to EventHeader this mixes MC and "in data available" information, correct Thomas?
  • I would prefer to keep the crossSections where you have put them and not extract them, the structure is clearer

However I would move the signal_vertex (oneToManyRelation) out of the PDF structure if possible?

@hegner
Contributor Author

hegner commented May 28, 2024

However I would move the signal_vertex (oneToManyRelation) out of the PDF structure if possible?

@dirkzerwas - no problem. It would then be a standalone collection. I was hoping to keep the number of collections small however. If that's your only comment, I will implement that and finish this feature with writing a simple example.

@tmadlener
Contributor

the comment came initially at a conversation with @tmadlener (when discussing weights in general a couple of weeks ago), that by adding "weights" to EventHeader this mixes MC and "in data available" information, correct Thomas?

Yes, exactly. In principle an EventHeader should only contain things that are available on both simulation and real data. From a "historical point of view" we can only store multiple weights in there since #254, so there is not too much historical baggage, and this has not yet been part of a released version of EDM4hep. We will probably not get rid of the single weight in the EventHeader as easily, as that has been there since the beginning. So we could in principle move the weights vector closer to the cross sections without too much breakage.

@dirkzerwas

@hegner
I would move the

  OneToManyRelations:
    edm4hep::MCParticle signal_vertex // pointing into MCParticle collection

to GeneratorEventParameters, which already has event_scale etc., but maybe I misunderstood your comment?

@hegner hegner changed the title [WIP] add GeneratorInformation definition Add GeneratorInformation definition Jun 4, 2024
tmadlener previously approved these changes Jun 10, 2024
@tmadlener tmadlener changed the title Add GeneratorInformation definition Add datatypes for storing generator (meta)data in a structured and defined way Jun 10, 2024
@tmadlener tmadlener dismissed their stale review June 10, 2024 14:58

some more minor changes

Comment on lines 25 to 27
const auto names = frame.getParameter<std::vector<std::string>>(generatorToolName).value();
const auto versions = frame.getParameter<std::vector<std::string>>(generatorToolVersion).value();
const auto descriptions = frame.getParameter<std::vector<std::string>>(generatorToolDescription).value();
Contributor

In case any of these parameters is not available this implementation will throw a std::bad_optional_access, which might be a bit cryptic, but would at least be in line with "fail early in case of problems".

If we wanted a slightly more relaxed approach without exceptions escaping, we could do:

Suggested change
- const auto names = frame.getParameter<std::vector<std::string>>(generatorToolName).value();
- const auto versions = frame.getParameter<std::vector<std::string>>(generatorToolVersion).value();
- const auto descriptions = frame.getParameter<std::vector<std::string>>(generatorToolDescription).value();
+ const auto names = frame.getParameter<std::vector<std::string>>(generatorToolName).value_or({});
+ const auto versions = frame.getParameter<std::vector<std::string>>(generatorToolVersion).value_or({});
+ const auto descriptions = frame.getParameter<std::vector<std::string>>(generatorToolDescription).value_or({});

In that case the worst that can happen is an empty vector, which probably also shows that something went wrong.

I am in favor of the current implementation as that makes errors harder to ignore and easier to diagnose (even if the std::bad_optional_access might be a bit cryptic).
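
The trade-off between the two variants can be illustrated without podio. The following is a minimal sketch, not the code from this PR: `getParameterSketch`, `strictReadThrows`, `relaxedRead` and `runOptionalDemo` are hypothetical names, and the first function merely stands in for `frame.getParameter`, assuming it returns a `std::optional` as the `.value()` calls above imply.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using StringVec = std::vector<std::string>;

// Hypothetical stand-in for frame.getParameter<StringVec>(key):
// yields an empty optional when the parameter is not available.
std::optional<StringVec> getParameterSketch(bool present) {
  if (present) {
    return StringVec{"someGenerator"};
  }
  return std::nullopt;
}

// Strict variant: .value() on an empty optional throws
// std::bad_optional_access, so a missing parameter fails early.
bool strictReadThrows() {
  try {
    const auto names = getParameterSketch(false).value();
    (void)names;
  } catch (const std::bad_optional_access&) {
    return true;
  }
  return false;
}

// Relaxed variant: fall back to an explicitly typed empty vector.
StringVec relaxedRead(bool present) {
  return getParameterSketch(present).value_or(StringVec{});
}

// Small self-check exercising both variants.
void runOptionalDemo() {
  assert(strictReadThrows());
  assert(relaxedRead(false).empty());
  assert(relaxedRead(true).size() == 1);
}
```

The explicit `StringVec{}` is deliberate: before C++26 the argument of `value_or` is a deduced template parameter, so a bare `{}` is not expected to compile and the default has to be spelled out.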

Contributor

After some more discussion with @dirkzerwas about how this will be used, I think we can do the "error" handling internally, and simply return an empty vector instead of letting the exception propagate. The main reasons are

  • vector::empty still lets users check whether information was available
  • Information on the tools rather falls into a "nice to have" category, rather than crucially important information for event processing.
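
How such internal handling might look is sketched below under stated assumptions. `MockFrame` is a hypothetical stand-in for a podio Frame, and `ToolInfoSketch`, `getToolInfosSketch` and the `k...` key constants are illustrative names only; none of them are the EDM4hep implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using StringVec = std::vector<std::string>;

// Hypothetical minimal parameter store standing in for a podio Frame.
struct MockFrame {
  std::map<std::string, StringVec> params;

  void putParameter(const std::string& key, StringVec value) {
    params[key] = std::move(value);
  }

  std::optional<StringVec> getParameter(const std::string& key) const {
    const auto it = params.find(key);
    if (it == params.end()) {
      return std::nullopt;
    }
    return it->second;
  }
};

// Placeholder keys, not the constants defined in EDM4hep.
constexpr const char* kNames = "sketchToolNames";
constexpr const char* kVersions = "sketchToolVersions";
constexpr const char* kDescriptions = "sketchToolDescriptions";

struct ToolInfoSketch {
  std::string name;
  std::string version;
  std::string description;
};

// Missing parameters become empty vectors, so no exception escapes and
// the caller can use empty() to detect that nothing was stored.
std::vector<ToolInfoSketch> getToolInfosSketch(const MockFrame& frame) {
  const auto names = frame.getParameter(kNames).value_or(StringVec{});
  const auto versions = frame.getParameter(kVersions).value_or(StringVec{});
  const auto descriptions = frame.getParameter(kDescriptions).value_or(StringVec{});

  // Only build as many entries as all three vectors can supply.
  const auto n = std::min({names.size(), versions.size(), descriptions.size()});
  std::vector<ToolInfoSketch> result;
  result.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    result.push_back({names[i], versions[i], descriptions[i]});
  }
  return result;
}

// Small self-check: an empty frame yields an empty result.
void runToolInfoDemo() {
  MockFrame frame;
  assert(getToolInfosSketch(frame).empty());
}
```

A caller would then write something like `if (getToolInfosSketch(frame).empty()) { /* no tool information stored */ }` instead of wrapping the read in a try block.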

Contributor Author

value_or should then do the right thing

hegner and others added 2 commits June 12, 2024 12:45
Co-authored-by: Thomas Madlener <[email protected]>
Contributor

@tmadlener tmadlener left a comment

A few more things triggered by the clashing variable and class names.

Comment on lines 57 to 58
static constexpr const char* GeneratorEventParameters = "GeneratorEventParameters";
static constexpr const char* GeneratorPdfInfo = "GeneratorPdfInfo";
Contributor

These are now clashing with the generated classes. Maybe all of these should just be prefixed with Gen instead of Generator?

Contributor Author

@hegner hegner Jun 13, 2024

I think I will do that indeed. The label helped as well before ;-)

Collaborator

Or we add a namespace for the constants.
And we copy the existing constants into the namespace and deprecate the namespaceless ones (we can even align spelling when we do that without breaking anything)

Contributor Author

That would be a major change and a bit cumbersome when dealing with e.g. the matrices and the corresponding enums. Do you really want to do that?

Contributor

IIRC the enums were supposed to stay where they are, since they aren't constants

Contributor

I would like an edm4hep::label namespace for things that are a label. The enums for covariance matrix access can / should stay in the edm4hep namespace.

Contributor

See #315 for a proposal with a new labels namespace.
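
The name clash and the namespace idea from this thread can be sketched as follows. This is an illustration only: the `demo` namespace, its nested `labels` namespace and `runLabelsDemo` are assumptions made for the example and do not reproduce the layout proposed in #315.

```cpp
#include <cassert>
#include <string_view>

namespace demo {

// Stand-in for a class generated from the yaml definition.
class GeneratorEventParameters {};

// Declaring a constant with the same name directly in `demo` would
// clash with the class above. A nested namespace avoids that.
namespace labels {
inline constexpr const char* GeneratorEventParameters = "GeneratorEventParameters";
} // namespace labels

} // namespace demo

// Both names now coexist and are told apart by their qualification.
static_assert(std::string_view(demo::labels::GeneratorEventParameters) ==
              "GeneratorEventParameters");

// Small self-check using the class and the label side by side.
void runLabelsDemo() {
  demo::GeneratorEventParameters params;
  (void)params;
  assert(std::string_view(demo::labels::GeneratorEventParameters).size() == 24);
}
```

Prefixing the constants with `Gen`, as first suggested, avoids the clash as well, at the cost of the constant and class names no longer matching.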


namespace edm4hep {

struct GenToolInfo {
Contributor

The classes we define in the yaml file are all prefixed Generator, but this one is Gen. For consistency reasons, I think they should both be the same. The choice is also related to how we want to name the constants.

@tmadlener
Contributor

Resolved the merge conflicts, added a bit of documentation and renamed GenToolInfo -> GeneratorToolInfo for consistency.

@hegner
Contributor Author

hegner commented Jun 17, 2024

Perfect. Thank you

@tmadlener
Contributor

Unless there are complaints by tomorrow's meeting, I will merge this shortly before that.

@dirkzerwas

@tmadlener
include/edm4hep/GenToolInfo.h
you might need to switch to plural for:
frame.putParameter(GeneratorToolNames, std::move(names));
frame.putParameter(GeneratorToolVersions, std::move(versions));
frame.putParameter(GeneratorToolDescriptions, std::move(descriptions));
and for the triplet of frame.getParameter(....)

@tmadlener tmadlener merged commit bd9f450 into key4hep:main Jun 18, 2024
8 of 11 checks passed
6 participants