diff --git a/doc/PodioInputOutput.md b/doc/PodioInputOutput.md
new file mode 100644
index 00000000..4f9eb4fa
--- /dev/null
+++ b/doc/PodioInputOutput.md
@@ -0,0 +1,128 @@
+
+# Reading and writing EDM4hep files in Gaudi
+
+The facilities to read and write EDM4hep (or in general event data models based
+on podio) are provided by [`k4FWCore`](https://github.com/key4hep/k4FWCore).
+This page describes their usage, but does not go into too much detail about
+their internals. This page also assumes a certain familiarity with Gaudi, i.e.
+most of the snippets just show a minimal configuration part, and not a complete
+runnable example.
+
+## The `k4DataSvc`
+
+Whenever you want to work with EDM4hep in the Gaudi-based framework of Key4hep,
+you will need to use the `k4DataSvc` as *EventDataSvc*. You can instantiate and
+configure this service as follows:
+
+```python
+from Gaudi.Configuration import *
+from Configurables import k4DataSvc
+
+evtSvc = k4DataSvc("EventDataSvc")
+```
+
+**It is important that the name is `EventDataSvc` in this case, as Gaudi
+assumes this name for the event data service.** Once you have the `k4DataSvc`
+instantiated, you still have to make the `ApplicationMgr` aware of it, by making
+sure that the `evtSvc` is in the list of the *external services* (`ExtSvc`):
+
+```python
+from Configurables import ApplicationMgr
+ApplicationMgr(
+    # other args
+    ExtSvc = [evtSvc]
+)
+```
+
+## Reading events
+
+To read events you will need to use the `PodioInput` algorithm in addition to
+the [`k4DataSvc`](#the-k4datasvc). Currently, you will need to pass the input
+file to the `k4DataSvc` via the `input` option but pass the collections that you
+want to read to the `PodioInput`. We are working on improving this (discussion
+happens in this [issue](https://github.com/key4hep/k4FWCore/issues/105)). The
+parts of your options file related to reading EDM4hep files will look something
+like this:
+
+```python
+from Configurables import PodioInput, k4DataSvc
+
+evtSvc = k4DataSvc("EventDataSvc")
+evtSvc.input = "/path/to/your/input-file.root"
+
+podioInput = PodioInput()
+podioInput.collections = [
+    # the complete list of collection names to read
+]
+```
+
+**Note that currently only the collections that are inside the `collections`
+list will be read and become available for later algorithms.**
+
+It is possible to change the input file from the command line via:
+```bash
+k4run <options-file> --EventDataSvc.input=<input-file>
+```
+
+## Writing events
+
+To write events you will need to use the `PodioOutput` algorithm in addition to
+the [`k4DataSvc`](#the-k4datasvc):
+
+```python
+from Configurables import PodioOutput
+
+podioOutput = PodioOutput("PodioOutput", filename="my_output.root")
+```
+
+By default this will write the complete event contents to the output file.
+
+### Writing only a subset of collections
+
+Sometimes it is desirable to limit the collections to a subset of all available
+collections from the EventStore. The `PodioOutput` allows you to do this via the
+`outputCommands` option that takes a list of `keep` or `drop` commands. Each
+command must consist of the `keep`/`drop` command and a target. The target is a
+collection name that may include the `?` or `*` wildcard patterns. This might
+look like the following:
+
+```python
+podioOutput.outputCommands = ["keep *"]
+```
+
+which will keep everything (the default), while
+
+```python
+podioOutput.outputCommands = ["drop *"]
+```
+
+will simply drop all collections and effectively write an empty file (apart from
+some metadata). A common pattern is to `"drop *"` first and then selectively add
+`keep` commands for the collections to retain, e.g. to only keep the highest
+level MC and reco information:
+
+```python
+podioOutput.outputCommands = [
+    "drop *",
+    "keep MCParticlesSkimmed",
+    "keep PandoraPFOs",
+    "keep RecoMCTruthLink",
+]
+```
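
To build some intuition for how a list of `keep`/`drop` commands narrows down
the output, the following standalone sketch mimics the selection in plain
Python. It is only an illustration and not the `PodioOutput` implementation: it
assumes that commands are applied in order, with later commands overriding
earlier ones, and it uses the standard library `fnmatch` module for the `?` and
`*` wildcards (which additionally understands `[seq]` patterns). The helper
`select_collections` and the collection names `MCParticles` and `TrackerHits`
are hypothetical.

```python
from fnmatch import fnmatchcase


def select_collections(available, output_commands):
    """Return the collection names that survive a list of keep/drop commands.

    Assumption (not taken from the PodioOutput source): commands are applied
    in order, so a later command overrides an earlier one for every name that
    it matches.
    """
    keep = {name: True for name in available}  # default: keep everything
    for command in output_commands:
        action, pattern = command.split()
        if action not in ("keep", "drop"):
            raise ValueError(f"unknown command: {command!r}")
        for name in available:
            # Case-sensitive wildcard match of the collection name
            if fnmatchcase(name, pattern):
                keep[name] = action == "keep"
    return [name for name in available if keep[name]]


# Hypothetical event content, for illustration only
available = [
    "MCParticles",
    "MCParticlesSkimmed",
    "PandoraPFOs",
    "RecoMCTruthLink",
    "TrackerHits",
]

selected = select_collections(
    available,
    ["drop *", "keep MCParticlesSkimmed", "keep PandoraPFOs", "keep RecoMCTruthLink"],
)
print(selected)
```

Under these assumptions, only the three explicitly kept collections remain in
`selected`, while `["keep *"]` returns the full list and `["drop *"]` returns an
empty one.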