WMCore debugging tools
This wiki is meant to list debugging use cases, either to solve/debug Operations issues or internal Dev ones.
Problem: Ops asks us to check why the workflow hasn't processed 100% of the lumi sections, even though all the failures have been recovered via ACDCs.
Solution: first we need to make sure that ACDCs have been created AND executed for every single task path (fileset_name, in terms of the ACDC collection).
Details: what we need to retrieve/check is:
- did the ACDCs get created after the initial/original workflow moved to `completed` status?
- list the number of jobs/lumis in each `fileset_name`, from the ACDC collection
- query reqmgr2 for ACDC workflows recovering that workflow (and fetch their `InitialTaskPath`)
- make sure that those ACDC workflows are in `completed` status
- anything else
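The cross-check described in the list above could be sketched roughly as follows. This is only an illustration operating on data that has already been fetched: it does not talk to the ACDC collection or to reqmgr2, and it does not cover the creation-time check from the first item. The helper names (`summarize_filesets`, `find_unrecovered_task_paths`) and the `jobs`/`lumis` keys are hypothetical, and the assumption that each ACDC workflow record is a dict carrying `InitialTaskPath` and `RequestStatus` should be verified against the actual reqmgr2 response.

```python
from collections import defaultdict


def summarize_filesets(acdc_docs):
    """Sum jobs and lumis per fileset_name.

    acdc_docs is assumed to be an iterable of dicts with the keys
    'fileset_name', 'jobs' and 'lumis' (hypothetical layout).
    """
    summary = defaultdict(lambda: {"jobs": 0, "lumis": 0})
    for doc in acdc_docs:
        entry = summary[doc["fileset_name"]]
        entry["jobs"] += doc.get("jobs", 0)
        entry["lumis"] += doc.get("lumis", 0)
    return dict(summary)


def find_unrecovered_task_paths(fileset_summary, acdc_workflows,
                                done_statuses=("completed",)):
    """Return the task paths with no ACDC workflow in a 'done' status.

    acdc_workflows is assumed to be a list of dicts with the keys
    'InitialTaskPath' and 'RequestStatus'.
    """
    recovered = {
        wf["InitialTaskPath"]
        for wf in acdc_workflows
        if wf.get("RequestStatus") in done_statuses
    }
    return sorted(path for path in fileset_summary if path not in recovered)
```

Any task path returned by `find_unrecovered_task_paths` still has jobs/lumis recorded in the ACDC collection but no ACDC workflow in `completed` status, so it is a candidate explanation for the missing lumi sections.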
Problem: Ops asks us to investigate why the output datasets are missing statistics, even though no job failures are reported (or they have all been recovered).
Solution: not necessarily a solution. However, part of the solution above applies here as well: check whether all lumis have been recovered. In addition, we could have a tool that takes a workflow as input, finds all the run/lumis meant to be processed, randomly selects one output dataset and compares it against the input dataset, finally yielding a list of run/lumis missing in the output dataset.
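The comparison step of such a tool might look like the sketch below. It assumes the run/lumi information has already been retrieved and is represented as a dict mapping run number to a list of lumi sections; how that data is obtained for the input and output datasets is left out. The function names `missing_lumis` and `check_random_output` are hypothetical and are not part of WMCore.

```python
import random


def missing_lumis(expected, produced):
    """Return {run: [lumis]} present in expected but absent from produced."""
    missing = {}
    for run, lumis in expected.items():
        diff = set(lumis) - set(produced.get(run, ()))
        if diff:
            missing[run] = sorted(diff)
    return missing


def check_random_output(expected, outputs, rng=random):
    """Pick one output dataset at random and compare it with the input.

    outputs is assumed to map dataset name -> {run: [lumis]}.
    Returns the chosen dataset name and its missing run/lumis.
    """
    dataset = rng.choice(sorted(outputs))
    return dataset, missing_lumis(expected, outputs[dataset])
```

An empty dict from `missing_lumis` means every expected run/lumi shows up in the chosen output dataset; otherwise the result is the list of run/lumis to follow up on.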