
WMCore debugging tools

Alan Malta Rodrigues edited this page Mar 18, 2020 · 11 revisions

This wiki page lists debugging use cases, whether they come from Operations issues or from internal development ones.

Debug whether all jobs have been recovered via ACDCs

Problem: Ops ask us to check why a workflow has not processed 100% of its lumi sections, even though all the failures have been recovered via ACDCs.

Solution: first, make sure that ACDCs have been created AND executed for every single task path (the fileset_name, in ACDC collection terms).

Details: what we need to retrieve/check is:

  • were the ACDCs created after the initial/original workflow moved to completed status?
  • list the number of jobs/lumis in each fileset_name from the ACDC collection
  • query reqmgr2 for the ACDC workflows recovering that workflow, and fetch their InitialTaskPath
  • make sure that those ACDC workflows are in completed status
  • anything else that might be relevant
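
The checklist above could be turned into a small report generator. The sketch below assumes the data has already been fetched: the completion time of the original workflow, a per-fileset_name summary of jobs/lumis from the ACDC collection, and the ACDC requests from reqmgr2 with their InitialTaskPath, status and creation time. The AcdcRequest class, its fields and check_acdc_coverage are hypothetical names for illustration, not WMCore or reqmgr2 APIs, and no network calls are made.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AcdcRequest:
    """Hypothetical, minimal view of an ACDC request fetched from reqmgr2."""
    name: str
    initial_task_path: str  # task path (fileset_name) this ACDC recovers
    status: str             # e.g. "completed"
    created: datetime       # request creation time


def check_acdc_coverage(parent_completed, fileset_summary, acdc_requests,
                        done_states=("completed",)):
    """Report problems with the ACDC recovery of one original workflow.

    parent_completed: datetime the original workflow moved to completed
    fileset_summary:  {fileset_name: {"jobs": int, "lumis": int}} taken
                      from the ACDC collection (assumed pre-fetched)
    acdc_requests:    list of AcdcRequest recovering the original workflow
    done_states:      statuses treated as "executed"; extend if later
                      states should also count
    """
    report = {"created_too_early": [], "not_completed": [],
              "unrecovered_filesets": []}
    recovered = set()
    for req in acdc_requests:
        # An ACDC created before the parent completed may miss later failures
        if req.created < parent_completed:
            report["created_too_early"].append(req.name)
        # A task path only counts as recovered once its ACDC has finished
        if req.status in done_states:
            recovered.add(req.initial_task_path)
        else:
            report["not_completed"].append(req.name)
    # Every fileset_name with failures needs at least one finished ACDC
    for fileset, counts in sorted(fileset_summary.items()):
        if fileset not in recovered:
            report["unrecovered_filesets"].append(
                (fileset, counts["jobs"], counts["lumis"]))
    return report
```

An empty list under every key would mean that, according to the data provided, each task path has a finished ACDC created after the parent workflow completed.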

Find out which run/lumi is missing in the output dataset

Problem: Ops ask us to investigate why the output datasets are missing statistics, even though no job failures are reported (or all of them have been recovered).

Solution: not necessarily a solution. However, part of the solution above applies here as well: check whether all lumis have been recovered. In addition, we could have a tool that takes a workflow as input, finds all the run/lumis meant to be processed, randomly selects one output dataset and compares it against the input dataset, finally yielding a list of run/lumis missing from the output dataset.
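
The core of such a tool is a set difference over (run, lumi) pairs. The sketch below assumes the run/lumi information for the input and for the chosen output dataset has already been retrieved (e.g. from DBS) into plain {run: [lumis]} dictionaries; fetching it is out of scope. The function names and the optional lumi_mask argument (for restricting the comparison to what the workflow was meant to process, such as a run whitelist) are hypothetical.

```python
import random


def pick_output_dataset(output_datasets, rng=random):
    """Randomly select one output dataset to compare against the input."""
    # Sorting makes the choice reproducible when a seeded rng is passed in
    return rng.choice(sorted(output_datasets))


def missing_run_lumis(input_lumis, output_lumis, lumi_mask=None):
    """Return sorted (run, lumi) pairs expected in the output but absent.

    input_lumis, output_lumis: {run: iterable of lumi numbers}
    lumi_mask: optional {run: iterable of lumis} restricting the input to
               what the workflow was meant to process
    """
    expected = {(run, lumi) for run, lumis in input_lumis.items()
                for lumi in lumis}
    if lumi_mask is not None:
        allowed = {(run, lumi) for run, lumis in lumi_mask.items()
                   for lumi in lumis}
        expected &= allowed
    produced = {(run, lumi) for run, lumis in output_lumis.items()
                for lumi in lumis}
    return sorted(expected - produced)
```

For example, with an input of {1: [1, 2, 3], 2: [5, 6]} and an output of {1: [1, 3], 2: [5]}, the missing pairs are (1, 2) and (2, 6).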
