Jobs vs. Flows: Packing cheap (e.g. post-processing) tasks at the end of a `@job`
#1777 · Closed
Andrew-S-Rosen started this discussion in General
Replies: 1 comment
- I ultimately agreed with the idea that the …
This question was raised by @zulissimeta, and I would like to open it up for further discussion (e.g. perhaps @Nekkrad or @samblau have opinions).

There are a few `@job`s in quacc that pack multiple tasks together in a single `@job` when one or more of them are incredibly short. A good example of this is the VASP `double_relax_job`, which runs a relaxation and then one follow-up relaxation for good measure. This follow-up relaxation is often extremely inexpensive; perhaps just a step or two. Another nice example is the Quantum ESPRESSO `bands_job` proposed in #1701, which does a non-self-consistent calculation and then two cheap post-processing steps thereafter, all in a single `@job`.

The question I'd like to raise here is: going forward, should we pack cheap tasks into a single `@job`, or should we always defer to a `@flow` pattern?

The main reason for adopting a single `@job` in such scenarios is that, for users where each `@job` is a Slurm job, it reduces the number of jobs in the queue, thereby increasing overall throughput. This can make a big difference for users of Covalent or Jobflow (via FireWorks' `qlaunch` method), where this job scheduling paradigm is common. However, it doesn't make much of a difference for users of Dask, Parsl, or Prefect, which all adopt the pilot job model: multiple `@job`s are packed into a single Slurm job that continually pulls in new work. It also doesn't impact people who don't use a workflow engine, or users of Redun, where this concept of jobs vs. flows is irrelevant.

On the flip side, the benefit of having each discrete unit of work be a `@job` is that it can make things a bit more intuitive. For instance, pretty much all `@job`s in quacc return a `dict` of some schema; however, `double_relax_job` returns a `{"relax1": RunSchema, "relax2": RunSchema}` dictionary, which is more akin to what a `@flow` typically returns. If for some reason the "short" step fails, it is also easier to rerun just that one component when the steps are separate `@job`s.

I can see both sides of the argument and would be interested in getting people's thoughts on this.
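To make the two patterns concrete, here is a minimal, self-contained sketch of the trade-off. Note the hedges: `job` and `flow` below are trivial stand-in decorators, not quacc's real ones (which dispatch to a workflow engine), and `run_vasp` is a hypothetical placeholder for an actual calculation. The point is only to contrast the return shapes and the granularity of rerunnable units.

```python
def job(fn):
    """Stand-in for a workflow engine's @job decorator (illustrative only)."""
    return fn

def flow(fn):
    """Stand-in for a workflow engine's @flow decorator (illustrative only)."""
    return fn

def run_vasp(atoms, **settings):
    """Hypothetical calculation step; returns a RunSchema-like dict."""
    return {"atoms": atoms, "settings": settings}

# Pattern 1: pack both relaxations into one @job.
# Fewer Slurm jobs in the queue, but the return value is a nested,
# flow-like dict rather than a single schema, and the cheap follow-up
# step cannot be rerun on its own if it fails.
@job
def double_relax_job(atoms):
    relax1 = run_vasp(atoms, nsw=100)
    relax2 = run_vasp(relax1["atoms"], nsw=100)  # cheap follow-up step
    return {"relax1": relax1, "relax2": relax2}

# Pattern 2: one @job per discrete task, composed in a @flow.
# Each step returns its own schema and can be rerun independently,
# at the cost of one queued job per step in a Slurm-per-job setup.
@job
def relax_job(atoms):
    return run_vasp(atoms, nsw=100)

@flow
def double_relax_flow(atoms):
    relax1 = relax_job(atoms)
    relax2 = relax_job(relax1["atoms"])  # independently rerunnable
    return relax2
```

Under a one-Slurm-job-per-`@job` engine, pattern 1 submits one job and pattern 2 submits two; under a pilot-job engine (Dask, Parsl, Prefect), both run inside the same allocation, so the difference largely disappears.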