Jobs vs. Flows: Packing cheap (e.g. post-processing) tasks at the end of a `@job`
#1777 · Closed
Andrew-S-Rosen started this discussion in General
Replies: 1 comment
- I ultimately agreed with the idea that the …
This question was raised by @zulissimeta, and I would like to open it up for further discussion (e.g. perhaps @Nekkrad or @samblau have opinions).

There are a few `@job`s in quacc that pack multiple tasks together in a single `@job` when one or more of them are incredibly short. A good example of this is the VASP `double_relax_job`, which runs a relaxation and then one follow-up relaxation for good measure. This follow-up relaxation is often extremely inexpensive; perhaps just a step or two. Another nice example is the Quantum ESPRESSO `bands_job` proposed in #1701, which does a non-self-consistent calculation and then two cheap post-processing steps thereafter, all in a single `@job`.

The question I'd like to raise here is: going forward, should we pack cheap tasks into a single `@job`, or should we always defer to a `@flow` pattern?

The main reason for adopting a single `@job` in such scenarios is that, for users where each `@job` is a Slurm job, it reduces the number of jobs in the queue, thereby increasing overall throughput. This can make a big difference for users of Covalent or Jobflow (via FireWorks' `qlaunch` method), where this job scheduling paradigm is common. However, it doesn't make much of a difference for users of Dask, Parsl, or Prefect, which all adopt the pilot job model: multiple `@job`s are packed into a single Slurm job that continually pulls in new work. It also doesn't impact people who don't use a workflow engine, or users of Redun, where this concept of jobs vs. flows is irrelevant.

On the flip side, the benefit of having each discrete unit of work be a `@job` is that it can make things a bit more intuitive. For instance, pretty much all `@job`s in quacc return a `dict` of some schema; however, `double_relax_job` returns a `{"relax1": RunSchema, "relax2": RunSchema}` dictionary, which is more akin to what a `@flow` typically returns. If for some reason the "short" step fails, it is also easier to rerun just that one component when the steps are separate `@job`s.

I can see both sides of the argument and would be interested in getting people's thoughts on this.
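To make the two patterns concrete, here is a minimal, self-contained sketch of the trade-off. Note the hedges: `job` and `flow` below are trivial stand-in decorators, not quacc's real ones (which dispatch to a workflow engine), and `run_vasp` is a hypothetical placeholder for an actual calculation. The point is only to contrast the return shapes and the granularity of rerunnable units.

```python
def job(fn):
    """Stand-in for a workflow engine's @job decorator (illustrative only)."""
    return fn

def flow(fn):
    """Stand-in for a workflow engine's @flow decorator (illustrative only)."""
    return fn

def run_vasp(atoms, **settings):
    """Hypothetical calculation step; returns a RunSchema-like dict."""
    return {"atoms": atoms, "settings": settings}

# Pattern 1: pack both relaxations into one @job.
# Fewer Slurm jobs in the queue, but the return value is a nested,
# flow-like dict rather than a single schema, and the cheap follow-up
# step cannot be rerun on its own if it fails.
@job
def double_relax_job(atoms):
    relax1 = run_vasp(atoms, nsw=100)
    relax2 = run_vasp(relax1["atoms"], nsw=100)  # cheap follow-up step
    return {"relax1": relax1, "relax2": relax2}

# Pattern 2: one @job per discrete task, composed in a @flow.
# Each step returns its own schema and can be rerun independently,
# at the cost of one queued job per step in a Slurm-per-job setup.
@job
def relax_job(atoms):
    return run_vasp(atoms, nsw=100)

@flow
def double_relax_flow(atoms):
    relax1 = relax_job(atoms)
    relax2 = relax_job(relax1["atoms"])  # independently rerunnable
    return relax2
```

Under a one-Slurm-job-per-`@job` engine, pattern 1 submits one job and pattern 2 submits two; under a pilot-job engine (Dask, Parsl, Prefect), both run inside the same allocation, so the difference largely disappears.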