Rework parallel doc example using CalcJobs #288
base: main
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

@@            Coverage Diff             @@
##             main     #288      +/-   ##
==========================================
+ Coverage   75.75%   80.64%   +4.89%
==========================================
  Files          70       66       -4
  Lines        4615     5147     +532
==========================================
+ Hits         3496     4151     +655
+ Misses       1119      996     -123

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Hi @agoscinski, thanks for your efforts on this. Yes, using CalcJobs is the right way to demonstrate parallel execution. The old example, which utilizes a calcfunction, always runs sequentially.
Besides, in most cases, after these tasks run in parallel, we need to gather and process the results. The old example is quite valuable for realistic applications, as already mentioned in this post. Therefore, I'd like to suggest keeping the old example but updating it to replace the calcfunction with a CalcJob.
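A minimal sketch of the gather pattern described above (the task names `double` and `gather_sum` and the keyword-wiring via `add_task` are assumptions based on the aiida-workgraph API, not code from this PR; calcfunctions are used only to keep the sketch self-contained, while real parallel speedup needs CalcJobs, as this PR shows):

```python
from aiida import load_profile, orm
from aiida_workgraph import WorkGraph, task

load_profile()


@task.calcfunction()
def double(x):
    return orm.Int(x.value * 2)


@task.calcfunction()
def gather_sum(a, b, c):
    # runs only once all three upstream tasks have finished
    return orm.Int(a.value + b.value + c.value)


wg = WorkGraph("gather_example")
t1 = wg.add_task(double, name="double1", x=orm.Int(1))
t2 = wg.add_task(double, name="double2", x=orm.Int(2))
t3 = wg.add_task(double, name="double3", x=orm.Int(3))
# wiring outputs to inputs creates the dependencies that delay gather_sum
wg.add_task(
    gather_sum,
    name="gather",
    a=t1.outputs["result"],
    b=t2.outputs["result"],
    c=t3.outputs["result"],
)
wg.submit(wait=True)
```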
I wanted to put the for loop back into the notebook, but compare it to executing WorkGraphs separately, so I can relate it a bit to what is written here: https://aiida.discourse.group/t/run-only-one-job-on-local-machine/459/2
This shouldn't happen. Could you double-check?
Ah, it is because of caching. After I disabled caching with verdi, it somehow remains active; I am not sure why. I will try to reproduce it later and open an issue on aiida-core.
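For reference, a hedged sketch of disabling caching using aiida-core's documented caching controls (whether these cover the stale-cache behaviour seen here is exactly what is unclear above):

```python
# Globally, from the shell:
#   verdi config set caching.default_enabled False
# Or temporarily, from Python:
from aiida import load_profile
from aiida.manage.caching import disable_caching

load_profile()

with disable_caching():
    ...  # processes launched inside this context will not reuse cached nodes
```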
Force-pushed from 45eb592 to c977986.
I had to rename the parallel file so that no cached version is used from Read the Docs. I am still confused why this happened, since I enforced a rerun on the sphinx-gallery side using …
Hi @agoscinski, thanks for the work. I do have one concern: the current execution time of 2 minutes and 4.786 seconds is quite long.
Here are my suggestions:
- The "Parallelizing WorkGraphs" section is not needed. Take one use case as an example, I have PwRelax workgraph and a large set of structures to relax. Users just need to directly submit multiple workgraphs using a simple loop, without waiting for each to finish. If users write another workgraph to manage parallel execution, users need to think about how to passing input into the sub-workgraph, and handling potential interruptions of the top-level workgraph. While tracking the provenance of 100 workgraphs submitted together might be a potential benefit, it’s not something most users would require.
- Add a description of the workgraph execution mechanism at the beginning of the notebook: workgraphs are based on dependency-driven execution, meaning tasks run in parallel automatically if they have no dependencies on each other and there are sufficient resources. I recall you demonstrated this well in the EuroSciPy tutorial, with a workgraph showing how the processes run in parallel (the execution order). You could include that example here, and also show the GUI for better visualization.
- Better to show the GUI for every workgraph: this will help users easily see which tasks will run in parallel.
- Mention the result-gathering process: typically, users will want to gather the results after the parallel tasks are complete. You can add a link to the aggregate notebook; if it's not ready yet, you can include a comment that it will be added later.
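The simple-loop pattern from the first suggestion, as a minimal sketch (the `build_relax_wg` factory and the `structures` list are hypothetical placeholders, not code from this PR):

```python
from aiida import load_profile

load_profile()

# Submitting returns immediately, so the daemon works on all graphs
# concurrently, and an interrupted script does not affect graphs that
# were already submitted.
for structure in structures:        # hypothetical list of StructureData nodes
    wg = build_relax_wg(structure)  # hypothetical factory returning a WorkGraph
    wg.submit()
```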
```python
@task.graph_builder(
    inputs=[{"name": "integer"}], outputs=[{"name": "sum", "from": "sum_task.result"}]
)
def add10_wg(integer):
```
Why use this name `add10_wg`?
`add10` because it adds 10 to a number. `wg` is, I think, a bit general, and it points to the problem I have when I want to refer to a graph builder: it can be a function as defined here, a task when integrated into a WorkGraph, and it is actually a WorkGraph that is executed. Because the whole tutorial is about parallelizing workgraphs, I added `wg` to the name, but we can also remove it.
Renamed to `add10`; I think that is more consistent with the rest.
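For context, a hedged sketch of how the decorated graph builder might be used as a task inside a parent WorkGraph, which is what makes the function/task/WorkGraph distinction blurry (it assumes the renamed `add10` from the snippet above, whose `sum_task` is defined inside the builder):

```python
from aiida import load_profile, orm
from aiida_workgraph import WorkGraph

load_profile()

wg = WorkGraph("parallel_add10")
for i in range(3):
    # each graph-builder task expands into its own sub-WorkGraph at runtime
    wg.add_task(add10, name=f"add10_{i}", integer=orm.Int(i))
wg.submit(wait=True)
```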
I would not rely on any QE-dependent documentation pages, as I think we should move them to aiida-tutorials. Also, this page is more focused on the parallel execution, measuring timings, and talking about daemons. If there is a user base for which both use cases might be useful, then we should keep it (at least until more feedback from outside). If it is the running time, then we should work on improving this. I will reduce the sleeps and the iterations at the end to bring the running time down to 1 minute.
This is mentioned in the second sentence of the example. I will include the GUI and write comments in the first example emphasizing this. Maybe you can note in the code where I should also mention it.
Included!
Okay, I reference it at the end, but the Sphinx reference does not work for the moment, since we need to wait for #287 to be merged.
I decreased the time to 1 min 11 s by reusing the graph builder runs from before for the daemon part. Also, I now keep just 2 iterations, as this is enough to showcase the effect of daemons, and I reduced the sleep time to 3 seconds.
Hi @agoscinski, the docs failed to build and pre-commit failed. Could you please fix them? I will review as soon as they pass.
Note that RTD somehow caches the other file, which is why I renamed it. I don't fully understand this.
The previous example did rely on calcfunctions, which always run sequentially. This example now uses CalcJobs to actually achieve parallel execution.
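To illustrate the difference, a hedged sketch (not the PR's actual notebook code; it assumes a pre-configured `add@localhost` code and the `add_task` API of aiida-workgraph):

```python
from aiida import load_profile, orm
from aiida.calculations.arithmetic.add import ArithmeticAddCalculation
from aiida_workgraph import WorkGraph

load_profile()
code = orm.load_code("add@localhost")  # assumed to be set up beforehand

wg = WorkGraph("parallel_calcjobs")
for i in range(3):
    # CalcJobs go through the daemon, so these three independent tasks can
    # actually run at the same time; calcfunctions would instead run inline,
    # one after the other, in the interpreter executing the WorkGraph.
    wg.add_task(
        ArithmeticAddCalculation,
        name=f"add{i}",
        x=orm.Int(i),
        y=orm.Int(10),
        code=code,
    )
wg.submit(wait=True)
```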
Commit "…re run in parallel" (Co-authored-by: Xing Wang <[email protected]>); force-pushed from 2d3cc3f to d7089cc.
```python
# Be aware that for the moment AiiDA can only run 200 WorkGraphs at the same time.
# To increase that limit one can set this variable to a higher value.
```
Suggested change:

```python
# Be aware that for the moment, AiiDA can only run 200 processes (WorkGraph, CalcJob, etc.) at the same time.
# To increase that limit, one can set this variable to a higher value.
```
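For context, a hedged note on where this limit likely comes from, assuming current aiida-core behaviour: the config option `daemon.worker_process_slots` defaults to 200 processes per daemon worker. CLI commands are shown as comments, matching the gallery-script style used in this PR:

```python
# Raise the number of processes each daemon worker may handle, then restart:
#   verdi config set daemon.worker_process_slots 400
#   verdi daemon restart
```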
```python
# Since each daemon worker can only manage one WorkGraph (handling the results)
# at a time, one can experience slow downs when running many jobs that can be
# run in parallel. The optimal number of workers depends highly on the jobs
```
Suggested change:

```python
# One can experience slow downs when running many jobs (e.g., 100 jobs) that can be
# run in parallel. The optimal number of workers depends highly on the jobs
```
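Relatedly, a hedged sketch of adjusting the number of daemon workers with aiida-core's CLI (again shown as comments, in the gallery-script style):

```python
# Start the daemon with several workers, or add workers to a running daemon:
#   verdi daemon start 2
#   verdi daemon incr 1
#   verdi daemon status
```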
```python
# The overhead time has shortens a bit as the handling of the CalcJobs and
# WorkGraphs could be parallelized. One can increase the number of iterations
```
Are you sure about "the overhead time has shortens a bit"? Looking at the times, there is no improvement:

Time for running parallelized graph builder: 0:00:11.262496
Time for running parallelized graph builder with 2 daemons: 0:00:11.264958
```python
# verdi daemon restart
```
It would be good to add a link to the performance page.
> The previous example did rely on calcfunctions that are always run sequentially. This example now uses CalcJobs to actually achieve parallel executions.
@superstar54 I removed the old example since it did not make much sense to me, as it was using calcfunctions, but I might be missing something you actually wanted to show with it.