Proof of concept: running metadata and data computations on Dask #2316

bouweandela · 2024-01-31T15:25:03Z

Description

This pull request splits the computation into three stages:

Preprocessor functions are run in parallel using Dask without saving data
Preprocessor files are populated with data in parallel using Dask
Diagnostic scripts are run

Only works with max_parallel_tasks: 1 at the moment.
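The three stages can be sketched roughly as follows. This is a simplified illustration, not the code in this pull request: `load`, `preprocess`, `save` and `run_diagnostic` are hypothetical stand-ins, the JSON files stand in for preprocessor output files, and stage 1 here only builds the task graph.

```python
import json
import tempfile
from pathlib import Path

import dask
from dask import delayed


# Hypothetical stand-ins; these are not ESMValCore functions.
def load(name):
    return [1.0, 2.0, 3.0]


def preprocess(data):
    return [2 * x for x in data]


def save(data, path):
    Path(path).write_text(json.dumps(data))
    return path


def run_diagnostic(paths):
    return {Path(p).stem: sum(json.loads(Path(p).read_text())) for p in paths}


output_dir = Path(tempfile.mkdtemp())
names = ["dataset_a", "dataset_b"]

# Stage 1: set up the preprocessor computations without saving any data.
products = {name: delayed(preprocess)(delayed(load)(name)) for name in names}

# Stage 2: populate the output files, computing all products in parallel.
save_tasks = [
    delayed(save)(product, str(output_dir / f"{name}.json"))
    for name, product in products.items()
]
paths = dask.compute(*save_tasks)

# Stage 3: run the diagnostic on the files that were written.
results = run_diagnostic(paths)
print(results)
```

Deferring all writes to a single `dask.compute` call lets the scheduler see every output at once, which is what makes the second stage parallel.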

Ideas for further improvements:

optimize multi-model functions, as these limit parallelism
use one delayed per group in multi-model/ensemble means to increase parallelism
try to make delayed operations 'pure', e.g. by copying the input cubes in preprocess before calling the preprocessor function
see if splitting up Dataset.load, prior to the preprocessor step concatenate, into multiple delayeds improves parallelism
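The idea of making delayed operations 'pure' by copying the input can be illustrated with a small wrapper. This is a minimal sketch under assumptions: `scale_in_place` is a hypothetical function that modifies its argument, a plain dict stands in for an Iris cube, and `pure` is an illustrative helper rather than anything in ESMValCore.

```python
import copy
import functools


def scale_in_place(cube, factor):
    """Hypothetical preprocessor function that modifies its input."""
    cube["data"] = [factor * x for x in cube["data"]]
    return cube


def pure(func):
    """Wrap func so that it operates on a deep copy of its first argument."""

    @functools.wraps(func)
    def wrapper(first, *args, **kwargs):
        return func(copy.deepcopy(first), *args, **kwargs)

    return wrapper


original = {"data": [1, 2, 3]}
result = pure(scale_in_place)(original, 2)
print(original, result)
```

Because the wrapped function never touches the caller's object, the same input can safely feed several delayed tasks, at the cost of an extra copy per call.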

Blocking issues

These are things that block this from being used in practice.

ESMPy crashes if you try to use it from a thread other than the main one. Example script that produces the crash:

import threading

import numpy as np


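def run():
    # Import and use ESMPy inside the thread that calls this function.
    import esmpy
    # debug=True makes ESMF write log messages to PET0.ESMF_LogFile.
    m = esmpy.Manager(debug=True)
    esmpy.Grid(np.array((10, 20)),
               num_peri_dims=1,
               staggerloc=[esmpy.StaggerLoc.CENTER])


def main():
    # Run the ESMPy calls in a thread other than the main one.
    thread = threading.Thread(target=run)
    thread.start()
    thread.join()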


if __name__ == '__main__':
    main()

results in Segmentation fault (core dumped) and a log file called PET0.ESMF_LogFile is written by ESMF with the following content:

20240226 150217.785 INFO             PET0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20240226 150217.785 INFO             PET0 !!! THE ESMF_LOG IS SET TO OUTPUT ALL LOG MESSAGES !!!
20240226 150217.785 INFO             PET0 !!!     THIS MAY CAUSE SLOWDOWN IN PERFORMANCE     !!!
20240226 150217.785 INFO             PET0 !!! FOR PRODUCTION RUNS, USE:                      !!!
20240226 150217.785 INFO             PET0 !!!                   ESMF_LOGKIND_Multi_On_Error  !!!
20240226 150217.785 INFO             PET0 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
20240226 150217.785 INFO             PET0 Running with ESMF Version   : 8.4.2
20240226 150217.785 INFO             PET0 ESMF library build date/time: "Apr 26 2023" "11:27:56"
20240226 150217.785 INFO             PET0 ESMF library build location : /home/conda/feedstock_root/build_artifacts/esmf_1682507633250/work
20240226 150217.785 INFO             PET0 ESMF_COMM                   : mpiuni
20240226 150217.785 INFO             PET0 ESMF_MOAB                   : enabled
20240226 150217.785 INFO             PET0 ESMF_LAPACK                 : enabled
20240226 150217.785 INFO             PET0 ESMF_NETCDF                 : enabled
20240226 150217.785 INFO             PET0 ESMF_PNETCDF                : disabled
20240226 150217.785 INFO             PET0 ESMF_PIO                    : disabled
20240226 150217.785 INFO             PET0 ESMF_YAMLCPP                : enabled
20240226 150217.785 ERROR            PET0 ESMCI_VM.C:2169 ESMCI::VM::getCurrent() Internal error: Bad condition  - - Could not determine current VM

Issue reported via the ESMF support mailing list.
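This matters for Dask because its threaded scheduler runs tasks in pool threads rather than in the calling thread. The sketch below shows the difference between the `threads` and `synchronous` schedulers without importing ESMPy; `where_am_i` is a hypothetical helper, and whether the synchronous scheduler actually avoids the ESMPy crash has not been tested here.

```python
import threading

import dask
from dask import delayed


def where_am_i():
    """Return the identifier of the thread that executes this task."""
    return threading.get_ident()


caller_ident = threading.get_ident()
task = delayed(where_am_i)()

# Threaded scheduler: the task runs in a worker thread from a pool.
(threaded_ident,) = dask.compute(task, scheduler="threads")

# Synchronous scheduler: the task runs in the thread that calls compute.
(sync_ident,) = dask.compute(task, scheduler="synchronous")

print("threads scheduler ran in calling thread:", threaded_ident == caller_ident)
print("synchronous scheduler ran in calling thread:", sync_ident == caller_ident)
```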

Concerns

These are things that we need to be careful about, but that should not be a problem.

thread safety, known unsafe libraries:
- NetCDF4 library
custom configuration (config-developer, extra facets, custom cmor tables) may not be available on Dask workers
is provenance correctly updated with results from preprocessing before saving?
potential for re-using parts of the computation seems limited because da.store loses dependency information (dask/dask#8380)
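One common way to guard calls into a library that is not thread-safe is to serialize them with a lock. The sketch below is only an illustration of that pattern: `write_record` is a hypothetical stand-in for a call into such a library, and this is not a description of how ESMValCore, Iris or netCDF4 handle file access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

io_lock = threading.Lock()
log = []


def write_record(index):
    """Hypothetical stand-in for a call into a non-thread-safe library."""
    with io_lock:
        # Only one thread at a time can be between "start" and "end".
        log.append(("start", index))
        log.append(("end", index))


with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_record, range(100)))

# With the lock held, every "start" is immediately followed by its own "end".
interleaved = any(
    log[k][1] != log[k + 1][1] for k in range(0, len(log), 2)
)
print("calls interleaved:", interleaved)
```

The lock removes the race at the cost of running the guarded section serially, so it only helps when that section is a small part of each task.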

Before you get started

☝ Create an issue to discuss what you are going to do

Checklist

It is the responsibility of the author to make sure the pull request is ready to review. The icons indicate whether the item will be subject to the 🛠 Technical or 🧪 Scientific review.

🧪 The new functionality is relevant and scientifically sound
🛠 This pull request has a descriptive title and labels
🛠 Code is written according to the code quality guidelines
🧪 and 🛠 Documentation is available
🛠 Unit tests have been added
🛠 Changes are backward compatible
🛠 Any changed dependencies have been added or removed correctly
🛠 The list of authors is up to date
🛠 All checks below this pull request were successful

To help with the number of pull requests:

🙏 We kindly ask you to review two other open pull requests in this repository

valeriupredoi · 2024-02-05T17:02:33Z

I dig this PR 😍 we should talk about the bigger picture though - may be able to suggest some novel stuffs 🍺

…-metadata-and-save

First draft of delayed computations

2141d0b

bouweandela requested a review from fnattino January 31, 2024 15:27

bouweandela added 5 commits January 31, 2024 21:06

Nicer log messages

37f3cf6

More parallel implementation

b92238b

Avoid extra write to preprocessor output files

709a3ca

Fix provenance for multimodel functions

8be4f32

Log times for provenance init and meta task graph

99f4dfa

bouweandela changed the title ~~Proof of concept of running metadata and data computations on Dask~~ Proof of concept: running metadata and data computations on Dask Feb 5, 2024

bouweandela mentioned this pull request Feb 12, 2024

Use dask to run tasks #1714

Closed

15 tasks

bouweandela mentioned this pull request Feb 23, 2024

Use iris' regridder caching for faster regridding? #2341

Closed

Merge branch 'main' of github.com:ESMValGroup/ESMValCore into delayed…

59f8899

…-metadata-and-save

bouweandela mentioned this pull request Jul 23, 2024

Add an iris-esmf-regrid based regridding scheme #2457

Merged

9 tasks

This was referenced Sep 9, 2024

Load esmvalcore.dataset.Dataset objects in parallel using Dask #2517

Open

Merge input cubes only once when computing lazy multimodel statistics #2518

Merged
