[MNT] Add joblib backend option and set default to all parallelized estimators #1797

baraline · 2024-07-14T15:32:43Z

Describe the issue

Some of the estimators that use joblib for parallelization use a process-based backend, while others use a thread-based backend. Ideally, this should be a parameter that users can tune.

Suggest a potential alternative/fix

Adding a joblib_backend parameter that defaults to threading (from discussions with Matthew), and passing it as the backend argument in Parallel calls, would fix the issue.
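A minimal sketch of what this proposal could look like. The class ExampleTransformer and its per-row sum are hypothetical and not part of aeon; only the joblib_backend parameter name comes from the proposal above.

```python
from joblib import Parallel, delayed


class ExampleTransformer:
    """Hypothetical estimator, used only to illustrate the proposed parameter."""

    def __init__(self, n_jobs=1, joblib_backend="threading"):
        self.n_jobs = n_jobs
        # Proposed parameter: forwarded as-is to joblib.Parallel.
        self.joblib_backend = joblib_backend

    def transform(self, X):
        # One task per row; the backend is whatever the user configured.
        return Parallel(n_jobs=self.n_jobs, backend=self.joblib_backend)(
            delayed(sum)(row) for row in X
        )


print(ExampleTransformer(n_jobs=2).transform([[1, 2], [3, 4]]))
```

A user who wants processes would pass joblib_backend="loky" at construction time.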


TonyBagnall · 2024-07-14T15:37:07Z

I remember we looked at this in some detail when debugging some rocket features. Was there some interaction with its use of prange? I can't remember!

def _static_transform_uni(X, parameters, indices):
    """Transform a 2D collection of univariate time series.

    Implemented separately to the multivariate version for numba efficiency reasons.
    See issue #1778.
    """
    n_cases, n_timepoints = X.shape
    (
        _,
        _,
        dilations,
        n_features_per_dilation,
        biases,
    ) = parameters
    n_kernels = len(indices)
    n_dilations = len(dilations)
    f = n_kernels * np.sum(n_features_per_dilation)
    features = np.zeros((n_cases, f), dtype=np.float32)
    for i in prange(n_cases):

CodeLionX · 2024-07-16T07:23:22Z

I think adding this parameter to so many APIs would clutter the parameter lists. What are your thoughts on using joblib.parallel_config?
This would mean that aeon code does not specify a parallel backend in its calls to joblib.Parallel (unless a certain backend is required), and users can choose their backend by wrapping calls within:

import joblib

with joblib.parallel_config(backend="loky"):
    ...  # aeon code

The same is actually possible for n_jobs etc.

baraline · 2024-07-16T07:59:13Z

True, I forgot about this option! Either that, or we set a global variable such as AEON_JOBLIB_BACKEND and document its usage.
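A sketch of how the environment-variable route might look. The helper get_backend and the function square_all are hypothetical, and the fallback to "threading" is an assumption based on the default discussed in this issue.

```python
import os

from joblib import Parallel, delayed


def get_backend(default="threading"):
    """Hypothetical helper: read the backend name from AEON_JOBLIB_BACKEND."""
    return os.environ.get("AEON_JOBLIB_BACKEND", default)


def square_all(values, n_jobs=2):
    # The backend is resolved at call time, so changing the variable
    # between calls takes effect without re-importing anything.
    return Parallel(n_jobs=n_jobs, backend=get_backend())(
        delayed(pow)(v, 2) for v in values
    )
```

Setting AEON_JOBLIB_BACKEND=loky in the shell before running would then switch every such call to processes.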

CodeLionX · 2024-07-16T08:51:24Z

IMO a custom env var AEON_JOBLIB_BACKEND only makes sense if it is also used for other parallel stuff besides joblib. Otherwise, we can use the existing (documented and potentially already known) joblib facilities.

MatthewMiddlehurst · 2024-07-17T09:55:03Z

Sounds like a good idea, would have to be documented though. Not sure about removing n_jobs (not that it was really suggested), but would be good to tidy up the other bits.

baraline · 2024-07-17T11:37:42Z

I'm OK with using the parallel_config option and letting users wrap estimators if they want to change from the default. But if we want to default to threads, we need to pass the backend explicitly as below, and it looks like parallel_config doesn't override it:

from joblib import Parallel, parallel_config

with parallel_config(backend='loky'):
    p = Parallel(backend='threading')
    print(p._backend)  # private attribute, read only to check the chosen backend

You obtain <joblib._parallel_backends.ThreadingBackend object at 0x000001D91BB731F0>

This means that for the parallel_config option to work, we would need to use the default Parallel(), which uses the loky backend. This is exactly the one we want to move away from for multiple reasons (see https://scikit-learn.fondation-inria.fr/joblib-sprint/ for some).

Or did I miss something?

CodeLionX · 2024-07-22T07:46:38Z

ah, damn ... yes, Parallel() uses the loky-backend by default. Then, I don't see how parallel_config would help us here.

baraline · 2024-07-22T08:24:18Z

So the options we have are:

1. Add a parameter to specify the backend, defaulting to threads, for all estimators with parallel capabilities.
2. Add an environment variable to specify the backend package-wide.

With an equal level of documentation, I think the first option would allow us to be more flexible if, for whatever reason, we need to use processes for some estimators and threads for others. It should also be less prone to causing issues with existing methods.

But true, this adds a bit of parameter bloat to the API... We could introduce a dictionary grouping all joblib params and pass it as kwargs when calling Parallel?

Other ideas are welcome!
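A rough sketch of the dictionary idea. The class ExampleEstimator, the parameter name joblib_params, and the defaults shown are all hypothetical.

```python
from joblib import Parallel, delayed


class ExampleEstimator:
    """Hypothetical estimator that groups joblib options in one dict."""

    def __init__(self, joblib_params=None):
        self.joblib_params = joblib_params

    def _make_parallel(self):
        # Assumed defaults; user-supplied entries override them.
        params = {"n_jobs": 1, "backend": "threading"}
        params.update(self.joblib_params or {})
        return Parallel(**params)

    def predict(self, X):
        # Toy computation: the maximum of each row.
        return self._make_parallel()(delayed(max)(row) for row in X)


print(ExampleEstimator({"n_jobs": 2}).predict([[1, 5], [7, 2]]))
```

This keeps a single constructor argument, at the cost of making the individual options less discoverable than named parameters.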

CodeLionX · 2024-07-24T09:29:22Z

I'm also in favor of 1. How many joblib parameters do we have? If there are just 2, I would not use kwargs for that.

baraline added the good first issue, API design, and maintenance labels on Jul 14, 2024

baraline mentioned this issue Aug 1, 2024

[ENH] n_jobs/_n_jobs, parameter in classifiers #1886

Open
