IDEA: A parallel cluster API for clustermq #119

Open
HenrikBengtsson opened this issue Jan 21, 2019 · 5 comments


HenrikBengtsson commented Jan 21, 2019

Just like parallel::makePSOCKcluster() sets up a SOCKcluster object of SOCKnode workers, I think it would not be too complicated(*) to provide CMQcluster and CMQnode alternatives for clustermq.
For instance,

cl <- clustermq::makeClusterCMQ("sge")
y <- parallel::parLapply(cl, 1:10, sqrt)
parallel::stopCluster(cl)

(*) Roughly, S3 methods for generic functions such as sendCall() and recvResult(), and possibly a few more, are what need to be implemented for the CMQnode class.
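To sketch what I mean: parallel's sendCall()/recvResult() funnel through the unexported S3 generics sendData()/recvData(), so those are the methods a CMQnode class would actually provide. Everything below is hypothetical; makeClusterCMQ(), the pool field, and the pool's send()/receive() methods do not exist in clustermq, and clustermq::workers() is only assumed to hand back a usable pool handle:

## Hypothetical sketch -- none of the clustermq-side names below exist (yet)
makeClusterCMQ <- function(scheduler, n = 2L) {
  pool <- clustermq::workers(n_jobs = n)  # assumed: a shared worker-pool handle
  nodes <- lapply(seq_len(n), function(i)
    structure(list(pool = pool, rank = i), class = "CMQnode"))
  structure(nodes, class = c("CMQcluster", "cluster"))
}

## parallel's sendCall()/recvResult() dispatch through these generics:
sendData.CMQnode <- function(node, data) {
  node$pool$send(data)     # assumed pool method: push one EXEC message
}

recvData.CMQnode <- function(node) {
  node$pool$receive()      # assumed pool method: pull one result message
}

The rank field is only there because parallel addresses nodes individually; whether clustermq can honor that is a separate question.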

This would bring the clustermq backend to users and code written against the parallel cluster API, lowering the barrier some might have to adopting or migrating to it. It would also make clustermq immediately available to the future framework, e.g.

library(future)
cl <- clustermq::makeClusterCMQ("sge")
plan(cluster, workers = cl)
...

PS. I realized this while working on the future.clustermq backend; I have a first, very rough prototype of it up and running (too early to share or use).

mschubert (Owner) commented:

Good points!

I was already looking into this for #109 (with an S3 class for the object that workers(...) returns), but it hasn't materialized yet.

mschubert (Owner) commented:

Looking at this again after a long time, I think providing a parallel "cluster" backend for clustermq workers is not straightforward.

The reason is that, as @HenrikBengtsson said, the S3 methods sendCall and recvResult would need to be provided so that e.g. parLapply could take the cluster object and use those methods internally. parLapply and the underlying parallel:::staticClusterApply split the jobs across the workers and then call sendCall/recvResult once per worker:

# parallel:::staticClusterApply
function (cl = NULL, fun, n, argfun)
{
    cl <- defaultCluster(cl)
    p <- length(cl)
    if (n > 0L && p) {
        val <- vector("list", n)
        start <- 1L
        while (start <= n) {
            end <- min(n, start + p - 1L)
            jobs <- end - start + 1L
            # each call is sent to a specific node, cl[[i]] ...
            for (i in 1:jobs) sendCall(cl[[i]], fun, argfun(start + i - 1L))
            # ... and the result is collected from that same node
            val[start:end] <- lapply(cl[1:jobs], recvResult)
            start <- start + jobs
        }
        checkForRemoteErrors(val)
    }
}

However, clustermq and its worker API do not address individual workers, but rather receive the next ready result regardless of which worker it originates from. The options here would be:

  1. Provide a way to address individual workers in clustermq (this goes against design principles)
  2. Always pretend to have one parallel worker, but split work to the actual number of workers using Q

It seems that (2) could be a viable option; a rough sketch follows.
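A minimal sketch of what (2) might look like. Everything here is hypothetical except clustermq::Q() itself, and matching the iterated argument name to FUN's first formal (assumed to be x below) is glossed over:

## Hypothetical sketch of option (2): a one-node cluster backed by Q()
makeClusterCMQ1 <- function(n_jobs) {
  node <- new.env()                  # environment, so sendData() can mutate it
  node$n_jobs <- n_jobs
  class(node) <- "CMQnode"
  structure(list(node), class = c("CMQcluster", "cluster"))
}

sendData.CMQnode <- function(node, data) {
  node$pending <- data               # buffer the EXEC message; run it on recv
}

recvData.CMQnode <- function(node) {
  call <- node$pending$data          # list(fun =, args =, return =, tag =)
  ## parLapply() on a 1-node cluster sends lapply(whole_X, FUN); re-split
  ## that single chunk across the real workers via Q(). We assume FUN's
  ## first formal is named `x`; a real shim would match names properly.
  value <- if (identical(call$fun, lapply))
    clustermq::Q(call$args[[2]], x = call$args[[1]], n_jobs = node$n_jobs)
  else
    do.call(call$fun, call$args)     # anything else: evaluate locally
  list(type = "VALUE", value = value, success = TRUE, tag = call$tag)
}

With this, parallel::parLapply(cl, X, f) would fan out through Q() behind the scenes, while the cluster itself reports length one.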

HenrikBengtsson (Author) commented:

> However, clustermq and its worker API do not address individual workers, but rather receive the next ready result regardless of which worker it originates from.
> ...
> Always pretend to have one parallel worker, but split work to the actual number of workers using Q

If so, one needs to worry about the different load-balancing features of the "cluster" API, e.g. parLapply(..., chunk.size) and parLapplyLB().

It might very well be that the parallel "cluster" API was simply not designed to work with such setups; maybe this was never anticipated in the original design.
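For context on why load balancing is entangled with per-node addressing: the *LB functions go through parallel:::dynamicClusterApply(), which hands each next job to whichever specific node just returned a result. Abridged, it looks roughly like this:

# parallel:::dynamicClusterApply (abridged)
function (cl = NULL, fun, n, argfun)
{
    p <- length(cl)
    submit <- function(node, job)
        sendCall(cl[[node]], fun, argfun(job), tag = job)  # targets ONE node
    for (i in 1:min(n, p)) submit(i, i)    # fill the pipeline, one job per node
    val <- vector("list", n)
    for (i in 1:n) {
        d <- recvOneResult(cl)        # returns the value AND which node sent it
        j <- i + min(n, p)
        if (j <= n) submit(d$node, j) # next job goes to that same node
        val[d$tag] <- list(d$value)
    }
    checkForRemoteErrors(val)
}

So recvOneResult() must report which node finished, and the next call is then pushed to that specific node. With a single pretend node, though, this degenerates to plain sequential submit/receive, so option (2) would sidestep it.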

mschubert (Owner) commented:

The *LB methods should be fine, but I don't see a way to support clusterCall or clusterEvalQ. The docs suggest using those to load libraries on the nodes, which will never work here.

So we could not support any package that uses those functions. Even if such usage is not widespread (and it may well be), it would cause unexpected failures.
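For example, this standard pattern from the parallel docs assumes a fixed set of persistent node processes, which is exactly what a Q()-backed pretend node can't provide:

library(parallel)
cl <- makePSOCKcluster(2L)
clusterEvalQ(cl, library(tools))     # attaches the package once per node process
pids <- clusterCall(cl, Sys.getpid)  # the same two processes answer every call
stopCluster(cl)

With clustermq, the workers that later execute the actual payload need not be the processes that evaluated the clusterEvalQ() expression, so attached packages and exported globals would silently be missing.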
