IDEA: A parallel cluster API for clustermq #119

Open
HenrikBengtsson opened this issue Jan 21, 2019 · 5 comments


HenrikBengtsson commented Jan 21, 2019

Just like parallel::makePSOCKcluster() sets up a SOCKcluster object of SOCKnode workers, I think it would not be too complicated(*) to provide CMQcluster and CMQnode alternatives for clustermq.
For instance,

cl <- clustermq::makeClusterCMQ("sge")
y <- parallel::parLapply(cl, 1:10, sqrt)
parallel::stopCluster(cl)

(*) Roughly, S3 methods for generic functions such as sendCall() and recvResult(), and possibly a few more, are what need to be implemented for the CMQnode class.
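To sketch what I mean: parallel's sendCall()/recvResult() funnel through the unexported S3 generics sendData()/recvData(), so those are the methods a CMQnode class would actually provide. Everything below is hypothetical; makeClusterCMQ(), the pool field, and the pool's send()/receive() methods do not exist in clustermq, and clustermq::workers() is only assumed to hand back a usable pool handle:

## Hypothetical sketch -- none of the clustermq-side names below exist (yet)
makeClusterCMQ <- function(scheduler, n = 2L) {
  pool <- clustermq::workers(n_jobs = n)  # assumed: a shared worker-pool handle
  nodes <- lapply(seq_len(n), function(i)
    structure(list(pool = pool, rank = i), class = "CMQnode"))
  structure(nodes, class = c("CMQcluster", "cluster"))
}

## parallel's sendCall()/recvResult() dispatch through these generics:
sendData.CMQnode <- function(node, data) {
  node$pool$send(data)     # assumed pool method: push one EXEC message
}

recvData.CMQnode <- function(node) {
  node$pool$receive()      # assumed pool method: pull one result message
}

The rank field is only there because parallel addresses nodes individually; whether clustermq can honor that is a separate question.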

This would bring the clustermq backend to users and code written against the parallel cluster API, lowering the barrier some might have to adopting or migrating to it. It would also make clustermq immediately available to the future framework, e.g.

library(future)
cl <- clustermq::makeClusterCMQ("sge")
plan(cluster, workers = cl)
...

PS. I realized this while working on the future.clustermq backend; I have a first, very rough prototype of it up and running (too early to share or use).

mschubert (Owner) commented:

Good points!

I was already looking into this for #109 (with an S3 class for the object that workers(...) returns), but it hasn't materialized yet.

mschubert (Owner) commented:

Looking at this again after a long time, I think providing a parallel "cluster" backend for clustermq workers is not straightforward.

The reason is that, as @HenrikBengtsson said, the S3 methods sendCall and recvResult would need to be provided so that e.g. parLapply could take the cluster object and use those methods internally. parLapply and the underlying parallel:::staticClusterApply split the jobs across the workers and then call sendCall/recvResult once per worker:

# parallel:::staticClusterApply
function (cl = NULL, fun, n, argfun)
{
    cl <- defaultCluster(cl)
    p <- length(cl)
    if (n > 0L && p) {
        val <- vector("list", n)
        start <- 1L
        while (start <= n) {
            end <- min(n, start + p - 1L)
            jobs <- end - start + 1L
            # each call is sent to a specific node, cl[[i]] ...
            for (i in 1:jobs) sendCall(cl[[i]], fun, argfun(start + i - 1L))
            # ... and the result is collected from that same node
            val[start:end] <- lapply(cl[1:jobs], recvResult)
            start <- start + jobs
        }
        checkForRemoteErrors(val)
    }
}

However, clustermq and its worker API do not address individual workers, but rather receive the next ready result regardless of which worker it originates from. The options here would be:

  1. Provide a way to address individual workers in clustermq (this goes against design principles)
  2. Always pretend to have one parallel worker, but split work to the actual number of workers using Q

It seems that (2) could be a viable option; a rough sketch follows.
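A minimal sketch of what (2) might look like. Everything here is hypothetical except clustermq::Q() itself, and matching the iterated argument name to FUN's first formal (assumed to be x below) is glossed over:

## Hypothetical sketch of option (2): a one-node cluster backed by Q()
makeClusterCMQ1 <- function(n_jobs) {
  node <- new.env()                  # environment, so sendData() can mutate it
  node$n_jobs <- n_jobs
  class(node) <- "CMQnode"
  structure(list(node), class = c("CMQcluster", "cluster"))
}

sendData.CMQnode <- function(node, data) {
  node$pending <- data               # buffer the EXEC message; run it on recv
}

recvData.CMQnode <- function(node) {
  call <- node$pending$data          # list(fun =, args =, return =, tag =)
  ## parLapply() on a 1-node cluster sends lapply(whole_X, FUN); re-split
  ## that single chunk across the real workers via Q(). We assume FUN's
  ## first formal is named `x`; a real shim would match names properly.
  value <- if (identical(call$fun, lapply))
    clustermq::Q(call$args[[2]], x = call$args[[1]], n_jobs = node$n_jobs)
  else
    do.call(call$fun, call$args)     # anything else: evaluate locally
  list(type = "VALUE", value = value, success = TRUE, tag = call$tag)
}

With this, parallel::parLapply(cl, X, f) would fan out through Q() behind the scenes, while the cluster itself reports length one.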

HenrikBengtsson (Author) commented:

> However, clustermq and its worker API do not address individual workers, but rather receive the next ready result regardless of which worker it originates from.
> ...
> Always pretend to have one parallel worker, but split work to the actual number of workers using Q

If so, one needs to worry about the different load-balancing features of the "cluster" API, e.g. parLapply(..., chunk.size) and parLapplyLB().

It might very well be that the parallel "cluster" API was simply not designed to work with such setups; maybe this was never anticipated in the original design.
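For context on why load balancing is entangled with per-node addressing: the *LB functions go through parallel:::dynamicClusterApply(), which hands each next job to whichever specific node just returned a result. Abridged, it looks roughly like this:

# parallel:::dynamicClusterApply (abridged)
function (cl = NULL, fun, n, argfun)
{
    p <- length(cl)
    submit <- function(node, job)
        sendCall(cl[[node]], fun, argfun(job), tag = job)  # targets ONE node
    for (i in 1:min(n, p)) submit(i, i)    # fill the pipeline, one job per node
    val <- vector("list", n)
    for (i in 1:n) {
        d <- recvOneResult(cl)        # returns the value AND which node sent it
        j <- i + min(n, p)
        if (j <= n) submit(d$node, j) # next job goes to that same node
        val[d$tag] <- list(d$value)
    }
    checkForRemoteErrors(val)
}

So recvOneResult() must report which node finished, and the next call is then pushed to that specific node. With a single pretend node, though, this degenerates to plain sequential submit/receive, so option (2) would sidestep it.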

mschubert (Owner) commented:

The *LB methods should be fine, but I don't see a way to support clusterCall or clusterEvalQ. The docs suggest using those to load libraries on the nodes, which will never work here.

So we could not support any package that uses those functions. Even if such usage is not widespread (and it may well be), it would cause unexpected failures.
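For example, this standard pattern from the parallel docs assumes a fixed set of persistent node processes, which is exactly what a Q()-backed pretend node can't provide:

library(parallel)
cl <- makePSOCKcluster(2L)
clusterEvalQ(cl, library(tools))     # attaches the package once per node process
pids <- clusterCall(cl, Sys.getpid)  # the same two processes answer every call
stopCluster(cl)

With clustermq, the workers that later execute the actual payload need not be the processes that evaluated the clusterEvalQ() expression, so attached packages and exported globals would silently be missing.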
