Beyond traditional HPC: containers and cloud computing #244

wlandau · 2018-08-24T15:39:08Z

Can clustermq use workers on AWS, Digital Ocean, arbitrary remote Docker containers, etc.? It seems straightforward, for example, to use the ssh scheduler to deploy to workers on the same AWS instance. But what about a single pool of workers spread over multiple instances?

I was at an R conference last week, and there seemed to be uncertainty and debate about the long-term future of traditional HPC systems. cc @dpastoor

mschubert · 2018-09-08T16:53:46Z · Maintainer

Yes, that's definitely on the list.

In principle, you should already be able to use any machine that you can connect to via SSH and that has the multicore scheduler set up. However, I have never tested anything like that.

For multiple remote machines, this will require some changes in how clustermq works. These will likely happen, but not in the near future.
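
For the single-host case, here is a minimal sketch of the setup. It assumes clustermq is installed on both machines, passwordless SSH to the remote host already works, and "user@host" is replaced with your own login. The remote-side multicore line is an assumption about how you might configure that machine, not something stated in this thread.

```r
# Local machine: route clustermq workers over SSH to a single remote host.
# "user@host" is a placeholder; replace it with your own login and machine.
options(
  clustermq.scheduler = "ssh",
  clustermq.ssh.host = "user@host",
  clustermq.ssh.log = "~/cmq_ssh.log"  # log file for debugging the SSH connection
)

# Remote machine (assumed setup): its own ~/.Rprofile chooses how the workers
# are started there, e.g. as forked processes on one large server:
# options(clustermq.scheduler = "multicore")
```

With this in place, a call such as clustermq::Q(function(x) x * 2, x = 1:4, n_jobs = 2) would start two workers on that one remote host and return the results as a list. It does not pool workers across several hosts, which is the limitation discussed in this thread.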

chapmandu2 · 2019-01-30T16:52:22Z

Have you looked at Docker and Kubernetes for parallel processing in the cloud? A Kubernetes cluster is a lot easier to set up on AWS or Azure than a conventional cluster would be, and you get scaling thrown in. Interestingly enough, RStudio Server Pro has just added this feature. I'm looking at makeClusterFunctions in batchtools and makeClusterPSOCK in future, but I think Kubernetes might be better. Thanks for the great packages.

pat-s · 2019-01-31T15:12:45Z

While we are currently building up an HPC cluster, we have several standalone machines. It would be great if we could use the SSH connector to distribute jobs across all of them.

This would work perfectly together with drake: the jobs argument to make() could then be used to distribute the parallel jobs across as many SSH machines as possible.

mschubert · 2019-02-01T12:07:29Z · Maintainer

Thank you for the hints re kubernetes, @chapmandu2.

@pat-s Is there a reason why you don't set up a scheduler on your HPC? That would not only support clustermq as it is, but also many other tools that interface with schedulers (which you may want down the line).

pat-s · 2019-02-01T14:39:21Z

As I said, we're already building an HPC cluster with Warewulf and Slurm. Until then, we have several standalone servers that are used in production and cannot be turned off until there is a production-ready replacement 🙂 Our main goal is to combine all of them, but until then, the multiple-SSH approach would be nice to have.

mschubert · 2019-02-01T14:42:13Z · Maintainer

I dropped a word (the "while") when reading your comment again; you did say so. Sorry.

I'm afraid I won't have support for multiple SSH hosts ready in the next couple of weeks.

wlandau · 2020-09-06T03:26:14Z · Author

What about AWS Batch? Metaflow uses it.

wlandau · 2020-09-06T04:39:48Z · Author

It looks like paws::batch() creates an object with a submit_job() method, though I am not sure how to get the job's data back.
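
A minimal sketch of submitting and polling a job, assuming an AWS Batch job queue and a job definition whose container image has R installed already exist. "my-queue" and "my-r-jobdef" are placeholder names, and submit_batch_job() and job_status() are hypothetical helpers, not part of paws or clustermq; submit_job() and describe_jobs() are methods of the paws::batch() client.

```r
# Hypothetical helper: submit one R command as an AWS Batch job via paws.
# Queue and job definition names are placeholders for resources you create.
submit_batch_job <- function(job_name,
                             job_queue = "my-queue",          # hypothetical
                             job_definition = "my-r-jobdef",  # hypothetical
                             command = c("Rscript", "-e", "print(1 + 1)")) {
  svc <- paws::batch()
  resp <- svc$submit_job(
    jobName = job_name,
    jobQueue = job_queue,
    jobDefinition = job_definition,
    # Override the container command defined in the job definition.
    containerOverrides = list(command = as.list(command))
  )
  resp$jobId  # ID used later to query the job
}

# Hypothetical helper: look up the current status of a submitted job,
# e.g. "SUBMITTED", "RUNNABLE", "RUNNING", "SUCCEEDED" or "FAILED".
job_status <- function(job_id) {
  svc <- paws::batch()
  resp <- svc$describe_jobs(jobs = list(job_id))
  resp$jobs[[1]]$status
}
```

AWS Batch reports job status and logs, not R return values, so getting data back needs a side channel. One assumed approach is to have the job save its result to S3 (for example with saveRDS() followed by an upload) and read it back on the submitting side with the get_object() method of a paws::s3() client.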
