Replies: 2 comments 9 replies
-
@NikoYura hi. From my experience of working with JUDI on the cloud, the main aspects are:
As I remember, Julia should be run from a shared folder.
-
Hi @NikoYura, thanks for reaching out, and glad you are using JUDI. Which cloud provider are you interested in?
For Azure, there is a small wrapper for Azure Batch that is very easy to use: https://github.com/slimgroup/JUDI4Cloud.jl. I find it easier to use than AzManagers and the like (though it does use AzStorage under the hood if needed).
In general, you will need to make sure your environment is consistent across all workers, as Julia tends to be very finicky when there is a package version mismatch (and will usually crash outright if there is a Julia version mismatch). The simplest way to ensure this is to set up your project with a …
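As a minimal sketch of checking the consistency mentioned above, the snippet below starts workers with the same project as the main process and verifies that every worker reports the same Julia version. It uses only the standard Distributed library; the two local workers are an assumption for illustration and stand in for remote nodes.

```julia
using Distributed

# Assumption: two local workers stand in for remote cloud nodes.
# Passing the active project keeps package versions identical on each worker.
addprocs(2; exeflags="--project=$(Base.active_project())")

# Ask every worker which Julia version it is running.
worker_versions = Dict(w => remotecall_fetch(() -> VERSION, w) for w in workers())

# Collect the workers whose version differs from the main process.
mismatched = [w for (w, v) in worker_versions if v != VERSION]

# Fail early instead of crashing later in the middle of a job.
isempty(mismatched) || error("Julia version mismatch on workers: $mismatched")
```

Running a check like this right after adding workers surfaces a mismatch before any shots are distributed.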
-
I'm working on setting up a distributed computing environment for JUDI. I have several questions about integrating JUDI with cloud services. I would greatly appreciate insights and suggestions from the community on the following aspects:
Setting Up Distributed Computing in Cloud with JUDI:
How can I effectively connect JUDI with distributed computing capabilities of a Cloud setup (shot parallelization over many nodes)?
My overall task is to somehow connect workers from FWI JUDI examples with my nodes. I did the following:
I checked the example that uses Azure services. As I understand it, to use these packages (AzManagers, AzSessions, AzStorage) with any other cloud, I would need to rewrite the code inside them to match my specific cloud.
I used addprocs() with SSH and IP settings to connect to the nodes. It works, but you need a copy of your environment on every node, so it is hard to ensure consistent JUDI environments across different nodes in the cloud. What strategies are recommended for maintaining consistent package versions across all nodes in a JUDI distributed computing environment? Is it possible to manage this seamlessly using tools like Dask or Kubernetes, and how does this translate to a JUDI context?
Maybe a cluster manager for addprocs, for example Slurm, is best for this purpose (shot parallelization over many nodes)? How labor-intensive is it to configure? What are the key considerations for setting up and customizing Slurm job parameters in JUDI?
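A minimal sketch of the SSH-based addprocs setup described above, using the standard Distributed library. The host addresses, worker counts, and all paths are hypothetical placeholders, and the sketch assumes passwordless SSH access plus a folder shared by every node that holds both Julia and the project environment.

```julia
using Distributed

# Hypothetical nodes: (ssh target, number of workers to start there).
machines = [("user@10.0.0.11", 4), ("user@10.0.0.12", 4)]

addprocs(machines;
         tunnel   = true,                       # route worker traffic through SSH
         exename  = "/shared/julia/bin/julia",  # hypothetical: same Julia binary on every node
         dir      = "/shared/judi_project",     # hypothetical shared working directory
         exeflags = "--project=/shared/judi_project")  # same environment on every worker

# Load JUDI on the main process and on all workers.
@everywhere using JUDI
```

Pointing exename, dir, and exeflags at a shared location is one way to avoid copying the environment to each node by hand.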
I also followed this tutorial, but for my specific cloud service. I took the Devito image from Docker Hub and configured Kubernetes. It works for the FWI Devito example with Dask, but I do not understand how to make the same configuration/code/setup for JUDI (FWI examples with parallelization based on workers).
I realize that my questions overlap, but I have only one task: to parallelize shots across nodes (and as I understand it, the easiest way to do that is via workers, as in the FWI JUDI examples). At this stage, I'm not sure which way would be best (or at least easiest to implement), and I would welcome your perspective on this. I'm aiming to build an efficient, scalable, and robust distributed computing setup for my FWI application and would greatly benefit from the community's expertise in these areas.