
FDS running too slow on linux clusters #13186

Open
shivam11021 opened this issue Jul 17, 2024 · 21 comments

@shivam11021

Describe the bug
I recently installed FDS (the latest version) on my university's Linux clusters. We have about 4 servers with 132 CPUs each. I tried running a simulation on one server using 32 CPUs (ntasks). The simulation is taking too long; in fact, it is even slower than on my personal laptop. I am new to using FDS and would appreciate it if someone could help. The job control script that I am using is attached.

Expected behavior
Was expecting FDS to run at a much faster pace

Screenshots

Here's how the output file looks
[screenshot of the FDS output file]

script2.txt

@rmcdermo
Contributor

We also need to see your input file. The output you provided shows that you are only using 1 MPI process. Do you only have 1 mesh?
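A quick way to check (the input filename here is taken from your later job scripts; substitute your own) is to count the &MESH lines, since the number of meshes sets how many MPI processes you can usefully request:

# Count the &MESH namelist lines in the FDS input file.
grep -c '&MESH' Small_Scale_FINAL.fds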

@shivam11021
Author

Small_Scale_FINAL (1).txt

I think there are 3 meshes

@rmcdermo
Contributor

rmcdermo commented Jul 17, 2024

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.

@marcosvanella
Contributor

Try this:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

"Tasks" here refers to MPI processes, one per mesh. Note that your meshes have significantly different sizes, which is bad for load balance: the calculation will be as slow as the slowest worker, in this case the process managing mesh 02.
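If you want to check the balance, a rough sketch (it assumes each &MESH line in the input file keeps its IJK=i,j,k entry on a single line) is:

# Print the cell count of each mesh; large differences mean poor load balance.
grep -o 'IJK=[0-9, ]*' Small_Scale_FINAL.fds | sed 's/IJK=//' | awk -F, '{printf "Mesh %d: %d cells\n", NR, $1*$2*$3}'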

@johodges
Collaborator

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.
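Roughly, building from source means cloning the firemodels/fds repository and running the build script for the compiler/MPI combination available on your cluster. A sketch of the steps (the target directory and module names below are assumptions and will differ by FDS version and site):

git clone https://github.com/firemodels/fds.git
cd fds/Build/impi_intel_linux   # pick the Build/ target that matches your compiler and MPI stack
# module load intel-oneapi-compilers intel-oneapi-mpi   # or whatever modules your cluster provides
./make_fds.sh                   # builds an FDS executable in this directory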

@shivam11021
Author

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.

I changed the number to 3 and reran the simulation. It seems like the speed hasn't improved much. Here's the new file; it still shows Number of MPI Processes = 1.

f1_err.txt

@shivam11021
Author

Try this:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

"Tasks" here refers to MPI processes, one per mesh. Note that your meshes have significantly different sizes, which is bad for load balance: the calculation will be as slow as the slowest worker, in this case the process managing mesh 02.

I changed the batch file and the speed hasn't improved by much. Would using 3 nodes, or a number of tasks that is a multiple of 3, help? I wanted to take advantage of the large number of CPUs we have in a server. Is there a way I could do that here?

Thanks

@johodges
Collaborator

The mpiexec or mpirun you are using is from a different compiler than the binary FDS executable. That's why you see three repeats of the initialization info. See my previous message about compiling FDS with your cluster's build environment.
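One quick check, if your Slurm installation supports it, is to ask srun which MPI plugin types it knows about and compare that against the Intel MPI bundled with FDS:

srun --mpi=list   # lists the MPI plugin types this Slurm installation can launch with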

@shivam11021
Author

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.

What would building the source code entail? I'm not very familiar with it.

@rmcdermo
Contributor

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?
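For reference, the installer asks for something like the following in ~/.bashrc (path taken from your job scripts; adjust if the bundle lives elsewhere):

# Put the bundled FDS (and its Intel MPI) on the PATH at login.
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh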

@shivam11021
Author

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?

I did it myself, but with help from the IT people.

@rmcdermo
Contributor

Are you loading modules or are you running the FDS6VARS.sh script?

@shivam11021
Author

I just go to the terminal and use the command "bash script.sh". I don't load any modules explicitly.

@johodges
Collaborator

I have run into this issue before when I tried to use a compiled version of FDS on Frontera, which also utilizes srun. If you look at the submission file, the user is not calling mpiexec/mpirun directly; that means srun is deciding which MPI version to call.

@johodges
Collaborator

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.
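For example, from a shell on one of the compute nodes:

which mpiexec       # path of the mpiexec your environment resolves to
mpiexec --version   # reports the MPI library (and compiler) it was built with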

@shivam11021
Author

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

@johodges
Collaborator

Did you try adding a source command for FDS6VARS? Try submitting this one:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 32 --ntasks-per-node=32 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

If that does not work, try submitting this job:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 4 which mpiexec > ~/mpiver.txt

Then post the mpiver.txt file that is generated.

@shivam11021
Author

@johodges I tried running the first script, but it didn't lead to any improvement in speed.

The second one didn't run FDS, nor did it create the mpiver.txt file.

@johodges
Collaborator

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.
@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

When you ran the which command, were you on one of the compute nodes? If so, start another interactive job and then run your case manually on the compute node using the same source commands. Let us know if you still see the multiple repeats of each part of the initialization.
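Something along these lines should work as a sketch (allocation options taken from the earlier scripts; depending on how your site is configured you may need an extra "srun --pty bash" after salloc to get a shell on the compute node):

# Request an interactive allocation, then launch FDS by hand with the bundled MPI.
salloc --partition=batch --nodes=1 --ntasks=3 --cpus-per-task=1
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES
mpiexec -n 3 fds Small_Scale_FINAL.fds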

@Yunlongjasonliu

I am testing FDS parallel computing on a Linux machine, and I find that the running speed is no faster than on my laptop. Any ideas?

@drjfloyd
Contributor

FDS speed is a function of the hardware (number and type of CPUs, amount and type of memory, bus, etc.) and the configuration of the machine (how many users, cluster or standalone, what other software is running, etc.). It is certainly not the case that every Linux machine is faster than any Windows machine.
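If you want to compare the two machines directly, standard Linux tools will report the relevant hardware:

lscpu     # CPU model, core count, and clock speeds
free -h   # installed memory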
