
FDS running too slow on linux clusters #13186

Open
shivam11021 opened this issue Jul 17, 2024 · 21 comments

@shivam11021

Describe the bug
I recently installed FDS (the latest version) on my university's Linux clusters. We have about 4 servers with 132 CPUs each. I tried running a simulation on one server using 32 CPUs (ntasks). The simulation is taking too long; in fact, it is even slower than on my personal laptop. I am new to using FDS and would appreciate it if someone could help. The job control script that I am using is attached.

Expected behavior
Was expecting FDS to run at a much faster pace

Screenshots

Here's how the output file looks
[screenshot of the FDS output file]

script2.txt

@rmcdermo
Contributor

We also need to see your input file. The output you provided shows that you are only using 1 MPI process. Do you only have 1 mesh?
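A quick way to check (the input filename here is taken from your later job scripts; substitute your own) is to count the &MESH lines, since the number of meshes sets how many MPI processes you can usefully request:

# Count the &MESH namelist lines in the FDS input file.
grep -c '&MESH' Small_Scale_FINAL.fds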

@shivam11021
Author

Small_Scale_FINAL (1).txt

I think there are 3 meshes

@rmcdermo
Contributor

rmcdermo commented Jul 17, 2024

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.

@marcosvanella
Contributor

Try this:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

"Tasks" here refers to MPI processes, one per mesh. Note that your meshes have significantly different sizes, which is bad for load balance: the calculation will be as slow as the slowest worker, in this case the process managing mesh 02.
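If you want to check the balance, a rough sketch (it assumes each &MESH line in the input file keeps its IJK=i,j,k entry on a single line) is:

# Print the cell count of each mesh; large differences mean poor load balance.
grep -o 'IJK=[0-9, ]*' Small_Scale_FINAL.fds | sed 's/IJK=//' | awk -F, '{printf "Mesh %d: %d cells\n", NR, $1*$2*$3}'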

@johodges
Collaborator

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.
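Roughly, building from source means cloning the firemodels/fds repository and running the build script for the compiler/MPI combination available on your cluster. A sketch of the steps (the target directory and module names below are assumptions and will differ by FDS version and site):

git clone https://github.com/firemodels/fds.git
cd fds/Build/impi_intel_linux   # pick the Build/ target that matches your compiler and MPI stack
# module load intel-oneapi-compilers intel-oneapi-mpi   # or whatever modules your cluster provides
./make_fds.sh                   # builds an FDS executable in this directory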

@shivam11021
Author

FDS can only map an MPI process to a single mesh (this is how we do domain decomposition). So, your --ntasks and --ntasks-per-node need to be 3.

But something else is not right because you are seeing "Number of MPI Processes: 1". I don't quite understand this.

But fix the number to 3 and try again and let us know.

I changed the number to 3 and reran the simulation. It seems like the speed hasn't improved much. Here's the new file; it still shows Number of MPI Processes = 1.

f1_err.txt

@shivam11021
Author

Try this:
#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=3
#SBATCH --cpus-per-task=1
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 3 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

"Tasks" here refers to MPI processes, one per mesh. Note that your meshes have significantly different sizes, which is bad for load balance: the calculation will be as slow as the slowest worker, in this case the process managing mesh 02.

I changed the batch file and the speed hasn't improved by much. Would using 3 nodes, or a number of tasks that is a multiple of 3, help? I wanted to take advantage of the large number of CPUs we have in a server. Is there a way I could do that here?

Thanks

@johodges
Collaborator

The mpiexec or mpirun you are using is from a different compiler than the binary FDS executable. That's why you see three repeats of the initialization info. See my previous message about compiling FDS with your cluster's build environment.
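One quick check, if your Slurm installation supports it, is to ask srun which MPI plugin types it knows about and compare that against the Intel MPI bundled with FDS:

srun --mpi=list   # lists the MPI plugin types this Slurm installation can launch with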

@shivam11021
Author

You may not be able to use the pre-compiled FDS binaries with your cluster. srun may be linked to a specific mpiexec/mpirun executable that was built with a different compiler than the pre-compiled binaries. When you try to run FDS with the wrong mpiexec, it will sometimes spin up N serial copies of the same process, all running the same input.

In addition to the comments by Randy and Marcos, I would try building the source code yourself with the compiler environment available on your cluster.

What would building the source code entail? I'm not very familiar with it.

@rmcdermo
Contributor

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?
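For reference, the installer asks for something like the following in ~/.bashrc (path taken from your job scripts; adjust if the bundle lives elsewhere):

# Put the bundled FDS (and its Intel MPI) on the PATH at login.
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh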

@shivam11021
Author

@johodges Shouldn't things work correctly if the user adds the environment variables to their .bashrc as suggested at the end of the FDS-SMV install process?

@shivam11021 Have you installed FDS yourself, or did you have your sys admin do it?

I did it myself, but with help from the IT people.

@rmcdermo
Contributor

Are you loading modules or are you running the FDS6VARS.sh script?

@shivam11021
Author

I just go to the terminal and use the command "bash script.sh". I don't load any modules explicitly.

@johodges
Collaborator

I have run into this issue before when I tried to use a compiled version of FDS on Frontera, which also utilizes srun. If you look at the submission file, the user is not calling mpiexec/mpirun directly; that means srun is deciding which MPI version to call.

@johodges
Collaborator

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.
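For example, from a shell on one of the compute nodes:

which mpiexec       # path of the mpiexec your environment resolves to
mpiexec --version   # reports the MPI library (and compiler) it was built with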

@shivam11021
Author

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.

@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

@johodges
Collaborator

Did you try adding a source command for FDS6VARS? Try submitting this one:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 32 --ntasks-per-node=32 /home/krishna1/a/sharm368/FDS/FDS6/bin/fds Small_Scale_FINAL.fds

If that does not work, try submitting this job:

#!/bin/bash
#SBATCH -J Small_Scale_FINAL.fds
#SBATCH -e /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.err
#SBATCH -o /home/krishna1/a/sharm368/FDS/SAMPLE_FILES/f1.log
#SBATCH --partition=batch # Replace 'your_queue_name' with the actual partition/queue name
#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=32
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
export OMP_NUM_THREADS=1
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES # Replace with the actual path if different
srun -N 1 -n 4 which mpiexec > ~/mpiver.txt

Then post the mpiver.txt file that is generated.

@shivam11021
Author

@johodges I tried running the first script, but it didn't lead to any improvement in speed.

The second one didn't run FDS, nor did it create the mpiver.txt file.

@johodges
Collaborator

Sorry for the double post. I just checked again on Frontera. If I source FDS6VARS.sh in my bashrc file, I am able to run the compiled version; however, they are using the Intel mpiexec under the hood. I think I was misremembering which cluster I ran into this issue on. It was probably Polaris, since they use MPICH.
@shivam11021 can you run an interactive job on your cluster and then check which MPI is being pulled in? You can type "which mpiexec" and it will tell you which file it is. You can also run "mpiexec --version" to see which compiler it was built with.

This is what I got -- ~/FDS/FDS6/bin/INTEL/bin/mpiexec

Intel(R) MPI Library for Linux* OS, Version 2021.6 Build 20220227 (id: 28877f3f32)

When you ran the which command, were you on one of the compute nodes? If so, start another interactive job and then run your case manually on the compute node using the same source commands. Let us know if you still see the multiple repeats of each part of the initialization.
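Something along these lines should work as a sketch (allocation options taken from the earlier scripts; depending on how your site is configured you may need an extra "srun --pty bash" after salloc to get a shell on the compute node):

# Request an interactive allocation, then launch FDS by hand with the bundled MPI.
salloc --partition=batch --nodes=1 --ntasks=3 --cpus-per-task=1
source /home/krishna1/a/sharm368/FDS/FDS6/bin/FDS6VARS.sh
cd /home/krishna1/a/sharm368/FDS/SAMPLE_FILES
mpiexec -n 3 fds Small_Scale_FINAL.fds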

@Yunlongjasonliu

I am testing FDS parallel computing on a Linux machine, and I find that the running speed is no faster than on my laptop. Any ideas?

@drjfloyd
Contributor

FDS speed is a function of the hardware (number and type of CPUs, amount and type of memory, bus, etc.) and the configuration of the machine (how many users, cluster or standalone, what other software is running, etc.). It is certainly not the case that every Linux machine is faster than any Windows machine.
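If you want to compare the two machines directly, standard Linux tools will report the relevant hardware:

lscpu     # CPU model, core count, and clock speeds
free -h   # installed memory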
