🐛 LDNe creates huge temporary files #1

Open
bunop opened this issue May 2, 2023 · 0 comments
Labels: bug (Something isn't working), performance (Improve the performance or better resource management)

bunop (Member) commented May 2, 2023

Running multiple instances of LDNe in an HPC environment quickly exhausts the space on the temporary partition. This is difficult to manage because LDNe keeps writing data into files that have already been deleted, so they no longer appear in directory listings. The problem can be reproduced by running a single instance of LDNe on a fairly large input file. After a few minutes, we can inspect the process's open file descriptors under the Linux /proc directory:

$ cd /proc/104286/fd
$ ll
total 0
lr-x------ 1 paolo paolo 64 mag  2 11:14 0 -> /dev/null
l-wx------ 1 paolo paolo 64 mag  2 11:14 1 -> /dev/shm/nxf.BfoLPCWMLq/.command.out
l-wx------ 1 paolo paolo 64 mag  2 11:14 2 -> /dev/shm/nxf.BfoLPCWMLq/.command.err
lr-x------ 1 paolo paolo 64 mag  2 11:14 3 -> /home/paolo/Projects/NEXTFLOWetude/nf-rldne/work/86/67a17d3c3ed82d7c4d27707559a79a/.command.sh
lr-x------ 1 paolo paolo 64 mag  2 11:14 4 -> /home/paolo/Projects/NEXTFLOWetude/nf-rldne/work/86/67a17d3c3ed82d7c4d27707559a79a/murciano_gmmphk_50_individuals_1_step_genepop_Ne_params.txt
lr-x------ 1 paolo paolo 64 mag  2 11:14 5 -> /home/paolo/Projects/NEXTFLOWetude/nf-rldne/work/05/48715bf152998b241a103f440fe91f/murciano_gmmphk_50_individuals_1_step_genepop.txt
l-wx------ 1 paolo paolo 64 mag  2 11:14 6 -> /home/paolo/Projects/NEXTFLOWetude/nf-rldne/work/86/67a17d3c3ed82d7c4d27707559a79a/murciano_gmmphk_50_individuals_1_step_genepop_Ne_out.txt
l-wx------ 1 paolo paolo 64 mag  2 11:14 7 -> /home/paolo/Projects/NEXTFLOWetude/nf-rldne/work/86/67a17d3c3ed82d7c4d27707559a79a/murciano_gmmphk_50_individuals_1_step_genepop_Ne_outxLD.txt
lrwx------ 1 paolo paolo 64 mag  2 11:14 8 -> '/tmp/#5508702 (deleted)'
lrwx------ 1 paolo paolo 64 mag  2 11:14 9 -> '/tmp/#5508703 (deleted)'
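The same behaviour can be reproduced without LDNe. On Linux, deleting a file only removes its directory entry; the data stays allocated until the last open descriptor on it is closed, which is why `(deleted)` entries like the ones above can keep growing. A minimal bash sketch, assuming Linux /proc and GNU coreutils `stat` (the 1 MiB size is arbitrary):

```shell
#!/usr/bin/env bash
# Sketch: a deleted-but-still-open file keeps consuming disk space (Linux only).
set -euo pipefail

tmpfile=$(mktemp)                 # scratch file under $TMPDIR (default /tmp)
exec 3<>"$tmpfile"                # keep it open read/write on file descriptor 3
rm -- "$tmpfile"                  # unlink it: the directory entry is gone...

head -c 1048576 /dev/zero >&3     # ...yet writes through fd 3 still reach the disk

target=$(readlink "/proc/$$/fd/3")        # e.g. "/tmp/tmp.XXXXXXXXXX (deleted)"
size=$(stat -L -c '%s' "/proc/$$/fd/3")   # size of the unlinked file, in bytes
echo "fd 3 -> $target: $size bytes"

exec 3>&-                         # closing the last descriptor releases the space
```

Running it prints something like `fd 3 -> /tmp/tmp.XXXXXXXXXX (deleted): 1048576 bytes`; the space only returns to the partition after `exec 3>&-`, i.e. when the writing process closes the file or exits.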

In the /proc listing above, the two temporary files are associated with file descriptors 8 and 9. Since they have already been deleted, they no longer appear in /tmp, so we can't easily tell how much space they take up. One way is a rather elaborate command line found on Stack Overflow, run a few minutes after the process has started:

$ lsof \
| grep REG \
| grep -v "stat: No such file or directory" \
| grep -v DEL \
| awk '{if ($NF=="(deleted)") {x=3;y=1} else {x=2;y=0}; {print $(NF-x) "  " $(NF-y) } }'  \
| sort -n -u  \
| numfmt  --field=1 --to=iec \
| tail -n2
497M  /home/paolo/.local/share/kite/kite-v2.20210610.0/kited
1,3G  /tmp/#5508702

Here fd 8 (/tmp/#5508702) had already written more than 1 GiB of data after the process had been running for only 3 minutes and 30 seconds, roughly 6 MiB/s.
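A lighter-weight alternative to the lsof pipeline, sketched under the same assumptions (Linux /proc, GNU coreutils `stat`): since /proc/PID/fd/N still points to the unlinked inode, `stat -L` can read its size directly, one process at a time. `deleted_fd_usage` is a hypothetical helper name, not part of LDNe or nf-rldne.

```shell
#!/usr/bin/env bash
# Sketch: list deleted files still held open by the given PIDs, with sizes.
# Assumes Linux /proc and GNU coreutils `stat`; reading another user's
# /proc/PID/fd usually requires being that user or root.
set -uo pipefail

# deleted_fd_usage PID... -> "PID<TAB>BYTES<TAB>TARGET" per file, then "total<TAB>BYTES"
deleted_fd_usage() {
  local pid fd target size total=0
  for pid in "$@"; do
    for fd in /proc/"$pid"/fd/*; do
      target=$(readlink "$fd" 2>/dev/null) || continue   # fd may have closed meanwhile
      [[ $target == *' (deleted)' ]] || continue           # keep unlinked files only
      size=$(stat -L -c '%s' "$fd" 2>/dev/null) || continue
      printf '%s\t%s\t%s\n' "$pid" "$size" "$target"
      total=$((total + size))
    done
  done
  printf 'total\t%s\n' "$total"
}
```

For the process inspected above, `deleted_fd_usage 104286 | numfmt -d $'\t' --field=2 --to=iec` should list the deleted /tmp files behind fds 8 and 9 in human-readable units. To cover all instances running on a node, the PIDs could come from `pgrep`, assuming the executable is actually named `LDNe` (not confirmed here): `deleted_fd_usage $(pgrep -x LDNe)`.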

bunop added the bug and performance labels May 2, 2023