Hi Tobias,
I've been running DCC on a dataset and have noticed that writing the `tmp_nonduplicates.#` files is taking an extremely long time. For context, here is `_tmp_DCC/` after 23 hours of running:
```
total 1.1G
-rw-r--r-- 1 bdigby 373M Mar 24 14:53 fust1_1.Chimeric.out.junction.PLJSNR
-rw-r--r-- 1 bdigby  15M Mar 25 14:12 tmp_duplicates.8D5DA9
-rw-r--r-- 1 bdigby 248M Mar 24 14:53 tmp_merged
-rw-r--r-- 1 bdigby  78M Mar 25 14:47 tmp_nonduplicates.8D5DA9
-rw-r--r-- 1 bdigby 164M Mar 24 14:53 tmp_printcirclines.8D5DA9
-rw-r--r-- 1 bdigby 248M Mar 24 14:53 tmp_twochimera
```
The resources requested for this job are as follows:
Can you offer any insights into what might be limiting this step? I.e., do you think increasing or reducing the available resources might expedite the process?
It would also be useful to get an idea of the final size of `tmp_nonduplicates.#`: will it be a similar size to `tmp_printcirclines.#`? This would help me gauge an appropriate `TimeLimit` through trial and error.
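In the meantime I've been extrapolating from the file's growth rate to estimate a `TimeLimit`. A minimal sketch of what I'm doing (the helper names and the 164M target, taken from `tmp_printcirclines.#`, are my own assumptions, not part of DCC):

```python
import os
import time

def growth_eta(size_before, size_after, interval_s, target_bytes):
    """Extrapolate the seconds remaining until a growing file reaches
    target_bytes, from two size samples taken interval_s seconds apart."""
    rate = (size_after - size_before) / interval_s  # bytes per second
    if rate <= 0:
        return None  # file is not growing; cannot extrapolate
    return max(0.0, (target_bytes - size_after) / rate)

def sample_eta(path, target_bytes, interval_s=60):
    """Sample a file's size twice and extrapolate time to target_bytes."""
    s1 = os.path.getsize(path)
    time.sleep(interval_s)
    s2 = os.path.getsize(path)
    return growth_eta(s1, s2, interval_s, target_bytes)

# e.g. sample_eta("tmp_nonduplicates.8D5DA9", 164 * 1024**2)
```

Of course this assumes the final size is roughly that of `tmp_printcirclines.#`, which is exactly the guess I'd like to confirm.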
Another layer to this: two of the six samples have stopped running but, bizarrely, did not produce an exit code. See the relevant line from the Nextflow log below:
```
Mar-25 12:47:22.254 [Task monitor] DEBUG nextflow.executor.GridTaskHandler - Failed to get exit status for process TaskHandler[jobId: 6058404; id: 97; name: DCC (N2_1); status: RUNNING; exit: -; error: -; workDir: /data/bdigby/Projects/large_test_data/work/fd/6a0841a7f3d2471b4483b52d998f6e started: 1648130944677; exited: -; ] -- exitStatusReadTimeoutMillis: 270000; delta: 270018
```
I contacted the system administrator, but he could not see any evidence of resources being exceeded (Nextflow would also have reported this).
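One thing I'm going to try: the `exitStatusReadTimeoutMillis: 270000` in the log matches Nextflow's default `executor.exitReadTimeout` of 270 seconds, so I'll raise it in `nextflow.config` in case the `.exitcode` file is simply slow to appear on our shared filesystem (the 600-second value below is an arbitrary guess on my part):

```groovy
// nextflow.config -- raise the grid executor's exit-file read timeout
// (270 sec is the Nextflow default; 600 sec is just a guess for our filesystem)
executor {
    exitReadTimeout = '600 sec'
}
```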
Any insights as to why this step might fail would be extremely useful.
N.B. The analysis is on WBcel235; having used DCC multiple times on human datasets, I am surprised by this behaviour with a relatively small reference genome.
Thanks in advance,
Barry