Cannot handle huge command.txt files #125
Comments
Congrats! You broke Linux :D
I think I have a better understanding of the issue now. When the commands file is huge, we currently need to read and rewrite the ~2.5 GB file each time we get the next command to run. This is because we consider the next command to run as the first one in the file. We should take the last line of the file instead, so we only have to truncate the file rather than rewrite it.
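For illustration, a minimal sketch of what "take the last line" could look like (the helper name and behaviour are hypothetical, not SmartDispatch's current code): read only the tail of the file, return its last line, and truncate that line away, so picking a command costs roughly one line's worth of I/O instead of a full multi-GB rewrite.

```python
import os

def pop_last_command(path, chunk=4096):
    """Hypothetical helper: pop the last line of a commands file in place.

    Instead of rewriting the whole (multi-GB) file each time a command is
    picked, read only the tail, truncate the file just before its last line,
    and return that line: O(line length) per pop instead of O(file size).
    Assumes single-writer access and command lines shorter than `chunk`.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        if end == 0:
            return None                       # no commands left
        read_start = max(0, end - chunk)
        f.seek(read_start)
        tail = f.read().rstrip(b"\n")         # tail of the file, trailing newlines dropped
        if not tail:
            f.truncate(read_start)            # only blank lines in the tail window
            return None
        cut = tail.rfind(b"\n")               # newline just before the last line
        last = tail[cut + 1:]
        new_end = read_start + cut + 1 if cut != -1 else 0
        f.truncate(new_end)                   # drop the popped line from the file
        return last.decode()
```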
@MarcCote What you say is true, but these are two distinct issues.

@MarcCote I have a vague memory that we had a proper reason to do it that way in the past. There is also the fact that it was NEVER intended to be used in this way. Finally, I think that having the unfinished_commands.txt and/or having a database might mitigate all this.
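On the database idea, a minimal sketch of what a command queue could look like with Python's built-in sqlite3 (the table, column names, and functions are illustrative, not an existing SmartDispatch feature):

```python
import sqlite3

def open_queue(db_path):
    """Hypothetical SQLite-backed command queue (illustrative schema)."""
    con = sqlite3.connect(db_path, timeout=30, isolation_level=None)  # autocommit mode
    con.execute("""CREATE TABLE IF NOT EXISTS commands (
                       id     INTEGER PRIMARY KEY,
                       cmd    TEXT NOT NULL,
                       status TEXT NOT NULL DEFAULT 'pending')""")
    return con

def claim_next(con):
    """Atomically claim one pending command; return None when the queue is empty."""
    con.execute("BEGIN IMMEDIATE")          # take the write lock before reading
    try:
        row = con.execute(
            "SELECT id, cmd FROM commands WHERE status = 'pending' LIMIT 1").fetchone()
        if row is None:
            con.execute("ROLLBACK")
            return None
        con.execute("UPDATE commands SET status = 'running' WHERE id = ?", (row[0],))
        con.execute("COMMIT")
        return row[1]
    except Exception:
        con.execute("ROLLBACK")
        raise
```

Marking a claimed command as `running` (rather than deleting it) would also give the unfinished-commands tracking for free.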
Maybe those are two distinct issues. Please @gauvinalexandre let us know :). @mgermain |
@MarcCote Well, in the case of the 70 brains, SD is in a way being used as a substitute for MPI to implement parallelism inside their own program. Maybe another way to say what I said in my previous post is that SD was never designed to launch an actual 50 million jobs at once :P If we can find a way to do it nicely, I'm not against it though.
Hey guys! Yes, I agree these are two different issues, but they are related in some way. My issue was that my tasks had a very short process time, while the command.txt (big, but not gigabytes) was relatively long to rewrite. So the workers were continuously asking for more tasks. Linux then thought they were stuck in a multiprocess concurrent lock because of its default OS waiting-timeout parameter, so it killed the workers to prevent it. In other words, the workers were dying of boredom (thanks to @mgermain for figuring this out).

In Max's case, the file is very long to write, so I guess the same will happen in the end: workers killed by boredom while waiting for millions of tasks to be rewritten millions of times. So on one side there is overhead from reading the command.txt file too frequently, and on the other there is overhead from writing it too much. I think the two are related because the ratio of task process time to the number of tasks should stay reasonable; otherwise we'll have problems.

The final message here is that we cannot just throw anything at it yet, until geniuses like @mgermain and @MarcCote find a solution. Part of it is a matter of tweaking smart-dispatch. Thanks again guys!
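To put a rough number on that overhead: if picking a command means rewriting the whole remaining file, the total I/O grows quadratically with the number of commands. A back-of-the-envelope illustration (the 50-byte average line length is an assumption; the 50 million commands and ~2.5 GB figures come from this thread):

```python
# Back-of-the-envelope cost of "rewrite the whole file per command picked".
n_commands = 50_000_000                      # figure mentioned in the thread
line_bytes = 50                              # assumed average command length
file_bytes = n_commands * line_bytes         # ~2.5 GB, matching the thread

# Each pick rewrites roughly the remaining file: n + (n-1) + ... + 1 lines.
total_rewritten = line_bytes * n_commands * (n_commands + 1) // 2

print(f"file size: {file_bytes / 1e9:.1f} GB")
print(f"total bytes rewritten over the run: {total_rewritten / 1e15:.1f} petabytes")
```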
From what I understand, speeding up the "picking a new command to execute" step would solve both issues. I suggest "taking the last line of the commands file", as proposed above, so the file can be truncated instead of rewritten.
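Since several workers pop from the same file, they would still need to serialize that step. A sketch of a worker loop that holds an advisory lock only while picking a command (the lock-file name is illustrative, and `pop_command` stands for any "next command or None" callable, e.g. the hypothetical pop_last_command sketched in an earlier comment):

```python
import fcntl
import subprocess

def worker_loop(commands_path, pop_command):
    """Hypothetical worker loop: serialize the pop with an advisory lock,
    then run the command outside the lock so other workers are not blocked."""
    lock_path = commands_path + ".lock"          # illustrative lock-file name
    while True:
        with open(lock_path, "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)     # exclusive lock only while picking
            cmd = pop_command(commands_path)
            fcntl.flock(lock, fcntl.LOCK_UN)
        if cmd is None:
            break                                # queue is empty; this worker is done
        subprocess.run(cmd, shell=True)          # the actual job runs unlocked
```

Because the lock is released before the command runs, short tasks would no longer leave other workers stalled behind a long file rewrite.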
Attached, an example file from a worker that had a hard time...
Further details to come tomorrow...
@mgermain
770146_mp2_m_worker_105_e.txt