Review zipping tool #2215
Minor update in terms of finding the right explanation: Mitigation 1 is clearly "make it faster".
Oh, that's relevant for sure. Yes, making this a non-blocking operation with an async approach would be very valuable, to ensure we don't run into worker-timeout issues under unexpected scaling (larger data provenance for some reason, a job with much-larger-than-usual logs, more images being processed, etc.).
In the current concurrency model (see #2218), we need to be careful about making the zipping function async and non-blocking. Here are three examples of how to run a bash command:
We are now close to situation 2 (we don't run

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess
import shlex
import time
import asyncio


def function(sleeptime):
    # Plain blocking call: fine when run inside a thread pool.
    subprocess.run(
        shlex.split(f"sleep {sleeptime}"),
        capture_output=True,
        check=True,
    )


async def function_async_subprocess(sleeptime):
    # Async signature, but subprocess.run still blocks the event loop.
    subprocess.run(
        shlex.split(f"sleep {sleeptime}"),
        capture_output=True,
        check=True,
    )


async def function_async_create_subprocess_exec(sleeptime):
    # Truly non-blocking: the event loop is free while the process runs.
    proc = await asyncio.create_subprocess_exec(*shlex.split(f"sleep {sleeptime}"))
    stdout, stderr = await proc.communicate()
    print()


async def run_it_all_subprocess(sleeptime):
    task1 = asyncio.create_task(function_async_subprocess(sleeptime))
    task2 = asyncio.create_task(function_async_subprocess(sleeptime))
    await task1
    await task2


async def run_it_all_create_subprocess_exec(sleeptime):
    task1 = asyncio.create_task(function_async_create_subprocess_exec(sleeptime))
    task2 = asyncio.create_task(function_async_create_subprocess_exec(sleeptime))
    await task1
    await task2


if __name__ == "__main__":
    print("BLOCK 1")
    # Two blocking calls submitted to a thread pool overlap: ~1 s total.
    t0 = time.perf_counter()
    sleeptime = 1
    with ThreadPoolExecutor(max_workers=10) as executor:
        fut1 = executor.submit(function, sleeptime)
        fut2 = executor.submit(function, sleeptime)
    t1 = time.perf_counter()
    print(f"{t1-t0}")
    assert abs(t1 - t0 - sleeptime) < 0.1
    print()

    print("BLOCK 2")
    # Async tasks around a blocking subprocess.run serialize on the
    # event loop: ~2 s total, hence no timing assert here.
    t0 = time.perf_counter()
    asyncio.run(run_it_all_subprocess(sleeptime))
    t1 = time.perf_counter()
    print(f"{t1-t0}")
    print()

    print("BLOCK 3")
    # asyncio.create_subprocess_exec lets both tasks overlap: ~1 s total.
    t0 = time.perf_counter()
    asyncio.run(run_it_all_create_subprocess_exec(sleeptime))
    t1 = time.perf_counter()
    print(f"{t1-t0}")
    assert abs(t1 - t0 - sleeptime) < 0.1
    print()
```
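As a follow-up to the BLOCK 2 vs BLOCK 3 comparison: if the zipping function itself has to stay blocking (e.g. it wraps `subprocess.run` or pure-Python `zipfile` calls), one way to keep the event loop responsive is `asyncio.to_thread`. The sketch below is illustrative only; `zip_job_folder` and the `zip -r` invocation are hypothetical stand-ins for the real zipping function:

```python
import asyncio
import shlex
import subprocess


def zip_job_folder(folder: str, archive: str) -> None:
    # Hypothetical blocking helper standing in for the real zipping
    # function; any blocking implementation behaves the same way here.
    subprocess.run(
        shlex.split(f"zip -r {archive} {folder}"),
        capture_output=True,
        check=True,
    )


async def zip_job_folder_async(folder: str, archive: str) -> None:
    # asyncio.to_thread (Python >= 3.9) runs the blocking call in a
    # worker thread, so concurrent requests behave like BLOCK 1 above
    # rather than serializing on the event loop as in BLOCK 2.
    await asyncio.to_thread(zip_job_folder, folder, archive)
```

On Python < 3.9, `loop.run_in_executor(None, zip_job_folder, folder, archive)` achieves the same effect.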
Question, cc @jluethi: right now we can download the zipped job folders in two ways:
Use case 1 is obviously critical.
I agree that use case 1 is more critical than use case 2, but use case 2 can be helpful sometimes. There are two things that make me hesitant about the conclusion here: a) for me argues that maybe we want to refactor so that the partial zip is put in the tmp folder etc. But b) argues that the need to download full zips in intermediate states would go away, so we shouldn't invest too much into it.
`zipfile` to `gzip`
Status update (with @mfranzon, as always..):
Thanks for the summary!
So, for my understanding: we wouldn't expect any worker timeouts due to zipping anymore with this server version? But we could still trigger them with the download of a partial zip folder (not a very common action) for the time being?
Agreed.
That's true (and it did not change with this version). As you note, this only applies to downloading logs for an ongoing job (where the zip has to be created on-the-fly). Ref #2223 |
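As an illustration of that on-the-fly path (a sketch only; the function name and the use of `shutil.make_archive` are assumptions for illustration, not the actual server code), building a fresh archive of an ongoing job's folder is inherently blocking, which is why this code path can still hit worker timeouts:

```python
import shutil
import tempfile
from pathlib import Path


def make_partial_zip(job_folder: str) -> Path:
    # Hypothetical sketch: zip the current state of an ongoing job's
    # folder into a temporary location. shutil.make_archive is a
    # blocking call, so a large folder ties up the worker for the
    # whole duration of the compression.
    tmpdir = tempfile.mkdtemp()
    base_name = str(Path(tmpdir) / Path(job_folder).name)
    archive = shutil.make_archive(base_name, "zip", root_dir=job_folder)
    return Path(archive)
```

Combined with the `asyncio.to_thread` pattern sketched above, this blocking step could be moved off the event loop.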
Closing, as the relevant parts of this issue have been superseded by |
Context:
1. We have observed worker timeouts during the current `zipfile`-based zipping operations.
2. We previously moved `tarfile`-based operations to a `subprocess.run` wrap of native tools (that is, `tar`).
3. The `tarfile` library was quite a bit heavier than native tools (that is, `tar`), and we attributed at least part of this difference to the large number of additional `os.stat` calls. Note that this difference became more evident due to our environment (a CEPH filesystem with a cold cache).
4. For the move from `tarfile` to `tar`, see e.g. "1596 review use of tar/tarfile in compress_folder.py module" (#1641).

We don't have a controlled way of reproducing the issue in point 1, but all the items above suggest that we should also move the current `zipfile`-based zipping operations to `subprocess.run` wraps of `gzip`. In the worst case, we are just improving the performance of one operation. In the best case, we are fixing the issue at point 1.
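To make the proposed move concrete, here is a minimal sketch of a `subprocess.run` wrap of native tools, assuming a hypothetical `compress_folder_native` helper and a `tar czf` invocation (i.e. `tar` piping through `gzip` via the `z` flag); the real implementation may differ:

```python
import shlex
import subprocess
from pathlib import Path


def compress_folder_native(folder: str, archive: str) -> None:
    # Sketch: replace zipfile-based compression with a single native
    # call (tar + gzip). The whole traversal happens inside one native
    # process, avoiding the per-file os.stat overhead reported for the
    # pure-Python tarfile path above.
    if not Path(folder).is_dir():
        raise ValueError(f"{folder} is not a directory")
    subprocess.run(
        shlex.split(f"tar czf {archive} -C {folder} ."),
        capture_output=True,
        check=True,
    )
```

Note that this produces a `.tar.gz` rather than a `.zip`; if the download endpoints require zip format, the native `zip -r` CLI would be the analogous drop-in alternative.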