Problem: apparent memory leak in MCPClient #1482
Comments
Our tests have shown that this approach doesn't solve the issue :(
After merging artefactual/archivematica#1845 we started noticing some intermittent failures in the acceptance test runs on GitHub and Jenkins.
The log from one of the failing runs shows:

```
*** RUNNING TASK: verifyandrestructuretransferbag_v0.0***
archivematicaClient.py: INFO 2024-02-05 11:53:23,780 archivematica.mcp.client:signal_handler:22: Received termination signal (15)
Process ForkPoolWorker-13:4:
Traceback (most recent call last):
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "/src/src/MCPClient/lib/client/mcp.py", line 23, in signal_handler
    pool.stop()
  File "/src/src/MCPClient/lib/client/pool.py", line 103, in stop
    self.pool_maintainance_thread.join()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/threading.py", line 1057, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread
```

The problem seems to come from this call:
```python
bag.validate(
    processes=multiprocessing.cpu_count(), completeness_only=completeness_only
)
```
When the `processes` parameter is set and higher than 1, a multiprocessing pool is created to calculate the hashes:
```python
try:
    pool = multiprocessing.Pool(
        processes if processes else None, initializer=worker_init
    )
    hash_results = pool.map(_calc_hashes, args)
finally:
    pool.terminate()
```
This pool doesn't wait for its processes to finish (there's no call to `join()`). Interestingly, if `processes` > 1:

```python
pool = multiprocessing.Pool(processes=processes)
checksums = pool.map(manifest_line_generator, _walk(data_dir))
pool.close()
pool.join()
```
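For illustration, here is a minimal standalone sketch (not bagit's or MCPClient's code) contrasting the two shutdown patterns quoted above. `close()` followed by `join()` lets the workers finish and waits for them to exit, while `terminate()` stops them immediately without completing outstanding work:

```python
import multiprocessing


def square(n):
    return n * n


if __name__ == "__main__":
    # Pattern from the manifest-generation snippet: wait for a clean shutdown.
    pool = multiprocessing.Pool(processes=2)
    try:
        results = pool.map(square, range(10))
    finally:
        pool.close()  # no further tasks will be submitted
        pool.join()   # block until every worker process has exited

    # Pattern from the hash-calculation snippet: terminate() stops the worker
    # processes immediately (on POSIX it signals them) without letting them
    # complete outstanding work.
    pool = multiprocessing.Pool(processes=2)
    try:
        results = pool.map(square, range(10))
    finally:
        pool.terminate()

    print(results)
```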
@mamedin tested this extensively with transfers containing many files in the ...
Hello, I've run into a peculiar issue that I'm wondering might be related to the updates here. Since we upgraded to 1.16 and also from RHEL 7 to 9, I've been noticing an occasional problem where our S3 uploads hang indefinitely without throwing an error and stall out the system, which requires restarting AM services. I haven't been able to pinpoint what's causing the problem, but in testing out different S3 configurations, I've noticed so far that setting ...
Expected behaviour
MCPClient's memory usage should be contained to some extent.
Current behaviour
MCPClient's memory usage seems to be steadily increasing, e.g. one customer reported that one MCPClient process was using 4.6g of resident memory, representing 29.1% of the total available in their system. In that particular scenario, all memory was exhausted and new attempts to allocate memory failed, causing the system to fail. We haven't confirmed yet whether this growth is totally unconstrained or whether we're seeing the upper bound of a portion of memory that is reusable.
We have identified two areas of code where allocated memory doesn't seem to be freed immediately after a client module is executed: 1) when parsing documents with lxml (see more), where the problem is aggravated when processing more files, which results in larger documents; and 2) much less noticeably, when the fork_runner module performs process-based parallelism via `multiprocessing.Pool`.

A few potential solutions:

- Parse documents with `collect_ids=False` (read more here); a brief sketch follows this list.
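As a rough illustration of that option (a sketch, not Archivematica's code; the input file name is hypothetical), lxml's parser can be told not to build its document-wide table of XML IDs:

```python
from lxml import etree

# collect_ids=False stops lxml from building the hash table of xml:id / ID
# attribute values for the whole document, which can grow large for big
# METS files with many elements.
parser = etree.XMLParser(collect_ids=False)
tree = etree.parse("large-mets.xml", parser)  # hypothetical input file
print(tree.getroot().tag)
```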
(read more here),For the time being, users can increase the amount of system memory provisioned and expect MCPClient to allocate 4g or more when processing packages with many files. Memory usage can be observed in various ways. MCPClient reports various metrics via Prometheus - the most relevant is
process_resident_memory_bytes
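Outside Prometheus, a quick way to watch the same figure is to read the process's resident set size directly from /proc. The snippet below is a minimal, Linux-only sketch and is not the instrumentation in the branch mentioned next:

```python
import logging
import os

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def log_resident_memory(label):
    """Log the current process's VmRSS as reported by the kernel (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                rss = line.split(":", 1)[1].strip()
                logger.info("%s (pid %d) resident memory: %s", label, os.getpid(), rss)
                return


log_resident_memory("after running a client module")
```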
This branch has been made available to log resident memory after every client module run, e.g. see log-worker-with-fork-runner.txt or log-worker-without-fork-runner.txt.

Steps to reproduce
Your environment (version of Archivematica, operating system, other relevant details)
Archivematica 1.11-1.13