Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: apparent memory leak in MCPClient #1482

Closed
5 tasks
sevein opened this issue Jul 23, 2021 · 5 comments
Closed
5 tasks

Problem: apparent memory leak in MCPClient #1482

sevein opened this issue Jul 23, 2021 · 5 comments
Labels
Severity: medium An inconvenient situation where the software is usable but inconvenient or slow. Status: review The issue's code has been merged and is ready for testing/review. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.
Milestone

Comments

@sevein
Copy link
Contributor

sevein commented Jul 23, 2021

Expected behaviour
MCPClient's memory usage should be contained to some extent.

Current behaviour
MCPClient's memory usage seems to be steadily increasing, e.g. one customer reported that one MCPClient process was using 4.6g of resident memory, representing 29.1% of the total available in their system. In this particular scenario, all memory was exhausted and new attempts to allocate memory failed causing the system to fail. We haven't confirmed yet whether this growth is totally unconstrained or we're seeing the upper bound of a portion of memory that is reusable.

We have identified two areas of code where memory allocated doesn't seem to be freed immediately after a client module is executed: 1) when parsing documents with lxml (see more), where the problem is aggravated when processing more files which result in larger documents, and, 2) much less noticeable, when the fork_runner module performs process-based parallelism via multiprocessing.Pool.

A few potential solutions:

  • If the problem was in lxml parsing, it could be solved with a custom XMLParser that uses collect_ids=False (read more here),
  • Introduce a global process pool set up on startup that limits the number of tasks that a worker process can complete before it will exit and be replaced by a fresh worker process (#1499 is an unfinished attempt that should be considered), and
  • Force MCPClient to exit after N jobs are processed, assuming that the process manager is configured to bring the service up again. While this solution is much simpler it's not how you'd expect a long-running process to behave and requires additional configuration in the process manager.

For the time being, users can increase the amount of system memory provisioned and expect MCPClient to allocate 4g or more when processing packages with many files. Memory usage can be observed in various ways. MCPClient reports various metrics via Prometheus - the most relevant is process_resident_memory_bytes. This branch has been made available to log resident memory after very client module run, e.g. see log-worker-with-fork-runner.txt or log-worker-without-fork-runner.txt.

Steps to reproduce

  1. Submit transfers with many files,
  2. Observe system metrics (see notes above)

Your environment (version of Archivematica, operating system, other relevant details)
Archivematica 1.11-1.13


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)
@sevein sevein added Severity: medium An inconvenient situation where the software is usable but inconvenient or slow. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. triage-release-1.13 labels Jul 23, 2021
@amayita amayita added the 1.13.1 label Sep 3, 2021
@amayita amayita added this to the 1.13.1 milestone Sep 3, 2021
@amayita amayita self-assigned this Sep 3, 2021
@sromkey sromkey removed this from the 1.13.1 milestone Sep 9, 2021
@sromkey sromkey removed the 1.13.1 label Sep 20, 2021
@sevein sevein added the Status: in progress Issue that is currently being worked on. label Sep 25, 2021
@sevein sevein added this to the 1.14.0 milestone Sep 25, 2021
@sromkey
Copy link
Contributor

sromkey commented Mar 20, 2023

Our tests have shown that this approach doesn't solve the issue :(

@sromkey sromkey removed this from the 1.14.0 milestone Mar 20, 2023
@replaceafill replaceafill added triage-release-1.17 and removed Status: in progress Issue that is currently being worked on. labels Sep 1, 2023
@replaceafill replaceafill added this to the 1.16.0 milestone Oct 23, 2023
@replaceafill replaceafill added the Status: review The issue's code has been merged and is ready for testing/review. label Jan 29, 2024
@replaceafill
Copy link
Member

After merging artefactual/archivematica#1845 we started noticing some intermittent failures in the acceptance tests runs:

GitHub

  • The metadata-xml tag often gets cancelled after 6 hours being idle. Because the workflow run is cancelled there are no service logs, but it doesn't seem to be specific to any scenario on the feature file.
  • The uuids-dirs tag sometimes fails with:
    Scenario Outline: Lucien wants to create an AIP where each subdirectory of the original transfer is assigned a UUID and this is recorded in the AIP's METS file. -- @1.2 transfer sources  # features/core/uuids-for-directories.feature:65
      Given a processing configuration that assigns UUIDs to directories                                                                                                                       # features/steps/uuids_for_directories_steps.py:107
      And remote directory ~/archivematica-sampledata/SampleTransfers/BagTransfer.zip contains a hierarchy of subfolders containing digital objects                                            # features/steps/uuids_for_directories_steps.py:24
      When a zipped bag transfer is initiated on directory SampleTransfers/BagTransfer.zip                                                                                                     # features/steps/steps.py:421
      And the user waits for the AIP to appear in archival storage                                                                                                                             # features/steps/steps.py:256
    2024-02-01 23:11:51,013 - amuser.ingest - INFO - Getting SIP UUID from transfer name BagTransfer
        Traceback (most recent call last):
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/behave/model.py", line 1329, in run
            match.run(runner.context)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/behave/matchers.py", line 98, in run
            self.func(context, *args, **kwargs)
          File "features/steps/steps.py", line 258, in step_impl
            _appear_in_storage(context)
          File "features/steps/steps.py", line 249, in _appear_in_storage
            uuid_val = utils.get_uuid_val(context, "sip")
          File "/home/artefactual/acceptance-tests/features/steps/utils.py", line 73, in get_uuid_val
            uuid_val = context.scenario.sip_uuid = context.am_user.browser.get_sip_uuid(
          File "/home/artefactual/acceptance-tests/amuser/am_browser_ingest_ability.py", line 60, in get_sip_uuid
            sip_uuid, _, _ = self.wait_for_transfer_to_appear(transfer_name)
          File "/home/artefactual/acceptance-tests/amuser/am_browser_transfer_ability.py", line 184, in wait_for_transfer_to_appear
            ) = self.wait_for_transfer_to_appear(
          File "/home/artefactual/acceptance-tests/amuser/am_browser_transfer_ability.py", line 184, in wait_for_transfer_to_appear
            ) = self.wait_for_transfer_to_appear(
          File "/home/artefactual/acceptance-tests/amuser/am_browser_transfer_ability.py", line 184, in wait_for_transfer_to_appear
            ) = self.wait_for_transfer_to_appear(
          [Previous line repeated 951 more times]
          File "/home/artefactual/acceptance-tests/amuser/am_browser_transfer_ability.py", line 138, in wait_for_transfer_to_appear
            self.wait_for_presence(transfer_name_div_selector)
          File "/home/artefactual/acceptance-tests/amuser/selenium_ability.py", line 129, in wait_for_presence
            self.wait_for_existence(
          File "/home/artefactual/acceptance-tests/amuser/selenium_ability.py", line 169, in wait_for_existence
            WebDriverWait(self.driver, timeout).until(element_exists)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/support/wait.py", line 71, in until
            value = method(self._driver)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/support/expected_conditions.py", line 64, in __call__
            return _find_element(driver, self.locator)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/support/expected_conditions.py", line 411, in _find_element
            return driver.find_element(*by)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 976, in find_element
            return self.execute(Command.FIND_ELEMENT, {
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
            response = self.command_executor.execute(driver_command, params)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/remote/remote_connection.py", line 374, in execute
            return self._request(command_info[0], url, body=data)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/selenium/webdriver/remote/remote_connection.py", line 397, in _request
            resp = self._conn.request(method, url, body=body, headers=headers)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/request.py", line 81, in request
            return self.request_encode_body(
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/request.py", line 173, in request_encode_body
            return self.urlopen(method, url, **extra_kw)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/poolmanager.py", line 376, in urlopen
            response = conn.urlopen(method, u.request_uri, **kw)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/connectionpool.py", line 715, in urlopen
            httplib_response = self._make_request(
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/connectionpool.py", line 467, in _make_request
            six.raise_from(e, None)
          File "<string>", line 3, in raise_from
          File "/pyenv/data/versions/3.9.18/lib/python3.9/site-packages/urllib3/connectionpool.py", line 462, in _make_request
            httplib_response = conn.getresponse()
          File "/pyenv/data/versions/3.9.18/lib/python3.9/http/client.py", line 1377, in getresponse
            response.begin()
          File "/pyenv/data/versions/3.9.18/lib/python3.9/http/client.py", line 339, in begin
            self.headers = self.msg = parse_headers(self.fp)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/http/client.py", line 238, in parse_headers
            return email.parser.Parser(_class=_class).parsestr(hstring)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/parser.py", line 67, in parsestr
            return self.parse(StringIO(text), headersonly=headersonly)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/parser.py", line 56, in parse
            feedparser.feed(data)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/feedparser.py", line 176, in feed
            self._call_parse()
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/feedparser.py", line 180, in _call_parse
            self._parse()
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/feedparser.py", line 295, in _parsegen
            if self._cur.get_content_maintype() == 'message':
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/message.py", line 594, in get_content_maintype
            ctype = self.get_content_type()
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/message.py", line 578, in get_content_type
            value = self.get('content-type', missing)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/message.py", line 471, in get
            return self.policy.header_fetch_parse(k, v)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/_policybase.py", line 316, in header_fetch_parse
            return self._sanitize_header(name, value)
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/_policybase.py", line 287, in _sanitize_header
            if _has_surrogates(value):
          File "/pyenv/data/versions/3.9.18/lib/python3.9/email/utils.py", line 57, in _has_surrogates
            s.encode()
        RecursionError: maximum recursion depth exceeded while calling a Python object
    
    And the MCPClient log always ends with the following traceback:
    >
    archivematicaClient.py: DEBUG     2024-02-05 11:53:23,630  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 70b7759a-cf9f-49b3-b959-b34873574773 received for assignfileuuids_v0.0
    archivematicaClient.py: INFO      2024-02-05 11:53:23,630  archivematica.mcp.client.worker:run_task:14:  
    
    *** RUNNING TASK: assignfileuuids_v0.0***
    archivematicaClient.py: INFO      2024-02-05 11:53:23,678  archivematica.mcp.client.job:log_results:62:  #<assignfileuuids_v0.0; exit=0; code=success uuid=9248b253-c248-4ae2-99e6-21cdad62e1de
    =============== STDOUT ===============
    Generated UUID for file 30cf66ef-4262-4b7a-bb90-0e7c51de51c7
    Generated UUID for file cfe4d0ac-112f-4dd5-90b8-db18b75cb503
    Generated UUID for file a2c42356-e3a3-4fd6-b595-6472f9dff71a
    Generated UUID for file f5ae0acf-92b9-4657-8492-a6d718ce97d8
    Generated UUID for file 0e1e7adb-085c-47a1-a1bd-ff338973a3da
    Generated UUID for file aae21c31-a0e2-4734-ac56-9231c945a520
    
    =============== END STDOUT ===============
    =============== STDERR ===============
    METS file not found: [Errno 17] No METS file found in /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/metadata
    
    =============== END STDERR ===============
    
    >
    archivematicaClient.py: DEBUG     2024-02-05 11:53:23,689  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request d1688152-bd56-4faa-a9b0-2ef1c95b8188 received for updatesizeandchecksum_v0.0
    archivematicaClient.py: INFO      2024-02-05 11:53:23,690  archivematica.mcp.client.worker:run_task:14:  
    
    *** RUNNING TASK: updatesizeandchecksum_v0.0***
    archivematicaClient.py: INFO      2024-02-05 11:53:23,741  archivematica.mcp.client.job:log_results:62:  #<updatesizeandchecksum_v0.0; exit=0; code=success uuid=cd1fce33-d9f3-49aa-b83c-dec11511d48f
    =============== STDOUT ===============
    
    =============== END STDOUT ===============
    =============== STDERR ===============
    METS file not found: [Errno 17] No METS file found in /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/metadata
    
    =============== END STDERR ===============
    
    >
    archivematicaClient.py: DEBUG     2024-02-05 11:53:23,755  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request f5bc9a79-52a0-47c8-8fb4-8507030bcd4d received for verifyandrestructuretransferbag_v0.0
    archivematicaClient.py: INFO      2024-02-05 11:53:23,756  archivematica.mcp.client.worker:run_task:14:  
    
    *** RUNNING TASK: verifyandrestructuretransferbag_v0.0***
    archivematicaClient.py: INFO      2024-02-05 11:53:23,780  archivematica.mcp.client:signal_handler:22:  Received termination signal (15)
    Process ForkPoolWorker-13:4:
    Traceback (most recent call last):
      File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
        self.run()
      File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 108, in run
        self._target(*self._args, **self._kwargs)
      File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/pool.py", line 114, in worker
        task = get()
      File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/queues.py", line 367, in get
        return _ForkingPickler.loads(res)
      File "/src/src/MCPClient/lib/client/mcp.py", line 23, in signal_handler
        pool.stop()
      File "/src/src/MCPClient/lib/client/pool.py", line 103, in stop
        self.pool_maintainance_thread.join()
      File "/pyenv/data/versions/3.9.18/lib/python3.9/threading.py", line 1057, in join
        raise RuntimeError("cannot join current thread")
    RuntimeError: cannot join current thread
    moving:  /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/data/bagTest /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/objects/bagTest
    archivematicaClient.py: INFO      2024-02-05 11:53:23,813  archivematica.mcp.client.job:log_results:62:  #<verifyandrestructuretransferbag_v0.0; exit=0; code= uuid=ee60b187-5616-4eac-8a7e-a06742b4b1d5
    =============== STDOUT ===============
    creating:  logs
    creating:  logs/fileMeta
    creating:  metadata
    creating:  metadata/submissionDocumentation
    creating:  objects
    creating:  /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/logs/BagIt
    Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/bag-info.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/logs/BagIt/bag-info.txt
    Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/manifest-sha512.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/metadata/manifest-sha512.txt
    Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/bagit.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-cb84fc9e-9481-4bbe-89e1-65389a80a75e/logs/BagIt/bagit.txt
    moving directory to objects:  bagTest
    removing empty data directory
    
    =============== END STDOUT ===============
    =============== STDERR ===============
    
    =============== END STDERR ===============
    
    >
    archivematicaClient.py: INFO      2024-02-05 11:53:23,815  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
    archivematicaClient.py: DEBUG     2024-02-05 11:53:23,816  archivematica.mcp.client:run_gearman_worker:61:  Worker process 215 exiting
    archivematicaClient.py: INFO      2024-02-05 11:54:03,982  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
    archivematicaClient.py: INFO      2024-02-05 11:54:03,982  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
    archivematicaClient.py: DEBUG     2024-02-05 11:54:03,982  archivematica.mcp.client:run_gearman_worker:61:  Worker process 88 exiting
    archivematicaClient.py: DEBUG     2024-02-05 11:54:03,982  archivematica.mcp.client:run_gearman_worker:61:  Worker process 89 exiting
    archivematicaClient.py: INFO      2024-02-05 11:54:03,992  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
    archivematicaClient.py: DEBUG     2024-02-05 11:54:03,992  archivematica.mcp.client:run_gearman_worker:61:  Worker process 91 exiting
    

Jenkins

This is how a matrix-qa run from 2024-02-05 looked like:

Image

From a run of the metadata-xml feature file in Ubuntu 22.04 the MCPClient log ends with:

Feb 05 14:59:54 ubuntu2204-20 python[46918]: archivematicaClient.py: DEBUG     2024-02-05 14:59:54,185  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 50e81f36-20b1-4a14-b4dc-fc8591637ace received for verifyandrestructuretransferbag_v0.0
Feb 05 14:59:54 ubuntu2204-20 python[46918]: archivematicaClient.py: INFO      2024-02-05 14:59:54,187  archivematica.mcp.client.worker:run_task:14:
Feb 05 14:59:54 ubuntu2204-20 python[46918]: *** RUNNING TASK: verifyandrestructuretransferbag_v0.0***
Feb 05 14:59:54 ubuntu2204-20 python[46928]: archivematicaClient.py: INFO      2024-02-05 14:59:54,247  archivematica.mcp.client:signal_handler:22:  Received termination signal (15)
Feb 05 14:59:54 ubuntu2204-20 python[46918]: archivematicaClient.py: INFO      2024-02-05 14:59:54,684  archivematica.mcp.client.job:log_results:62:  #<verifyandrestructuretransferbag_v0.0; exit=0; code= uuid=8c1ef022-442f-47ed-8c58-5c7fea149a80
Feb 05 14:59:54 ubuntu2204-20 python[46918]: =============== STDOUT ===============
Feb 05 14:59:54 ubuntu2204-20 python[46918]: creating:  logs
Feb 05 14:59:54 ubuntu2204-20 python[46918]: creating:  logs/fileMeta
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving directory  metadata
Feb 05 14:59:54 ubuntu2204-20 python[46918]: creating:  metadata/submissionDocumentation
Feb 05 14:59:54 ubuntu2204-20 python[46918]: creating:  objects
Feb 05 14:59:54 ubuntu2204-20 python[46918]: creating:  /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/logs/BagIt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/bagit.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/logs/BagIt/bagit.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/manifest-md5.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/metadata/manifest-md5.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/manifest-sha512.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/metadata/manifest-sha512.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/tagmanifest-md5.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/logs/BagIt/tagmanifest-md5.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/bag-info.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/logs/BagIt/bag-info.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/tagmanifest-sha512.txt to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/logs/BagIt/tagmanifest-sha512.txt
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving file to objects:  RiesTaunA_1666408611-19330529_mets.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/RiesTaunA_1666408611-19330529_mets.xml to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/objects/RiesTaunA_1666408611-19330529_mets.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving file to objects:  RiesTaunA_1666408611-19330529_year.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/RiesTaunA_1666408611-19330529_year.xml to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/objects/RiesTaunA_1666408611-19330529_year.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving directory to objects:  images
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving directory to objects:  ocr
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving file to objects:  RiesTaunA_1666408611-19330529_anchor.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/RiesTaunA_1666408611-19330529_anchor.xml to /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/objects/RiesTaunA_1666408611-19330529_anchor.xml
Feb 05 14:59:54 ubuntu2204-20 python[46918]: removing empty data directory
Feb 05 14:59:54 ubuntu2204-20 python[46918]: =============== END STDOUT ===============
Feb 05 14:59:54 ubuntu2204-20 python[46918]: =============== STDERR ===============
Feb 05 14:59:54 ubuntu2204-20 python[46918]: =============== END STDERR ===============
Feb 05 14:59:54 ubuntu2204-20 python[46918]: >
Feb 05 14:59:54 ubuntu2204-20 python[46918]: archivematicaClient.py: INFO      2024-02-05 14:59:54,690  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 14:59:54 ubuntu2204-20 python[46918]: archivematicaClient.py: DEBUG     2024-02-05 14:59:54,691  archivematica.mcp.client:run_gearman_worker:61:  Worker process 46918 exiting
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving:  /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/metadata /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/metadata
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving:  /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/images /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/objects/images
Feb 05 14:59:54 ubuntu2204-20 python[46918]: moving:  /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/data/ocr /var/archivematica/sharedDirectory/currentlyProcessing/amauat-transfer_1707145192-4a128ddc-ef65-4396-ba1e-3fc248d196d7/objects/ocr
Feb 05 15:00:16 ubuntu2204-20 python[40900]: archivematicaClient.py: INFO      2024-02-05 15:00:16,276  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 15:00:16 ubuntu2204-20 python[40900]: archivematicaClient.py: DEBUG     2024-02-05 15:00:16,278  archivematica.mcp.client:run_gearman_worker:61:  Worker process 40900 exiting
Feb 05 15:00:30 ubuntu2204-20 python[44057]: archivematicaClient.py: INFO      2024-02-05 15:00:30,582  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 15:00:30 ubuntu2204-20 python[44057]: archivematicaClient.py: DEBUG     2024-02-05 15:00:30,583  archivematica.mcp.client:run_gearman_worker:61:  Worker process 44057 exiting
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <?xml version="1.0" encoding="UTF-8"?>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <MediaInfo
Feb 05 15:00:30 ubuntu2204-20 python[44057]:     xmlns="https://mediaarea.net/mediainfo"
Feb 05 15:00:30 ubuntu2204-20 python[44057]:     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
Feb 05 15:00:30 ubuntu2204-20 python[44057]:     xsi:schemaLocation="https://mediaarea.net/mediainfo https://mediaarea.net/mediainfo/mediainfo_2_0.xsd"
Feb 05 15:00:30 ubuntu2204-20 python[44057]:     version="2.0">
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <creatingLibrary version="24.01" url="https://mediaarea.net/MediaInfo">MediaInfoLib</creatingLibrary>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <media ref="/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/amauat-transfer_1707144963-b1d7c287-6909-42bd-86c0-94953efd0fc5/objects/images/scans_tif/00000012.tif">
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <track type="General">
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Count>349</Count>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamCount>1</StreamCount>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKind>General</StreamKind>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKind_String>General</StreamKind_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKindID>0</StreamKindID>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <ImageCount>1</ImageCount>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Image_Format_List>Raw</Image_Format_List>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Image_Format_WithHint_List>Raw</Image_Format_WithHint_List>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Image_Codec_List>Raw</Image_Codec_List>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <CompleteName>/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/amauat-transfer_1707144963-b1d7c287-6909-42bd-86c0-94953efd0fc5/objects/images/scans_tif/00000012.tif</CompleteName>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FolderName>/var/archivematica/sharedDirectory/watchedDirectories/workFlowDecisions/extractPackagesChoice/amauat-transfer_1707144963-b1d7c287-6909-42bd-86c0-94953efd0fc5/objects/images/scans_tif</FolderName>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileNameExtension>00000012.tif</FileNameExtension>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileName>00000012</FileName>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileExtension>tif</FileExtension>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format>TIFF</Format>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_String>TIFF</Format_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_Extensions>tiff tif</Format_Extensions>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_Commercial>TIFF</Format_Commercial>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <InternetMediaType>image/tiff</InternetMediaType>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize>1738</FileSize>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize_String>1.70 KiB</FileSize_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize_String1>2 KiB</FileSize_String1>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize_String2>1.7 KiB</FileSize_String2>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize_String3>1.70 KiB</FileSize_String3>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <FileSize_String4>1.697 KiB</FileSize_String4>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <File_Modified_Date>2024-02-05 14:39:52 UTC</File_Modified_Date>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <File_Modified_Date_Local>2024-02-05 14:39:52</File_Modified_Date_Local>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Encoded_Application_String>GIMP 2.10.18</Encoded_Application_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Encoded_Application_Name>GIMP 2.10.18</Encoded_Application_Name>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: </track>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <track type="Image">
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Count>171</Count>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamCount>1</StreamCount>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKind>Image</StreamKind>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKind_String>Image</StreamKind_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <StreamKindID>0</StreamKindID>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Title>/home/steidl/Development/submissionapplications4rosetta/examples/bagit/valid_SLUB_bagits/small/data/12.broken.tif</Title>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format>Raw</Format>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_String>Raw</Format_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_Commercial>Raw</Format_Commercial>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_Settings>Little</Format_Settings>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Format_Settings_Endianness>Little</Format_Settings_Endianness>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Width>20</Width>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Width_String>20 pixels</Width_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Height>10</Height>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Height_String>10 pixels</Height_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <ColorSpace>RGB</ColorSpace>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <BitDepth>8</BitDepth>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <BitDepth_String>8 bits</BitDepth_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Compression_Mode>Lossless</Compression_Mode>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Compression_Mode_String>Lossless</Compression_Mode_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Encoded_Date>2020:06:15 15:16:32</Encoded_Date>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <extra>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Density_X>376.193</Density_X>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Density_Y>376.193</Density_Y>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Density_Unit>dpi</Density_Unit>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <Density_String>376.193 dpi</Density_String>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: <ColorSpace_ICC>RGB</ColorSpace_ICC>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: </extra>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: </track>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: </media>
Feb 05 15:00:30 ubuntu2204-20 python[44057]: </MediaInfo>
Feb 05 15:00:33 ubuntu2204-20 python[46860]: archivematicaClient.py: INFO      2024-02-05 15:00:33,931  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 15:00:33 ubuntu2204-20 python[46860]: archivematicaClient.py: DEBUG     2024-02-05 15:00:33,932  archivematica.mcp.client:run_gearman_worker:61:  Worker process 46860 exiting

And a run of the uuids-dirs in Rocky 9 shows this in the MCPClient log:

Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: DEBUG     2024-02-05 14:53:51,443  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 44f96d1a-c262-4749-8a52-794acd5c7d5b received for updatesizeandchecksum_v0.0
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: INFO      2024-02-05 14:53:51,446  archivematica.mcp.client.worker:run_task:14:
Feb 05 14:53:51 rocky9-21 python[75295]: *** RUNNING TASK: updatesizeandchecksum_v0.0***
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: INFO      2024-02-05 14:53:51,526  archivematica.mcp.client.job:log_results:62:  #<updatesizeandchecksum_v0.0; exit=0; code=success uuid=dfd92a1e-3e2d-4905-b83b-9bf4ac8a4048
Feb 05 14:53:51 rocky9-21 python[75295]: =============== STDOUT ===============
Feb 05 14:53:51 rocky9-21 python[75295]: =============== END STDOUT ===============
Feb 05 14:53:51 rocky9-21 python[75295]: =============== STDERR ===============
Feb 05 14:53:51 rocky9-21 python[75295]: METS file not found: [Errno 17] No METS file found in /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/metadata
Feb 05 14:53:51 rocky9-21 python[75295]: =============== END STDERR ===============
Feb 05 14:53:51 rocky9-21 python[75295]: >
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: DEBUG     2024-02-05 14:53:51,547  archivematica.mcp.client.gearman:handle_job:95:  Gearman job request 70d1cd95-3ceb-4bea-aacb-d8688cfde03c received for verifyandrestructuretransferbag_v0.0
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: INFO      2024-02-05 14:53:51,550  archivematica.mcp.client.worker:run_task:14:
Feb 05 14:53:51 rocky9-21 python[75295]: *** RUNNING TASK: verifyandrestructuretransferbag_v0.0***
Feb 05 14:53:51 rocky9-21 python[75371]: archivematicaClient.py: INFO      2024-02-05 14:53:51,589  archivematica.mcp.client:signal_handler:22:  Received termination signal (15)
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: INFO      2024-02-05 14:53:51,636  archivematica.mcp.client.job:log_results:62:  #<verifyandrestructuretransferbag_v0.0; exit=0; code= uuid=7ce93e15-6054-4fdb-a18f-9c600fcbefa3
Feb 05 14:53:51 rocky9-21 python[75295]: =============== STDOUT ===============
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  logs
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  logs/fileMeta
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  metadata
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  metadata/submissionDocumentation
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  objects
Feb 05 14:53:51 rocky9-21 python[75295]: creating:  /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/logs/BagIt
Feb 05 14:53:51 rocky9-21 python[75295]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/bag-info.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/logs/BagIt/bag-info.txt
Feb 05 14:53:51 rocky9-21 python[75295]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/bagit.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/logs/BagIt/bagit.txt
Feb 05 14:53:51 rocky9-21 python[75295]: Moving /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/manifest-sha512.txt to /var/archivematica/sharedDirectory/currentlyProcessing/BagTransfer-5827114d-0b71-436a-a63a-4b048b221a8a/metadata/manifest-sha512.txt
Feb 05 14:53:51 rocky9-21 python[75295]: moving directory to objects:  bagTest
Feb 05 14:53:51 rocky9-21 python[75295]: removing empty data directory
Feb 05 14:53:51 rocky9-21 python[75295]: =============== END STDOUT ===============
Feb 05 14:53:51 rocky9-21 python[75295]: =============== STDERR ===============
Feb 05 14:53:51 rocky9-21 python[75295]: =============== END STDERR ===============
Feb 05 14:53:51 rocky9-21 python[75295]: >
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: INFO      2024-02-05 14:53:51,641  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 14:53:51 rocky9-21 python[75295]: archivematicaClient.py: DEBUG     2024-02-05 14:53:51,642  archivematica.mcp.client:run_gearman_worker:61:  Worker process 75295 exiting
Feb 05 14:54:39 rocky9-21 python[72665]: archivematicaClient.py: INFO      2024-02-05 14:54:39,991  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 14:54:39 rocky9-21 python[72665]: archivematicaClient.py: DEBUG     2024-02-05 14:54:39,993  archivematica.mcp.client:run_gearman_worker:61:  Worker process 72665 exiting
Feb 05 14:54:40 rocky9-21 python[72663]: archivematicaClient.py: INFO      2024-02-05 14:54:40,016  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 14:54:40 rocky9-21 python[72663]: archivematicaClient.py: DEBUG     2024-02-05 14:54:40,017  archivematica.mcp.client:run_gearman_worker:61:  Worker process 72663 exiting
Feb 05 14:54:40 rocky9-21 python[72664]: archivematicaClient.py: INFO      2024-02-05 14:54:40,027  archivematica.mcp.client.gearman:after_poll:124:  Gearman Worker exited due to shutdown
Feb 05 14:54:40 rocky9-21 python[72664]: archivematicaClient.py: DEBUG     2024-02-05 14:54:40,028  archivematica.mcp.client:run_gearman_worker:61:  Worker process 72664 exiting

It seems the verifyandrestructuretransferbag_v0.0 job might be a common denominator in this problem.

@replaceafill
Copy link
Member

The verifyandrestructuretransferbag_v0.0 job causes the new MCPClient WorkerPool to exit:

*** RUNNING TASK: verifyandrestructuretransferbag_v0.0***
archivematicaClient.py: INFO      2024-02-05 11:53:23,780  archivematica.mcp.client:signal_handler:22:  Received termination signal (15)
Process ForkPoolWorker-13:4:
Traceback (most recent call last):
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/multiprocessing/queues.py", line 367, in get
    return _ForkingPickler.loads(res)
  File "/src/src/MCPClient/lib/client/mcp.py", line 23, in signal_handler
    pool.stop()
  File "/src/src/MCPClient/lib/client/pool.py", line 103, in stop
    self.pool_maintainance_thread.join()
  File "/pyenv/data/versions/3.9.18/lib/python3.9/threading.py", line 1057, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

The problem seems to come from the bag.is_valid helper call and to be caused by bagit's validation use of the processes parameter:

        ...
        bag.validate(
            processes=multiprocessing.cpu_count(), completeness_only=completeness_only
        )
        ...

When the parameter is set and higher than 1 bagit creates a multiprocessing.Pool to calculate fixity hashes like this:

                ...
                try:
                    pool = multiprocessing.Pool(
                        processes if processes else None, initializer=worker_init
                    )
                    hash_results = pool.map(_calc_hashes, args)
                finally:
                    pool.terminate()
                ...

This pool doesn't wait for its process to finish (there's no join() call) which causes intermittent SIGTERM signals to be received by the MCPClient pool forcing it to stop.

Interestingly, bagit uses another multiprocessing.Pool for creating bag manifests from the make_bag helper which is consumed by Archivematica in a couple of client scripts and doesn't seem to cause any problems. That one does implement the close() and join() approach explained in the module documentation:

    if processes > 1:
        pool = multiprocessing.Pool(processes=processes)
        checksums = pool.map(manifest_line_generator, _walk(data_dir))
        pool.close()
        pool.join()

@replaceafill
Copy link
Member

@mamedin tested this extensively with transfers with many files in the am116bigserver and tracked them with Grafana. Memory consumption has decreased significantly compared to 1.15 once the MCPClient workers are restarted when they reach their maximum job capacity.

@fitnycdigitalinitiatives

Hello,

I've run into a peculiar that I'm wondering if might be related to the updates here. Since we've upgraded to 1.16 and also from RHEL 7 to 9, I've been noticing this occasional problem where our S3 uploads indefinitely hang without throwing an error and stalls out the system, which requires restarting AM services. I haven't been able to pinpoint what's causing the problem but in testing out different s3 configurations, I've noticed so far that setting use_threads to false seems to prevent it from occurring. Is it remotely possible that what's going on with threads and pools in these updates could somehow disrupt the threading of a boto3 upload? I understand these changes related to the MCPClient and the S3 upload occurs with the storage service, so maybe it's completely unrelated but thought I ask some folks that understand this alot better than me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Severity: medium An inconvenient situation where the software is usable but inconvenient or slow. Status: review The issue's code has been merged and is ready for testing/review. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result.
Projects
None yet
Development

No branches or pull requests

5 participants