reduce microservices memory footprint #12200

Open
mapellidario opened this issue Dec 10, 2024 · 1 comment

@mapellidario (Member)
Impact of the new feature

MicroServices

Is your feature request related to a problem? Please describe.

We realized that the microservices' memory footprint depends on their backlog. For example, at every polling cycle ms-rulecleaner runs the function _execute() only once [1], passing it every workflow in a given status [2].

Describe the solution you'd like

Taking ms-rulecleaner as an example, we could change getRequestRecords into a generator that yields only a few workflows every time it is called. We would need to add a for loop in execute() around the call to _execute(). Not a huge effort, achievable without substantial refactoring.
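
A rough sketch of what this could look like (indicative only, not the actual WMCore code: the chunkSize parameter and the way the ReqMgr2 response is flattened are assumptions):

def getRequestRecords(self, reqStatus, chunkSize=100):
    """Yield the requests in the given status in chunks of at most chunkSize."""
    result = self.reqmgr2.getRequestByStatus([reqStatus], detail=True)
    # flatten the response into a list of request dictionaries (assumption)
    records = [req for item in result for req in item.values()]
    for idx in range(0, len(records), chunkSize):
        yield records[idx:idx + chunkSize]

def execute(self, reqStatus):
    ...
    totalNumRequests = 0
    for chunk in self.getRequestRecords(reqStatus):
        # each call to _execute() now only materializes chunkSize workflow objects
        counts = self._execute(chunk)
        totalNumRequests += counts[0]
        # ... accumulate the other counters in the same way
    ...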

Describe alternatives you've considered

The alternative would be to process one workflow at a time, possibly moving our model to a pub/sub one, but this would require some major refactoring.

Additional context

Follow-up to #12042.


[1]

totalNumRequests, cleanNumRequests, normalArchivedNumRequests, forceArchivedNumRequests = self._execute(requestRecords)

[2]

result = self.reqmgr2.getRequestByStatus([reqStatus], detail=True)

@vkuznet (Contributor) commented Dec 10, 2024

@mapellidario, yesterday I posted on the MM chat to Alan and Andrea my observations, which align with this ticket. Here is my posting (for completeness on the issue):

Here is proof of the memory spike in the MSRuleCleanerWflow call, which appears at https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSRuleCleaner/MSRuleCleaner.py#L263

I took the test/python/WMCore_t/MicroService_t/MSRuleCleaner_t/MSRuleCleanerWflow_t.py code and added memory profiling to one of the unit tests as follows:

import tracemalloc

    def setUp(self):
        ...
        tracemalloc.start()

    def tearDown(self):
        # Stop tracing and print memory usage details
        current, peak = tracemalloc.get_traced_memory()
        print(f"Current memory usage: {current / 1024:.2f} KB")
        print(f"Peak memory usage: {peak / 1024:.2f} KB")
        tracemalloc.stop()
    ...

    def testIncludeParents(self):
        ...
        # mutate the request slightly on each iteration and feed it to
        # MSRuleCleanerWflow 10k times, mimicking what MSRuleCleaner._execute()
        # does over its backlog
        for idx in range(10000):
            req = self.includeParentsReq
            for key, val in req.items():
                if isinstance(val, (str, bytes)):
                    req[key] += "%s" % idx
            MSRuleCleanerWflow(req)

Basically, I run over 10K requests, each modified slightly, and call MSRuleCleanerWflow for each of them in a similar manner to what the MSRuleCleaner code does.

Here is the outcome:

  • without my loop, I observe on average a 10 KB memory footprint:
python test/python/WMCore_t/MicroService_t/MSRuleCleaner_t/MSRuleCleanerWflow_t.py
Current memory usage: 8.56 KB
Peak memory usage: 11.52 KB
.Current memory usage: 7.54 KB
Peak memory usage: 10.52 KB
.Current memory usage: 7.87 KB
Peak memory usage: 10.86 KB
.Current memory usage: 5.82 KB
Peak memory usage: 7.70 KB
.

and when I enable my for loop, I see the following:

Current memory usage: 1232.53 KB
Peak memory usage: 1276.86 KB
.Current memory usage: 7.54 KB
Peak memory usage: 10.30 KB
.Current memory usage: 7.87 KB
Peak memory usage: 10.64 KB
.Current memory usage: 5.82 KB
Peak memory usage: 7.49 KB
.

As you can see, the first reported set of numbers, which corresponds to the test I modified, spiked from 11 KB to 1232 KB.

Therefore, if we take the MSRuleCleaner for loop at https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/MSRuleCleaner/MSRuleCleaner.py#L262 and pass it 10K requests, you will see a ~1000x spike in memory due to the allocations in the MSRuleCleanerWflow call (which by itself makes a couple of deepcopy calls over a nested Python dictionary).

Here is the modified version I used: MSRuleCleanerWflow_t.py
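
For illustration, the deepcopy effect can also be reproduced outside WMCore with a small standalone script (a sketch only, not WMCore code): keeping many deep copies of a nested dictionary alive at once drives the peak memory up, while processing one copy at a time keeps it flat.

import copy
import tracemalloc

# a small nested dictionary standing in for a request record
record = {"RequestName": "wf", "Campaigns": ["c1", "c2"],
          "Chain": {"Task1": {"SiteWhitelist": ["T1_X", "T2_Y"]}}}

def measure(label, func):
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    print(f"{label}: peak {peak / 1024:.2f} KB")

# all copies alive at once, as when _execute() receives the whole backlog
measure("10k copies held in a list",
        lambda: [copy.deepcopy(record) for _ in range(10000)])

# one copy alive at a time, as in a one-workflow-per-call flow
def one_at_a_time():
    for _ in range(10000):
        copy.deepcopy(record)

measure("10k copies processed one by one", one_at_a_time)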

To fix the problem, a few steps should be performed:

  • _execute() should process a single workflow/request, instead of taking a list of requests and loading a corresponding number of workflow objects.
  • The for loop over reqRecords should be moved out of this method to the caller, which should process only one workflow at a time; this keeps the memory footprint equal to one workflow.
  • wfCounters should be taken outside of this code as well and converted to basic integers, instead of being kept in a nested dict.
  • The execute code should be refactored into something like this:
def execute(self, reqStatus):
    ...
    for status in reqStatus:
        # in this loop we only allocate a single wflow object at a time,
        # process it and collect metrics; therefore the memory allocation
        # stays flat regardless of the number of records
        for rec in self.getRequestRecords(status):
            metrics = self._execute(rec)  # metrics is a tuple of integers
            total_num += metrics[0]       # first metric counter
            ...
            self.updateReportDict(summary, "total_num_requests", total_num)
    ...

def _execute(self, record):
    ...
    wflow = MSRuleCleanerWflow(record)
    ...
    # process pipelines and obtain the necessary metrics
    metrics = (totalNum, cleanNum, normalArchivedNum, forceArchivedNum)
    return metrics
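
Sketched this way, at most one MSRuleCleanerWflow object per item yielded by getRequestRecords is alive at any given time, so the memory footprint stays roughly flat regardless of how many requests are sitting in a given status.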
