MSUnmerged T1 service crashing when pulling large storage json dump #12061
Relevant tickets so far:
The main issue with large JSON is the nested dictionary representation of the data. It forces any parser, in any programming language, to load the entire document into memory as a single object. Instead, the data should be represented as a list of smaller objects. Here are two approaches to representing the same data:
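For illustration, a minimal sketch of the two representations; the keys and values are hypothetical, not taken from any actual WM document:

```python
# Approach 1: one big nested dictionary -- any parser has to load the
# whole object into memory before anything can be processed.
nested = {
    "T1_US_FNAL_Disk": {
        "/store/unmerged/dirA": ["file_1.root", "file_2.root"],
        "/store/unmerged/dirB": ["file_3.root"],
    }
}

# Approach 2: a flat list of small, self-contained records -- each record
# can be parsed and processed independently of the others.
records = [
    {"rse": "T1_US_FNAL_Disk", "dir": "/store/unmerged/dirA", "file": "file_1.root"},
    {"rse": "T1_US_FNAL_Disk", "dir": "/store/unmerged/dirA", "file": "file_2.root"},
    {"rse": "T1_US_FNAL_Disk", "dir": "/store/unmerged/dirB", "file": "file_3.root"},
]
```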
Here the memory footprint of the parser can be as small as the size of a single record. Moreover, such a data representation allows data streaming, e.g. using the NDJSON data format:
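A minimal sketch of consuming NDJSON one record at a time; the file name and the `process` helper are hypothetical:

```python
import json

# In NDJSON each line is a complete JSON document, so only one record
# needs to be held in memory at any given moment.
with open("unmerged_dump.ndjson") as fd:  # hypothetical file name
    for line in fd:
        record = json.loads(line)
        process(record)  # hypothetical per-record handler
```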
Observation I: quite often WM objects are represented (dumped from CouchDB) as nested dictionaries. There is no JSON schema, i.e. the JSON is not defined with a set of pre-defined keys.

Observation II: we often use CouchDB to store Python dictionaries and then look them up. Dictionaries cannot be parsed incrementally, i.e. they must be loaded into memory all at once.

Therefore, quite often the main problem is the lack of a data-representation schema. CouchDB (or MongoDB) allows us to store unstructured data, but that does not mean we should not care about a data-representation schema. We should put a layer (a data service) in front of any database which consumes and yields data in a specific format (JSON or NDJSON) and translates incoming/outgoing data to/from the database, if necessary. Instead, we often use the database directly for data storage, and this leads to the problem discussed here when the size of the stored objects grows and the data is stored as dictionaries.
Valentin, I think we are all aware of these limitations and of the legacy/habit of storing nested dictionaries with non-static keys (which does not have only disadvantages, btw). Nonetheless, I see you have not even looked at the structure that MSUnmerged actually deals with, which is what is responsible for this blow-up of the memory footprint. Just in case, we are talking about a flat list, e.g.:
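For illustration only, a flat list along these lines; the paths below are made up, not taken from the actual FNAL dump:

```python
# A flat list of path strings, one entry per unmerged directory/file
# (all paths are hypothetical).
[
    "/store/unmerged/Run3Summer22/SomeDataset/NANOAODSIM/.../file_1.root",
    "/store/unmerged/Run3Summer22/SomeDataset/NANOAODSIM/.../file_2.root",
    "/store/unmerged/Run3Summer22/SomeDataset/NANOAODSIM/.../file_3.root",
]
```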
In addition, we do not control this data, as it comes from one of the Rucio services, as mentioned in the initial description.
If the data is represented as a flat list of strings, the amount of memory required to parse it can be as small as the size of the longest string in the list. If you do not control the data, you can still write a parser that reads it one line at a time (instead of using JSON load(s)). This reduces the parser's memory requirement to the size of a single string. That said, I doubt such a list requires 8GB of RAM unless you are dealing with billions of such strings. Here is a basic proof:
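A sketch of the kind of measurement behind the numbers quoted below; the LFN is hypothetical and the exact byte counts vary slightly with the Python version:

```python
import sys

# A hypothetical LFN of realistic length (~175 characters).
lfn = "/store/unmerged/RunIISummer20UL18/SomeDataset/NANOAODSIM/" + "x" * 115

# Size of a single string object: ~49 bytes of overhead plus 1 byte per
# ASCII character, i.e. a couple of hundred bytes.
print(sys.getsizeof(lfn))

# Size of the list object itself (one 8-byte pointer per entry plus overhead);
# the string objects it references are accounted for separately.
print(sys.getsizeof([lfn] * 10_000))     # tens of KB
print(sys.getsizeof([lfn] * 1_000_000))  # a few MB
```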
So, in other words, a single filename costs 224 bytes, a list of 10K such strings has a size of 85KB, and a list of 1M entries will cost 8MB. That is still three orders of magnitude lower than 8GB. But if you use such a list within a nested dictionary, then your memory footprint may blow up significantly, depending on the structure and nesting level of the dictionary.
I just managed to resume working on this item and I tried to delete the following 2 directories:
from FNAL_Disk with the gfal2 plugin command. Initially, this is the error I was getting inside the ms-unmer-t1 pod:
It turns out the CAs are pretty old/stale in this pod. After copying them from lxplus into the ms-unmer-t1 pod, the same command succeeded without any issues. I will follow this up with Aroosha and see if we can start mounting /etc/grid-security/certificates into the backend pods (or at least the WM ones).
For my own record, the current snapshot from Rucio Consistency Enforcement from
I decided to bump the resource requirements for the production
The first ~100 deletions failed because I still did not have up-to-date CAs on the node, but after fixing those I see that directory deletions are successfully going through, for example:
Impact of the bug
MSUnmerged
Describe the bug
After bumping the resource limit up to 8GB of RAM, ms-unmerged-t1 is still unable to load the T1_US_FNAL_Disk data into memory, which makes the service crash every minute and repeat this endlessly.
In addition, several T1 storage sites forbid the WMCore robot certificate from deleting unneeded data from the unmerged area. RSEs identified so far are: CNAF and JINR.
How to reproduce it
Fetch the FNAL unmerged dump and load it into memory in Python, e.g. as sketched below.
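A minimal sketch of the reproducer, assuming the dump has already been downloaded to a local file (the file name is hypothetical):

```python
import json

# json.load materializes the full list of LFNs (plus per-string overhead)
# in memory at once, which is the step that blows up the memory footprint.
with open("fnal_unmerged_dump.json") as fd:  # hypothetical local copy of the dump
    lfns = json.load(fd)

print(f"Loaded {len(lfns)} entries")
```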
Expected behavior
Ideally, we should have better control over the data that is loaded into memory, either by streaming it or by loading it in chunks; one possible approach is sketched below.
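One possible approach, sketched with the third-party ijson library and assuming the dump is a single top-level JSON array (the file name is hypothetical):

```python
import ijson  # third-party incremental JSON parser

# ijson yields one array element at a time, so the full list of LFNs
# never has to be resident in memory.
with open("fnal_unmerged_dump.json", "rb") as fd:
    for lfn in ijson.items(fd, "item"):
        handle(lfn)  # hypothetical placeholder for the per-entry deletion logic
```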
We also have to follow up on those permission issues with the site admins.
Additional context and error message
We seem to have been crashing this service at least since Mar 21st, 2024 (!!!)