MSUnmerged T1 service crashing when pulling large storage json dump #12061

Open
amaltaro opened this issue Jul 30, 2024 · 6 comments · May be fixed by #12059

@amaltaro

Impact of the bug
MSUnmerged

Describe the bug
After bumping the resource limit up to 8GB of RAM, ms-unmerged-t1 is still unable to load the data for T1_US_FNAL_Disk into memory, which makes the service crash every minute in an endless loop.

In addition, several T1 storage sites forbid the WMCore robot certificate from deleting unneeded data from the unmerged area. The RSEs identified so far are CNAF and JINR.

How to reproduce it
Fetch the FNAL unmerged dump and load it into memory in python.

Expected behavior
Ideally, we should have better control over the data that is loaded into memory, either by streaming the data or by loading it in chunks.
We also have to follow up on those permission issues with the site admins.

Additional context and error message
This service seems to have been crashing since at least Mar 21st, 2024(!!!)

[21/Mar/2024:00:00:31]  WATCHDOG: server exited with exit code signal 9... restarting

@vkuznet

vkuznet commented Jul 30, 2024

The main issue with big JSON is the (nested) dictionary representation of the data. This leads any parser in any programming language to load it into memory as one object. Instead, the data should be represented as a list of smaller objects. Here are two approaches to representing the same data:

  1. Big JSON example:
{
   "value1": {"abc": {...}},
   "value2": {"xyz": {...}}
}
  2. The same data in a form that allows a low memory footprint (equal to the size of a single record):
[
  {"key": "value1", ...},
  {"key": "value2", ...}
]

Here the memory footprint in the parser can be equal to the size of a single record. Moreover, such a data representation allows data streaming, e.g. using the NDJSON data format:

{"key": "value", ...}
{"key": "value", ...}
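
As a rough illustration of the streaming point above, here is a minimal sketch of an NDJSON reader that keeps only one record in memory at a time. The function name `iter_ndjson` and the sample records are made up for the example; any file-like object that yields text lines should work the same way.

```python
import io
import json


def iter_ndjson(stream):
    """Yield one decoded record per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:  # tolerate blank lines between records
            yield json.loads(line)


# An in-memory stream stands in for a file or an HTTP response body.
sample = io.StringIO('{"key": "value1"}\n{"key": "value2"}\n')
keys = [record["key"] for record in iter_ndjson(sample)]
print(keys)
```

Because the generator decodes line by line, the peak memory of the reader depends on the largest record rather than on the size of the whole dump.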

Observation I: quite often WM objects are represented (dumped from CouchDB) as nested dictionaries. There is no JSON schema, i.e. the JSON is not defined with a set of pre-defined keys; instead, the keys are data values, e.g. {"block#123": ...} or {"T1_XXX": ...}. We should strive to define a JSON schema with concrete keys, e.g. block and site, and then use it to define the data structure, e.g. {"block": "block1#xyz", "site": "T1_XXX"}.
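
To make Observation I concrete, the sketch below turns a dictionary keyed by data values into records with fixed keys. The helper `to_records`, the `key_name` parameter and the second site name `T2_YYY` are hypothetical and only serve the illustration; this is not existing WMCore code.

```python
def to_records(nested, key_name="site"):
    """Yield one flat record per top-level entry of a nested dictionary.

    The dynamic top-level key (e.g. a site name) is moved into a field
    with a fixed name, so every record shares the same set of keys.
    """
    for key, payload in nested.items():
        record = dict(payload)   # copy, so the input is left untouched
        record[key_name] = key   # the fixed-name field wins on a key clash
        yield record


nested = {
    "T1_XXX": {"block": "block1#xyz"},
    "T2_YYY": {"block": "block2#abc"},  # hypothetical second site
}
records = list(to_records(nested))
print(records)
```

Since `to_records` is a generator, the records could be written out one per line (NDJSON) without building the full list first.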

Observation II: we often use CouchDB to store Python dictionaries and then look them up. Dictionaries are not serialized record by record, i.e. each one has to be loaded into memory all at once. Therefore, quite often the main problem is the lack of a data-representation schema. CouchDB (or MongoDB) allows storing unstructured data, but that does not mean we should not care about a data-representation schema. We should put a layer (a data service) in front of any database, which consumes and yields data in a specific data format (JSON or NDJSON) and translates (if necessary) incoming/outgoing data from/to the database. Instead, we often use the database directly for data storage, and this leads to the problem discussed here when the size of the stored objects increases and the data is stored as dictionaries.

@amaltaro

Valentin, I think we are all aware of these limitations and of the legacy/habit of storing nested dictionaries with non-static keys (which does not have only disadvantages, btw).

Nonetheless, I see you have not even looked at the structure that is dealt with in MSUnmerged, which is actually responsible for this blow up of memory footprint. Just in case, we are talking about a flat list, e.g.:

[
 "/store/unmerged/GenericNoSmearGENBackfillBackfillBackfill/InclusiveDileptonMinBias_TuneCP5Plus_13p6TeV_pythia8/GEN/BACKFILL-v8/100145/7f41787f-105e-418c-aa93-9b99c8974dc1.root",
 "/store/unmerged/GenericNoSmearGENBackfillBackfillBackfill/InclusiveDileptonMinBias_TuneCP5Plus_13p6TeV_pythia8/GEN/BACKFILL-v8/100205/e6a94b98-9d31-4093-bc21-7162792d9933.root",

In addition, we do not control this data, as it comes from one of the Rucio services, as mentioned in the initial description.

@vkuznet

vkuznet commented Jul 30, 2024

If the data is represented as a flat list of strings, the amount of memory required to parse it is equal to the size of the longest string in the list. If you do not control it, you can still write a parser that reads it one line at a time (instead of using JSON load(s)). This will reduce the parser's memory requirement to the size of a single string. That said, I doubt that a long list requires 8GB of RAM unless you are dealing with billions of such strings. Here is a basic proof:

python
>>> s1="/store/unmerged/GenericNoSmearGENBackfillBackfillBackfill/InclusiveDileptonMinBias_TuneCP5Plus_13p6TeV_pythia8/GEN/BACKFILL-v8/100145/7f41787f-105e-418c-aa93-9b99c8974dc1.root"
>>> import sys
>>> sys.getsizeof(s1)
224
>>> arr=[s1 for _ in range(10000)]
>>> sys.getsizeof(arr)
85176
>>> arr=[s1 for _ in range(1000000)]
>>> sys.getsizeof(arr)
8448728

So, in other words, a single filename costs 224 bytes, the array of such strings has a size of 85KB for 10K entries, and an array of 1M entries will cost 8MB. But it is still 3 orders of magnitude lower than 8GB.
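
One caveat on the session above: sys.getsizeof(arr) reports only the list object itself (its array of references), and every slot in arr points to the same s1 object. After a json.load() each file name would be its own string object. The sketch below counts both parts. The file names are made up, the 8 bytes per list slot assume 64-bit CPython, and 11670914 is the Files count reported for T1_US_FNAL_Disk further down in this thread.

```python
import sys


def list_footprint(items):
    """Bytes used by the list object plus every distinct string it references."""
    total = sys.getsizeof(items)
    seen = set()
    for item in items:
        if id(item) not in seen:  # count a shared object only once
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total


n = 100_000
# Hypothetical file names, each one a distinct object as after json.load().
distinct = ["/store/unmerged/example/%032d.root" % i for i in range(n)]
# The same object referenced n times, as with s1 in the session above.
shared = [distinct[0] for _ in range(n)]

shared_bytes = list_footprint(shared)      # list + one string
distinct_bytes = list_footprint(distinct)  # list + n strings

# Back-of-the-envelope projection: 224 bytes per string (from the session
# above) plus an assumed 8 bytes per list slot on 64-bit CPython.
projected = 11_670_914 * (224 + 8)
print(shared_bytes, distinct_bytes, projected)
```

Under those assumptions the projection comes to roughly 2.7GB for the decoded list alone, before counting the raw JSON text that json.load() reads into memory first.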

But if you use such a list within a nested dictionary, then your memory footprint may blow up significantly, depending on the structure and nesting level of the dictionary.
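
The idea of parsing a flat JSON list without json.load(s) could look roughly like the sketch below. It is only an illustration under stated assumptions: `iter_json_array` is a hypothetical helper (not part of MSUnmerged), the input is a text stream holding one top-level JSON array, and separators are handled leniently (stray commas are skipped rather than rejected).

```python
import io
import json


def iter_json_array(stream, chunk_size=65536):
    """Yield the elements of a top-level JSON array read in chunks."""
    decoder = json.JSONDecoder()
    buf, pos, eof, started = "", 0, False, False
    skip = " \t\r\n"
    while True:
        # Skip whitespace (and, once inside the array, separating commas).
        while pos < len(buf) and buf[pos] in skip:
            pos += 1
        if pos < len(buf):
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                pos, started, skip = pos + 1, True, " \t\r\n,"
                continue
            if buf[pos] == "]":
                return
            try:
                item, end = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                if eof:
                    raise  # incomplete or invalid element at end of input
            else:
                # An element ending exactly at the buffer edge may be
                # truncated (e.g. a number), so wait for more data first.
                if end < len(buf) or eof:
                    yield item
                    pos = end
                    continue
        if eof:
            raise ValueError("unexpected end of JSON array")
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf = buf[pos:] + chunk  # drop what was already consumed
        pos = 0


# Hypothetical file names; a tiny chunk size forces many buffer refills.
paths = ["/store/unmerged/example/a.root", "/store/unmerged/example/b,].root"]
parsed = list(iter_json_array(io.StringIO(json.dumps(paths)), chunk_size=7))
print(parsed)
```

With this approach the buffer holds about one chunk plus one element at a time, so peak memory depends on what the caller does with each yielded name. A third-party streaming parser such as ijson would be another option.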

@amaltaro

amaltaro commented Oct 1, 2024

I just managed to resume working on this item and I tried to delete the following 2 directories:

    rseDirs = ["davs://***/dcache/uscmsdisk/store/unmerged/RunIISummer20UL17wmLHEGEN/TTHTo2B_TTToHadronic_M-125_TuneCP5_13TeV-powheg-pythia8/LHE/106X_mc2017_realistic_v6-v1",
               "davs://***/dcache/uscmsdisk/store/unmerged/Run3Summer22MiniAODv4/GluGluToContinto2Zto2E2Tau_TuneCP5_13p6TeV_mcfm701-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_v5-v2"]

from FNAL_Disk with the gfal2 plugin command ctx.rmdir(dirPfn).

Initially, this is the error I was getting inside the ms-unmer-t1 pod:

Error deleting: davs://***/dcache/uscmsdisk/store/unmerged/Run3Summer22MiniAODv4/GluGluToContinto2Zto2E2Tau_TuneCP5_13p6TeV_mcfm701-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_v5-v2, gfal code=112, gfal message: Result (Neon): Server certificate verification failed: issuer is not trusted after 1 attempts

It turns out the CAs in this pod are pretty old/stale. After copying them from lxplus into the ms-unmer-t1 pod, the same command succeeded without any issues.
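
For the record, the kind of test described above can be sketched as a small loop around ctx.rmdir(). The helper `remove_dirs` and the stand-in context class are hypothetical; with the real bindings the context would presumably come from gfal2.creat_context(), which is an assumption here and not taken from the MSUnmerged code.

```python
def remove_dirs(ctx, dir_pfns):
    """Call ctx.rmdir() on each PFN and return (removed, failed) lists."""
    removed, failed = [], []
    for pfn in dir_pfns:
        try:
            ctx.rmdir(pfn)
        except Exception as exc:  # the real bindings raise their own error type
            failed.append((pfn, str(exc)))
        else:
            removed.append(pfn)
    return removed, failed


class FakeContext:
    """Stand-in for a gfal2 context so the sketch runs without gfal2."""

    def rmdir(self, pfn):
        if "untrusted" in pfn:
            raise RuntimeError("server certificate verification failed")


removed, failed = remove_dirs(
    FakeContext(),
    ["davs://example.invalid/ok", "davs://example.invalid/untrusted"],
)
print(removed, failed)
```

Collecting failures instead of stopping at the first one makes it easier to tell a systematic problem (such as stale CAs) from a single bad directory.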

I will follow this up with Aroosha and see if we can start mounting /etc/grid-security/certificates into the backend pods (or at least the WM ones).

@amaltaro

amaltaro commented Oct 1, 2024

For my own record, the current snapshot from Rucio Consistency Enforcement from
https://cmsweb-prod.cern.ch/rucioconmon/unmerged/index says:

| RSE             | History | Last run   | Elapsed time | Status | Files    | Size |
| T1_US_FNAL_Disk |         | 2024/09/30 | 20m1s        | done   | 11670914 | 1.6P |

I decided to bump the memory limit for the production ms-unmer-t1 from 2GB to 7GB (according to Grafana, it peaked at 6.3GB and then stabilized at 700MB), plus:

The first ~100 deletions failed because I still did not have up-to-date CAs in the node, but after fixing those I see that directory deletion is going through successfully, for example:

2024-10-01 01:48:49,970:INFO:MSUnmerged: Processing directory index 287 out of 13105
2024-10-01 01:48:49,970:INFO:MSUnmerged: Trying to remove the whole directory: davs://***/dcache/uscmsdisk/store/unmerged/Run3Summer22EEMiniAODv4/Zto2Nu-2Jets_PTNuNu-40to100_1J_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v1
2024-10-01 01:48:59,962:INFO:MSUnmerged: Directory successfully removed: davs://***/dcache/uscmsdisk/store/unmerged/Run3Summer22EEMiniAODv4/Zto2Nu-2Jets_PTNuNu-40to100_1J_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/MINIAODSIM/130X_mcRun3_2022_realistic_postEE_v6-v1
2024-10-01 01:48:59,963:INFO:MSUnmerged: Processing directory index 288 out of 13105
2024-10-01 01:48:59,963:INFO:MSUnmerged: Trying to remove the whole directory: davs://***/dcache/uscmsdisk/store/unmerged/RunIISummer20UL16NanoAODAPVv9/VBF_LFV_HToETau_M125_TuneCH3_13TeV_powheg_herwig7/NANOAODSIM/106X_mcRun2_asymptotic_preVFP_v11-v3
2024-10-01 01:49:00,559:INFO:MSUnmerged: Directory successfully removed: davs://***/dcache/uscmsdisk/store/unmerged/RunIISummer20UL16NanoAODAPVv9/VBF_LFV_HToETau_M125_TuneCH3_13TeV_powheg_herwig7/NANOAODSIM/106X_mcRun2_asymptotic_preVFP_v11-v3
