Reduce memory footprint of parsed datacard #791
Conversation
Previously we were storing nbins*nproc*nsyst entries of mostly zeros for the nuisance parameter effect info (errline). Python floats also take 24 bytes each, due to object boxing. With this change, the parsed datacard for STXSStage1p2full takes 90MB to store the errlines, vs. 2.7GB previously.
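For context, a minimal sketch of why the dense representation is so costly and how keeping only the non-trivial entries helps. The shapes and names below are illustrative only, not the actual datacard parsing code:

```python
import sys

# Each Python float is a full object: typically 24 bytes on 64-bit CPython,
# before counting the dict entries that reference it.
print(sys.getsizeof(1.0))  # usually 24

# Dense storage: one boxed float per (bin, process, systematic), mostly zeros.
# Hypothetical sizes, just for illustration.
nbins, nproc, nsyst = 50, 20, 100
dense = {(b, p, s): 0.0
         for b in range(nbins)
         for p in range(nproc)
         for s in range(nsyst)}  # 100k boxed zeros here; millions in a real card

# Sparse storage: keep only the entries where a systematic actually acts,
# and treat a missing key as "no effect".
sparse = {("bin1", "ggH", "lumi"): 1.025}

def effect(b, p, s):
    """Look up the nuisance effect, defaulting to 0.0 (no effect)."""
    return sparse.get((b, p, s), 0.0)
```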
Other significant memory usage in t2w comes from extracting the shapes from the input files. In particular, HiggsAnalysis-CombinedLimit/python/ShapeTools.py lines 723 to 725 (at 385ac38) cache all input workspaces (for good reason: they may be expensive to continuously re-open) from which various objects may be read, and lines 839 to 847 (at 385ac38) cache all histograms extracted from the input. I think this latter cache is optional, because these histograms are likely read only once and their data is anyway copied into the RooFit/Combine object that goes into the output workspace.
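A hedged sketch of the kind of change suggested here: keep the expensive-to-rebuild file/workspace cache but stop holding every extracted histogram. The class and attribute names are illustrative, not the actual ShapeTools.py code.

```python
import ROOT  # assumes PyROOT is available, as in combine

class ShapeExtractor:
    """Illustrative only: not the real ShapeTools implementation."""

    def __init__(self):
        # Keep: re-opening ROOT files/workspaces repeatedly is expensive.
        self._file_cache = {}

    def get_file(self, path):
        # Cache open files so repeated lookups don't re-open them.
        if path not in self._file_cache:
            self._file_cache[path] = ROOT.TFile.Open(path)
        return self._file_cache[path]

    def get_histogram(self, path, name):
        # No histogram cache: each histogram is read once and its contents are
        # copied into the RooFit/Combine object written to the output workspace,
        # so keeping the TH1 around afterwards only costs memory.
        return self.get_file(path).Get(name)
```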
OK, the initial implementation is apocalyptically slow. Trying a new one.
Hi Nick, did you manage to speed up the implementation, or is this still a work in progress?
I haven't had a chance to revisit, but where I left off I found it very challenging to maintain the current performance while migrating away from dictionaries to even just defaultdict or something else that doesn't store millions of boxed Python float objects.
Yes, maybe using …
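To illustrate the trade-off discussed in the two comments above, here is a hedged sketch of two alternatives to a plain dict of boxed floats: a defaultdict (no storage for missing keys, but stored values are still boxed) versus a typed array plus an index (8 unboxed bytes per stored value). The key layout and helper names are hypothetical, not the PR's actual implementation.

```python
from array import array
from collections import defaultdict

# Option 1: defaultdict keeps dict-style access and stores nothing for
# missing keys, but every stored value is still a boxed Python float.
errline = defaultdict(float)
errline[("bin1", "ggH", "lumi")] = 1.025

# Option 2: pack values into a typed array (8 bytes per double, no boxing)
# and keep only an index from (bin, process, syst) to a position in it.
values = array("d")
index = {}

def set_effect(key, value):
    if key in index:
        values[index[key]] = value
    else:
        index[key] = len(values)
        values.append(value)

def get_effect(key):
    pos = index.get(key)
    return values[pos] if pos is not None else 0.0

set_effect(("bin1", "ggH", "lumi"), 1.025)
assert get_effect(("bin1", "ggH", "lumi")) == 1.025
assert get_effect(("bin2", "qqH", "lumi")) == 0.0  # absent keys cost nothing
```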