Threads and progress #355

Merged
merged 3 commits into from
Dec 1, 2023

Collaborator

@bendavid bendavid commented Dec 1, 2023

The updated Singularity image eliminates most spurious threads from TensorFlow Lite by adding a dedicated patch (previously one spurious thread per RDF thread per tflite helper, now reduced to one thread in total for tflite).

Updated env variables in narf eliminate an additional ncores threads from Eigen.

The updated Singularity image contains a general update of package versions, and also ROOT 6.30.02 (plus our patches for improved PyROOT debugging, which aren't upstream yet).

narf has been adapted to implement a variation on the Progress Bar feature (but in a way which is compatible with RunGraphs)

Zero physics changes expected (but there may be some numerical differences/reshuffling).
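
A quick way to check thread reductions like the ones described above is to count the OS-level threads of the running process before and after loading the libraries in question. The sketch below is only an illustration: the helper names `count_os_threads` and `limit_openmp_threads` are hypothetical, the `/proc/self/task` lookup is Linux-specific, and `OMP_NUM_THREADS` is used as an example of a common thread-count variable. The PR does not state which variables narf actually sets.

```python
import os
import threading


def count_os_threads():
    """Return the number of OS threads in the current process.

    On Linux, each thread has an entry under /proc/self/task. Elsewhere we
    fall back to Python-level threads only, which misses native threads.
    """
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        return len(os.listdir(task_dir))
    return threading.active_count()


def limit_openmp_threads(n=1):
    """Set OMP_NUM_THREADS (an assumed example variable, not taken from narf).

    This only has an effect if it is set before the OpenMP-using library
    is loaded.
    """
    os.environ["OMP_NUM_THREADS"] = str(n)


if __name__ == "__main__":
    limit_openmp_threads(1)
    print("threads before:", count_os_threads())
    # Import or initialise the libraries of interest here, then count again.
    print("threads after:", count_os_threads())
```

Comparing the two counts with and without the environment setting gives a rough measure of how many extra threads a library spawns at load time.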

Collaborator

cippy commented Dec 1, 2023

The crash occurring in the histmakers might have been caused by transient excessive memory usage: David (and myself at some point) was running something that almost saturated the CERN machine's memory.

Collaborator Author

bendavid commented Dec 1, 2023

No, this is a technical problem with TensorFlow. Already testing a fix.

@bendavid bendavid added the enhancement (New feature or request) and bugfix (PR intended to fix a bug) labels Dec 1, 2023
@bendavid bendavid force-pushed the threads_and_progress branch from 2befcdb to 7489e78 Compare December 1, 2023 15:06
Collaborator Author

bendavid commented Dec 1, 2023

From in-progress CI logs:

begin lumi loop
[Total elapsed time: 0:22m  processed files: 100 / 100  processed evts: 992 / 992]   
end lumi loop
dataPostVFP 0.16036740900000002
begin event loop
|====>               |   |==================> |   [Elapsed time: 5:00m  processing file: 314 / 1163  processed evts: 68022000 / 68947393  2.27e+05 evt/s]   
|======>             |   |==================> |   [Elapsed time: 10:00m  processing file: 424 / 1163  processed evts: 112391000 / 113048301  1.87e+05 evt/s]   
|=======>            |   |==================> |   [Elapsed time: 15:00m  processing file: 497 / 1163  processed evts: 134375000 / 135285574  1.49e+05 evt/s]   
|=========>          |   |==================> |   [Elapsed time: 20:00m  processing file: 603 / 1163  processed evts: 158462000 / 159195216  1.32e+05 evt/s]   
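
The format of the event-loop lines above can be reproduced with a short formatter. This is a reconstruction inferred from the log output only, not narf's actual implementation: the function names `bar` and `progress_line` are hypothetical, and it assumes the first bar tracks files, the second tracks events, and the rate is processed events divided by elapsed seconds (which matches the numbers shown, e.g. 68022000 / 300 s ≈ 2.27e+05 evt/s).

```python
def bar(done, total, width=20):
    """Render a fixed-width text bar such as '|====>               |'."""
    filled = int(width * done / total) if total > 0 else 0
    filled = max(0, min(filled, width))
    # The last filled cell is drawn as the '>' arrow head.
    body = "=" * (filled - 1) + ">" if filled > 0 else ""
    return "|" + body.ljust(width) + "|"


def progress_line(elapsed_s, file_index, files_total, evts_done, evts_total):
    """Format one progress line in the style of the CI log."""
    minutes, seconds = divmod(int(elapsed_s), 60)
    rate = evts_done / elapsed_s if elapsed_s > 0 else 0.0
    return (
        f"{bar(file_index, files_total)}   {bar(evts_done, evts_total)}   "
        f"[Elapsed time: {minutes}:{seconds:02d}m  "
        f"processing file: {file_index} / {files_total}  "
        f"processed evts: {evts_done} / {evts_total}  "
        f"{rate:.2e} evt/s]"
    )


if __name__ == "__main__":
    print(progress_line(300, 314, 1163, 68022000, 68947393))
```

Note that in the log the event total grows between lines, which suggests the total is only known for files opened so far. A real implementation would also need thread-safe counters shared across the loops, which this sketch leaves out.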

Collaborator Author

bendavid commented Dec 1, 2023

CI output is 1:1 identical.

@bendavid bendavid merged commit bcccc5c into WMass:main Dec 1, 2023
25 checks passed