
Fix parallel parsing of many files in get_results_as_dataframe #76

Merged
merged 1 commit into from
Sep 19, 2024


non-det-alle
Contributor

Hello,

When working with 1000+ simulation result files, the parallel_parsing option of the CampaignManager.get_results_as_dataframe() utility makes the main process stall while it opens and loads every file at once, with no progress displayed. If the files are large, this can eventually lead to memory exhaustion and crashes.

This happens because the program builds the full list of inputs for pool.imap_unordered up front. As a simple fix, I suggest changing the list into a generator. Generators are lazy iterators, so each file is opened only when pool.imap_unordered requests the next element, and closed when the task is done. As an added benefit, this also fixes the tqdm progress bar, which otherwise appears only after the whole list has been loaded.
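To illustrate the idea, here is a minimal sketch of the pattern, not the actual code from this PR. The names `parse_file`, `lazy_inputs`, and `parse_all` are hypothetical, the "parsing" is just a line count, and the sketch uses `multiprocessing.pool.ThreadPool` so that it runs without a `__main__` guard; `multiprocessing.Pool` exposes the same `imap_unordered` method.

```python
import os
from multiprocessing.pool import ThreadPool


def parse_file(item):
    """Worker: count the lines of an already-loaded file (stand-in for real parsing)."""
    name, text = item
    return name, len(text.splitlines())


def lazy_inputs(paths):
    """Yield (name, contents) pairs one at a time.

    Each file is opened only when the consumer asks for the next item,
    and closed as soon as its contents have been read.
    """
    for path in paths:
        with open(path) as f:
            yield os.path.basename(path), f.read()


def parse_all(paths, workers=2):
    """Parse every file in a pool, feeding the inputs from a generator."""
    with ThreadPool(workers) as pool:
        # Passing a generator instead of a list: no file is read
        # before the pool starts consuming inputs.
        return dict(pool.imap_unordered(parse_file, lazy_inputs(paths)))
```

The eager equivalent would be `inputs = [(name, open(p).read()) for ...]`, which reads every file before the first task is submitted. With the generator, results start arriving while later files are still unread, so a progress bar wrapped around the `imap_unordered` iterator can update from the start.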

Best,
Alessandro

@Thecave3 Thecave3 requested a review from pagmatt September 18, 2024 19:28
@pagmatt
Member

pagmatt commented Sep 19, 2024

Great improvement, thanks @non-det-alle

@pagmatt pagmatt merged commit 6a39f0a into signetlabdei:master Sep 19, 2024
4 of 5 checks passed
2 participants