Fix parallel parsing of many files in get_results_as_dataframe #76

non-det-alle · 2024-09-18T17:01:11Z

Hello,

When working with 1000+ simulation result files, the parallel_parsing option of the CampaignManager.get_results_as_dataframe() utility causes the main process to get stuck trying to open and load all the files at once, without displaying any progress. If the files are large, this can eventually lead to memory issues and crashes.

This happens because the program builds the full list of inputs for pool.imap_unordered up front, before any of them is handed to the pool. As a simple fix, I suggest changing the list into a generator. Generators are lazy iterators, so each file is only opened when pool.imap_unordered requests the next element, and closed when the task is done. As an added benefit, this also fixes the tqdm progress bar, which otherwise appears only after the whole list has been loaded.
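
To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It is not taken from the SEM code base: `load`, `parse`, `read_log`, and the temporary result files are hypothetical stand-ins, and `multiprocessing.pool.ThreadPool` is used only so the snippet runs without a `__main__` guard (it exposes the same `imap_unordered` interface as `multiprocessing.Pool`).

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool

read_log = []  # hypothetical bookkeeping: one entry per file actually read


def load(path):
    """Open a result file and return (path, contents)."""
    read_log.append(path)
    with open(path) as f:
        return path, f.read()


def parse(item):
    """Stand-in for the real parsing step: report the file size."""
    path, text = item
    return os.path.basename(path), len(text)


# Create a few small dummy result files to work on.
tmpdir = tempfile.mkdtemp()
paths = []
for i in range(20):
    p = os.path.join(tmpdir, f"result_{i}.txt")
    with open(p, "w") as f:
        f.write("x" * i)
    paths.append(p)

# Eager: the list comprehension reads every file before the pool exists.
eager_inputs = [load(p) for p in paths]
reads_after_list = len(read_log)

# Lazy: building the generator expression reads nothing yet.
read_log.clear()
lazy_inputs = (load(p) for p in paths)
reads_after_generator = len(read_log)

# Files are read only as the pool pulls items from the generator.
with ThreadPool(4) as pool:
    parsed = dict(pool.imap_unordered(parse, lazy_inputs))
```

In this sketch, creating the list triggers all reads immediately, while creating the generator triggers none; the reads happen only once the pool starts consuming the iterable. Wrapping the `imap_unordered` call in `tqdm(..., total=len(paths))` would then show progress from the first completed task, since nothing blocks before the pool starts.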

Best,
Alessandro

pagmatt · 2024-09-19T10:36:16Z

Great improvement, thanks @non-det-alle

get_results: feed generator to imap

34bd745

Thecave3 requested a review from pagmatt September 18, 2024 19:28

pagmatt merged commit 6a39f0a into signetlabdei:master Sep 19, 2024
4 of 5 checks passed
