Explore multi-threading `{asset}_abcd_scenario.rds` generating function #2

jdhoffa · 2022-09-06T16:13:28Z

With RMI-PACTA/archive.pacta.data.preparation#81 closed, we have an opportunity to use multi-threading to speed up the time- and memory-intensive processes.

In particular, we might be able to spread the processes that calculate each `{asset}_abcd_scenario_{scenario_name}.rds` across multiple CPUs.

cjyetman · 2022-09-06T16:18:38Z

Be aware, this may increase memory pressure if multiple threads are each using up tons of memory.

AlexAxthelm · 2022-11-14T16:19:12Z

All of this is something that we'll have time to explore (read: offer multiple solutions) Soon™️

cjyetman · 2022-11-14T16:54:22Z

mo' memory, mo' memory

AlexAxthelm · 2022-11-14T17:46:19Z

*cough* externalize our data *cough*

cjyetman · 2024-04-29T09:54:06Z

Considering how much memory is needed/would be needed for each of these hypothetical threads, I think this would not be an advantageous thing to do. Can I close this @jdhoffa?

jdhoffa · 2024-04-29T09:59:02Z

Sounds good!

AlexAxthelm · 2024-04-29T16:28:02Z

Given that this thread is recently dead, I'd like to revive it just a bit (as a long-term improvement). Instead of multithreading the application code, I think this and `dataprep_connect_abcd_with_scenario()` (see also #7) would be better suited to parallelizing across multiple runners than across multiple threads on the same machine.

If we split the scenarios out in such a manner that they could be handled by multiple machines (each with access to its own resources, e.g. RAM), we could keep them single-threaded (like R likes), but lift the memory constraint. There is probably some block-level rearranging involved, but I think that's less scary than trying to introduce any of the OpenMP- or futures-based paradigms into our stack.
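
As a rough illustration of the split described above (not code from this repository, and written in Python only to show the partitioning logic), each runner could be given an index and a total runner count and then work through only its own share of the scenarios, one at a time. The function names, the scenario names, and the `asset` value below are all hypothetical placeholders; only the output file name pattern comes from this issue.

```python
def shard_scenarios(scenario_names, runner_index, runner_total):
    """Return the scenarios assigned to one runner.

    Names are sorted first so every runner derives the same assignment
    independently, then dealt out round-robin.
    """
    if runner_total < 1:
        raise ValueError("runner_total must be at least 1")
    if not 0 <= runner_index < runner_total:
        raise ValueError("runner_index must be in [0, runner_total)")
    ordered = sorted(scenario_names)
    return ordered[runner_index::runner_total]


def output_filename(asset, scenario_name):
    """Build the per-scenario output name using the pattern from this issue."""
    return f"{asset}_abcd_scenario_{scenario_name}.rds"


if __name__ == "__main__":
    # Hypothetical scenario names, for illustration only.
    scenarios = ["scenario_a", "scenario_b", "scenario_c", "scenario_d", "scenario_e"]
    runner_total = 2
    for runner_index in range(runner_total):
        # Each runner processes its shard sequentially (single-threaded),
        # so peak memory per machine is that of one scenario at a time.
        for name in shard_scenarios(scenarios, runner_index, runner_total):
            print(runner_index, output_filename("asset", name))
```

Because the assignment depends only on the sorted scenario list and the runner's index, the runners would not need to coordinate with each other. This sketch says nothing about how the heavy step itself is invoked on each runner.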

jdhoffa · 2024-04-29T16:30:46Z

I think perhaps that should live in its own issue?

cjyetman · 2024-04-29T16:48:21Z

This seems a bit like building a giant umbrella to cover a boat that has a hole in it. We know what the real problem is (dataprep_connect_abcd_with_scenario()). I think our efforts would be better focussed on that rather than distributing the work it does across multiple threads/computers/whatever.

jdhoffa transferred this issue from another repository Apr 15, 2024

cjyetman closed this as completed Apr 29, 2024
