Explore multi-threading {asset}_abcd_scenario.rds generating function #2

Closed
jdhoffa opened this issue Sep 6, 2022 · 9 comments

Comments

@jdhoffa
Member

jdhoffa commented Sep 6, 2022

With RMI-PACTA/archive.pacta.data.preparation#81 closed, we have an opportunity to use multi-threading to speed up the time- and memory-intensive processes.

In particular, we might be able to spread each process to calculate {asset}_abcd_scenario_{scenario_name}.rds across multiple CPUs.

@cjyetman
Member

cjyetman commented Sep 6, 2022

Be aware: this may increase memory pressure if multiple threads are each using tons of memory at the same time.

@AlexAxthelm

All of this is something that we'll have time to explore (read: offer multiple solutions) Soon™️

@cjyetman
Member

mo' memory, mo' memory

@AlexAxthelm

cough externalize our data cough

@jdhoffa jdhoffa transferred this issue from another repository Apr 15, 2024
@cjyetman
Member

Considering how much memory is needed/would be needed for each of these hypothetical threads, I think this would not be an advantageous thing to do. Can I close this @jdhoffa?
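To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It is written in Python purely for illustration (the pipeline itself is R), and every number in it is a made-up placeholder, not a measurement from this project. It assumes each worker needs roughly the same peak memory and that workers share nothing.

```python
def max_concurrent_workers(total_ram_gb: float,
                           per_worker_gb: float,
                           reserve_gb: float = 0.0) -> int:
    """Estimate how many workers fit in RAM at the same time.

    Assumes every worker peaks at about `per_worker_gb` and that
    `reserve_gb` is kept free for the OS and other processes.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable_gb = total_ram_gb - reserve_gb
    # Floor division: a worker that only partly fits does not fit.
    return max(0, int(usable_gb // per_worker_gb))


if __name__ == "__main__":
    # Hypothetical figures only: 64 GB machine, 24 GB peak per worker,
    # 4 GB held back for the OS.
    print(max_concurrent_workers(64, 24, reserve_gb=4))
```

If the estimate comes out at one or two workers, threading on a single machine buys little, which is the concern raised above.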

@jdhoffa
Member Author

jdhoffa commented Apr 29, 2024

Sounds good!

@AlexAxthelm

Given that this thread is recently dead, I'd like to revive it just a bit (as a long-term improvement). I think that rather than focusing on multithreading the application code, this and dataprep_connect_abcd_with_scenario (see also #7) would be amenable to parallelizing across multiple runners rather than multiple threads on the same machine.

If we split the scenarios out in such a manner that they could be handled by multiple machines (each with access to their own resources, e.g. RAM), we could keep them single-threaded (like R likes), but lift the memory constraint. Probably some block-level rearranging involved, but I think that's less scary than trying to introduce any of the openmp- or futures-based paradigms into our stack.
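A minimal sketch of what "splitting the scenarios out" could look like, assuming each runner is told its own index and the total runner count (for example via a CI job matrix). It is in Python only for illustration; the scenario names, the `shard` helper, and the runner numbering are all hypothetical and not part of the existing code.

```python
from typing import List, Sequence


def shard(items: Sequence[str], runner_index: int, runner_count: int) -> List[str]:
    """Return the subset of `items` that one runner should process.

    Items are sorted first so every runner computes the same ordering,
    then dealt out round-robin. Each item lands on exactly one runner.
    """
    if runner_count < 1:
        raise ValueError("runner_count must be at least 1")
    if not 0 <= runner_index < runner_count:
        raise ValueError("runner_index must be in [0, runner_count)")
    return sorted(items)[runner_index::runner_count]


if __name__ == "__main__":
    # Hypothetical scenario names, for illustration only.
    scenarios = ["scenario_a", "scenario_b", "scenario_c",
                 "scenario_d", "scenario_e"]
    for index in range(2):
        print(index, shard(scenarios, index, 2))
```

Each runner would then process only its own shard single-threaded and write its own `{asset}_abcd_scenario_{scenario_name}.rds` outputs, so no machine ever holds more than one scenario's working set in memory.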

@jdhoffa
Member Author

jdhoffa commented Apr 29, 2024

I think perhaps that should live in its own issue?

@cjyetman
Member

This seems a bit like building a giant umbrella to cover a boat that has a hole in it. We know what the real problem is (dataprep_connect_abcd_with_scenario()). I think our efforts would be better focussed on that rather than distributing the work it does across multiple threads/computers/whatever.
