Some additional context: this isn't new code. It used to run fine, set up this way, roughly a month to a month and a half ago. I did have intermittent hangups, but I managed them by re-running the pipeline. The issue appeared after I recently updated all my packages.
Help
Description
Hi Will,
Hope you're doing well. I've been spinning my wheels on something for many days now, and I feel like I'm stuck in a rut. I don't have a reprex for you (`run_bespoke_model()` in the example below is a proprietary function), so I'm doing my best to explain my situation.

The objective of the code below is to allow me to create any number of model specifications in `models_list` (think hyperparameters, for example). This gets split into a dynamic target `one_model_spec`, which feeds into a dynamic target `list_model_results`. The model results are generated using an internally built package containing `run_bespoke_model()`. `run_bespoke_model()` can take a long time to run, which is why I've set things up in this fashion: everything remains cached, and I can quickly add new options to `models_list` and execute the pipeline without having to run all of the models each time.

The result of `run_bespoke_model()` for each model spec is quite a large nested list-of-lists object.

Note: After much trial and error, I realized that:
- `format = "qs"` simply didn't work for the large `list_model_results`, and targets would keep spinning its wheels.
- `storage = "worker", retrieval = "worker"` are needed, else the run times compound manyfold.
- `memory = "transient"`, `garbage_collection = TRUE`, but probably doesn't hurt?

The issue
`list_model_results` target will solve fine... then the remaining will just 'run forever'. I've waited 20-40 minutes without any results, when I know that that particular model spec shouldn't take more than 1-2 minutes.

I'm not sure why this is happening, and I'm lost as to how to fix things.
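The setup described above can be sketched roughly as follows. This is only an illustration of the pipeline shape, not the actual code: the stub body of `run_bespoke_model()`, the contents of `models_list`, and the `iteration = "list"` settings are all assumptions made so the sketch is self-contained, and the storage format is left at the `targets` default.

```r
# _targets.R (illustrative sketch, not the original pipeline)
library(targets)

# Hypothetical stand-in for the proprietary run_bespoke_model().
# It only returns a small nested list so the sketch can run end to end.
run_bespoke_model <- function(spec) {
  list(spec = spec, fit = list(coefficients = rnorm(3)))
}

list(
  # All model specifications live here (made-up hyperparameters).
  # New entries can be appended without touching the rest of the pipeline.
  tar_target(
    models_list,
    list(
      list(id = "a", alpha = 0.1),
      list(id = "b", alpha = 0.5)
    ),
    iteration = "list"
  ),
  # Dynamic target: one branch per model specification.
  tar_target(
    one_model_spec,
    models_list,
    pattern = map(models_list),
    iteration = "list"
  ),
  # Dynamic target: one (potentially large) result per specification,
  # using the settings mentioned in the note above.
  tar_target(
    list_model_results,
    run_bespoke_model(one_model_spec),
    pattern = map(one_model_spec),
    iteration = "list",
    storage = "worker",
    retrieval = "worker",
    memory = "transient",
    garbage_collection = TRUE
  )
)
```

With a file like this, `targets::tar_make()` would build the branches. Note that `storage = "worker"` and `retrieval = "worker"` only make a difference when the pipeline runs on parallel workers.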
Example
See the execution time increase significantly from seconds... to 20 minutes... to now fully "hung" execution (over 1 hour).
Session Info
Running in a Docker container, on an M1 Ultra machine (20 cores, 128 GB RAM)