I think this is submitting in batches of 100: sometimes we have 30 runs going, sometimes 100. But there are 192 worker cores available, and I'm not sure how we got so many workers.
This set seems to have recruited a lot of unneeded compute.
This set was appropriately scaled, but new jobs can't get scheduled until every run in the previous batch finishes.
That's not terrible for this example, which runs fast and is easy. It won't work for a much more complicated model, where run times vary widely and some runs take a very long time to finish.
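To illustrate why a batch barrier hurts when run times are skewed, here is a minimal Python sketch comparing two submission strategies: fixed batches that wait for the slowest run, and a rolling queue that starts a new run whenever a slot frees up. This is not bbr's actual submission code. The function names, the batch size of 100, and the run times are hypothetical, chosen to mimic one slow run per 100.

```python
import heapq


def batch_makespan(run_times, batch_size):
    """Hypothetical batch strategy: submit runs in fixed batches, and start
    the next batch only after every run in the current batch finishes."""
    total = 0.0
    for start in range(0, len(run_times), batch_size):
        batch = run_times[start:start + batch_size]
        total += max(batch)  # the slowest run gates the whole batch
    return total


def rolling_makespan(run_times, max_concurrent):
    """Hypothetical rolling strategy: keep up to max_concurrent runs going and
    start the next run (in submission order) as soon as any slot frees up."""
    finish_times = []  # min-heap of finish times for in-flight runs
    for t in run_times:
        if len(finish_times) < max_concurrent:
            heapq.heappush(finish_times, t)  # free slot: start at time 0
        else:
            free_at = heapq.heappop(finish_times)  # earliest slot to free up
            heapq.heappush(finish_times, free_at + t)
    return max(finish_times) if finish_times else 0.0


# Hypothetical skewed workload: 300 runs, 99 fast (1 time unit) and
# 1 slow (50 time units) in every group of 100.
run_times = ([1.0] * 99 + [50.0]) * 3

print(batch_makespan(run_times, 100))    # 150.0
print(rolling_makespan(run_times, 100))  # 53.0
```

With the same 100-run concurrency cap, the barrier makes each slow run idle roughly 99 slots while it finishes. The rolling queue keeps those slots busy, so total wall time is driven by the slow runs rather than multiplied by them.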
Thanks for capturing this, @kylebaron. Do you think we should move this to an internal Metworx ticket, or is it worth looking into whether the way bbr submits models is playing into this?
I think this part is relevant to the way bbr is doing it. Run times can be really skewed, so I think this batching strategy will run into problems sooner rather than later.
(Screenshots: ~30 runs active; 100 runs active)
The run ended up with additional compute, and I'm not sure why. This isn't an issue for bbr to solve, but I wanted to document that it was happening.