From the paper, it seems that the experiments show it supports both adding and removing nodes during training.
I successfully ran Oobleck with node failures (removing nodes), but I couldn't find a way to add nodes dynamically during training. Could you let me know how to make it work?
Thank you!
Lam
All experiments in the paper were done with a Bamboo simulator: we measured the throughput and the reconfiguration overhead of every configuration and combined them. The current code does not include an implementation for adding nodes. This is future work; I think simply running reconfiguration would be enough, but I still need to try it.
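For readers wondering what "measuring throughput and overheads of reconfiguration in every configuration and combining them" could look like, here is a minimal sketch. It is not the simulator used for the paper and not part of Oobleck or Bamboo. The names `Event` and `simulate_samples`, the trace format, and all numbers are hypothetical and serve only as illustration. The sketch assumes that throughput per node count and reconfiguration overhead per transition have been measured separately, and that training makes no progress while a reconfiguration is in flight.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A point in the trace where the number of available nodes changes (hypothetical format)."""
    time_s: float
    num_nodes: int


def simulate_samples(events, end_time_s, throughput, reconfig_overhead_s):
    """Estimate the total samples processed over a node-availability trace.

    events: list of Event sorted by time; the first entry is the initial configuration.
    throughput: measured samples/second for each node count.
    reconfig_overhead_s: measured seconds lost for each (old_nodes, new_nodes) transition.
    """
    total = 0.0
    for i, ev in enumerate(events):
        # Each segment lasts until the next event, or until the end of the trace.
        seg_end = events[i + 1].time_s if i + 1 < len(events) else end_time_s
        duration = seg_end - ev.time_s
        # The initial configuration pays no reconfiguration cost.
        overhead = 0.0
        if i > 0:
            overhead = reconfig_overhead_s[(events[i - 1].num_nodes, ev.num_nodes)]
        # Assumption: no samples are processed during reconfiguration.
        useful = max(0.0, duration - overhead)
        total += useful * throughput[ev.num_nodes]
    return total


# Made-up measurements, for illustration only.
example_events = [Event(0.0, 4), Event(100.0, 3), Event(200.0, 4)]
example_throughput = {4: 10.0, 3: 7.0}
example_overhead = {(4, 3): 10.0, (3, 4): 20.0}
example_total = simulate_samples(
    example_events, 300.0, example_throughput, example_overhead
)
```

Under this model, a node addition is just another transition with its own measured overhead, which is why results for added nodes can be reported from a simulation even when the released code only handles removals.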
Hi @insujang ,
Thanks for open-sourcing Oobleck, great work!