This repository has been archived by the owner on Jun 6, 2024. It is now read-only.
What would you like to be added:
Support for different hardware configurations for different task roles of one distributed job.
Why is this needed:
For complex learning tasks, the programs that run on each machine differ widely, and so do their requirements for CPU, GPU, RAM, and GPU memory. At the same time, these machines need to communicate with each other to train jointly. For example, a reinforcement learning algorithm consists of several modules: the actor uses the GPU to generate data, the learner uses the GPU to train on that data, and the environment and MCTS use the CPU to generate data in parallel. These modules exchange data in complex ways.
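To make the request concrete, here is a minimal Python sketch of what per-role hardware requirements could look like for the reinforcement learning example above. It is not the job protocol itself: the class names `ResourceSpec` and `TaskRole`, the helper `total_resources`, and every instance count and resource figure are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical per-instance hardware requirement for one task role."""
    cpu: int
    memory_gb: int
    gpu: int = 0


@dataclass(frozen=True)
class TaskRole:
    """Hypothetical task role: a name, an instance count, and its own resources."""
    name: str
    instances: int
    resources: ResourceSpec


def total_resources(roles):
    """Sum the resources requested by all instances of all roles in one job."""
    return ResourceSpec(
        cpu=sum(r.instances * r.resources.cpu for r in roles),
        memory_gb=sum(r.instances * r.resources.memory_gb for r in roles),
        gpu=sum(r.instances * r.resources.gpu for r in roles),
    )


# Illustrative numbers only: GPU roles and CPU-only roles in the same job.
roles = [
    TaskRole("actor", 4, ResourceSpec(cpu=4, memory_gb=16, gpu=1)),
    TaskRole("learner", 1, ResourceSpec(cpu=8, memory_gb=64, gpu=4)),
    TaskRole("environment", 16, ResourceSpec(cpu=2, memory_gb=4)),
    TaskRole("mcts", 8, ResourceSpec(cpu=4, memory_gb=8)),
]

total = total_resources(roles)
cpu_only_roles = [r.name for r in roles if r.resources.gpu == 0]
```

The point of the sketch is that the resource description hangs off each role rather than off the job as a whole, so GPU-heavy and CPU-only roles can be scheduled together in one distributed job.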
Without this feature, how does the current module work:
Reinforcement learning tasks cannot be performed jointly by multiple computers.
Components that may involve changes:
Job protocol and related.
Move the vc (virtual cluster) setting down to the taskrole level:
Allow each taskrole to have a different skutype: