Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with the TorchDistributedStrategy #216

Open
jarlsondre opened this issue Sep 19, 2024 · 0 comments
Open

Issues with the TorchDistributedStrategy #216

jarlsondre opened this issue Sep 19, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@jarlsondre
Copy link
Collaborator

jarlsondre commented Sep 19, 2024

Issues/Requests regarding the TorchDisributedStrategy

This is a list of issues/problems or just plain requests regarding the TorchDistributedStrategy class. Feel free to comment if you have anything to add :)

  1. The init() method: The strategy currently has a method called init() which is distinct from the __init__() method of the object. I suppose the motivation for this is more granular control of the strategy. Still, it causes the code to be considerably longer as it requires a check for the initialization of the strategy in every function. The only case I can think of is the situation where you want to create the strategy object without initializing it, but I am not quite sure when that would ever need to happen. Please comment this post if you can think of a case, though.
    With this in mind, the proposed solution is to simply remove the init() method and perform the necessary functionality inside of the __init__() method of the object. This would improve usability and increase readability significantly.

  2. Property Methods: The TorchDistributedStrategy class has many methods such as global_rank() and local_rank() that seem like properties, but still are functions you have to call. This is fine on its own, but the rest of the itwinai library uses the convention of making these methods into properties with the @property decorator. Thus, I believe that the same should be done for the TorchDistributedStrategy class.

  3. Name Field/__str__ Implementation: There is currently no way of knowing the name of a strategy automatically, meaning that you are required to do something like

    if isinstance(strategy, TorchDPPStrategy): 
        strategy_name = "ddp"
    elif isinstance(strategy, ...): 
       ...

    to extract the name of a strategy. This is quite annoying and could be hard to maintain with the addition of new strategies. A suggested solution is to add a name field that you can access or simply an implementation of __str__ so that you can call str(strategy). Personally, I think I prefer name as the meaning is more explicit, but both could be nice.

  4. Barrier Method: (Barrier? I barely know'er!) Currently there is no implementation for the barrier() method in the TorchDistributedStrategy, but this is a rather useful method and is implemented differently for Horovod than the other strategies. Therefore, dist.barrier() will only work if you use DDP or DeepSpeed, but not Horovod, making this a useful addition.

@jarlsondre jarlsondre added the enhancement New feature or request label Sep 19, 2024
@jarlsondre jarlsondre changed the title Init() method of the TorchDistributedStrategy Issues with the TorchDistributedStrategy Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants