Issues with the TorchDistributedStrategy #216

jarlsondre · 2024-09-19T13:16:37Z

Issues/Requests regarding the TorchDistributedStrategy

This is a list of issues/problems or just plain requests regarding the TorchDistributedStrategy class. Feel free to comment if you have anything to add :)

The init() method: The strategy currently has a method called init() which is distinct from the __init__() method of the object. I suppose the motivation for this is more granular control of the strategy. Still, it makes the code considerably longer, as every function has to check whether the strategy has been initialized. The only use case I can think of for a separate init() is creating the strategy object without initializing it, but I am not quite sure when that would ever need to happen. Please comment on this post if you can think of a case, though.
With this in mind, the proposed solution is to simply remove the init() method and perform the necessary setup inside the __init__() method of the object. This would remove the per-method initialization checks, which should improve both usability and readability.
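To make the difference concrete, here is a minimal sketch of the two designs. The class names, the `is_initialized` flag and the `_setup()` helper are made up for illustration and are not the actual itwinai code; the real setup (e.g. creating the process group) is replaced by a placeholder.

```python
class TwoStepStrategy:
    """Hypothetical strategy with a separate init() step (current design)."""

    def __init__(self) -> None:
        self.is_initialized = False

    def init(self) -> None:
        # Placeholder for the real distributed setup.
        self.is_initialized = True

    def global_rank(self) -> int:
        # Every method needs a guard like this one.
        if not self.is_initialized:
            raise RuntimeError("Strategy has not been initialized. Call init() first.")
        return 0


class OneStepStrategy:
    """Hypothetical strategy that does its setup in __init__() (proposed design)."""

    def __init__(self) -> None:
        self._setup()

    def _setup(self) -> None:
        # Placeholder for the real distributed setup.
        self.is_initialized = True

    def global_rank(self) -> int:
        # No guard needed: the object cannot exist without being set up.
        return 0
```

With the second design, `OneStepStrategy().global_rank()` works right away, while `TwoStepStrategy().global_rank()` raises until `init()` has been called.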
Property Methods: The TorchDistributedStrategy class has many methods such as global_rank() and local_rank() that seem like properties, but still are functions you have to call. This is fine on its own, but the rest of the itwinai library uses the convention of making these methods into properties with the @property decorator. Thus, I believe that the same should be done for the TorchDistributedStrategy class.
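As a rough sketch of what this could look like (the class below is a stand-in and the ranks are passed in by hand, which is an assumption made only to keep the example self-contained):

```python
class PropertyStrategySketch:
    """Hypothetical stand-in for a strategy exposing ranks as properties."""

    def __init__(self, global_rank: int, local_rank: int) -> None:
        # In a real strategy these values would come from the backend.
        self._global_rank = global_rank
        self._local_rank = local_rank

    @property
    def global_rank(self) -> int:
        return self._global_rank

    @property
    def local_rank(self) -> int:
        return self._local_rank


strategy = PropertyStrategySketch(global_rank=3, local_rank=1)
# Accessed without parentheses, like an attribute.
print(strategy.global_rank, strategy.local_rank)
```

Note that this is a breaking change for existing callers: `strategy.global_rank()` would then fail with a `TypeError`, since the property returns an `int` and an `int` is not callable.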
Name Field/__str__ Implementation: There is currently no way of knowing the name of a strategy automatically, meaning that you are required to do something like
```
if isinstance(strategy, TorchDDPStrategy):
    strategy_name = "ddp"
elif isinstance(strategy, ...):
    ...
```
to extract the name of a strategy. This is quite annoying and could be hard to maintain with the addition of new strategies. A suggested solution is to add a name field that you can access or simply an implementation of __str__ so that you can call str(strategy). Personally, I think I prefer name as the meaning is more explicit, but both could be nice.
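Something along these lines would cover both options. The class names and the name strings below are placeholders I made up for the sketch, not the real itwinai classes:

```python
class NamedStrategySketch:
    """Hypothetical base class: subclasses override the `name` class attribute."""

    name: str = "unnamed"

    def __str__(self) -> str:
        # str(strategy) simply falls back to the name field.
        return self.name


class DDPSketch(NamedStrategySketch):
    name = "ddp"


class HorovodSketch(NamedStrategySketch):
    name = "horovod"


for strategy in (DDPSketch(), HorovodSketch()):
    # No isinstance() chain needed to get the name.
    print(strategy.name, str(strategy))
```

A new strategy then only has to set `name` once, and code that needs the name never has to be touched again.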
Barrier Method: (Barrier? I barely know'er!) Currently there is no barrier() method in the TorchDistributedStrategy. This is a rather useful method, and it has to be implemented differently for Horovod than for the other strategies: calling dist.barrier() directly will only work if you use DDP or DeepSpeed, but not Horovod. A barrier() method on the strategy would hide that difference, making this a useful addition.
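A possible shape for this, as a sketch only: the class names are hypothetical, and using an allreduce on a dummy tensor as the Horovod synchronization point is my assumption, not necessarily how it should end up being implemented. The backend imports are done inside the methods so the sketch can be read (and imported) without torch or Horovod installed.

```python
from abc import ABC, abstractmethod


class BarrierStrategySketch(ABC):
    """Hypothetical base class that requires every strategy to provide barrier()."""

    @abstractmethod
    def barrier(self) -> None:
        """Block until all workers have reached this point."""


class TorchDistBarrierSketch(BarrierStrategySketch):
    """Covers backends that run on torch.distributed (e.g. DDP, DeepSpeed)."""

    def barrier(self) -> None:
        import torch.distributed as dist

        dist.barrier()


class HorovodBarrierSketch(BarrierStrategySketch):
    """Horovod variant. Assumption: an allreduce acts as the sync point."""

    def barrier(self) -> None:
        import horovod.torch as hvd
        import torch

        # The allreduce only returns once every worker has contributed.
        hvd.allreduce(torch.zeros(1), name="barrier")
```

User code would then call `strategy.barrier()` and never need to know which backend is underneath.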

jarlsondre added the enhancement New feature or request label Sep 19, 2024

jarlsondre assigned matbun and jarlsondre Sep 19, 2024

jarlsondre changed the title ~~Init() method of the TorchDistributedStrategy~~ Issues with the TorchDistributedStrategy Oct 22, 2024
