Is Sync BatchNorm supported? #2509
Replies: 8 comments
-
I am also curious. My guess is that you need to call `convert_sync_batchnorm` manually, because sync BN belongs to the model-building part rather than the trainer engine. Have you made any progress?
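For reference, the manual conversion in plain PyTorch is a one-liner (a minimal sketch with a toy model; the helper recursively swaps the layers):

```python
import torch.nn as nn

# Toy model containing a regular BatchNorm layer.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())

# Recursively replace every BatchNorm*D layer with nn.SyncBatchNorm.
# This must be done before wrapping the model in DistributedDataParallel.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```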
-
@Yelen719 @ruotianluo we support sync_batchnorm in Lightning now.
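For example (a minimal sketch; the flag only takes effect with a DDP backend on multiple GPUs, and the argument names match the Lightning version current at the time of this thread):

```python
from pytorch_lightning import Trainer

# sync_batchnorm=True makes Lightning convert every BatchNorm layer
# in the model to SyncBatchNorm before wrapping it in DDP.
trainer = Trainer(gpus=2, distributed_backend="ddp", sync_batchnorm=True)
```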
-
Hi, is there any tutorial on how to use SyncBatchNorm in Lightning?
-
@phongnhhn92 A quick search in the docs turns it up: https://pytorch-lightning.readthedocs.io/en/latest/trainer.html#sync-batchnorm
-
Hi @DKandrew @ananyahjha93, can you provide an example of how to use it? Of course, if it is as easy as it sounds, I can just add that option to the Trainer. My question is: will that work out of the box for a model that already uses PyTorch SyncBatchNorm, or the Apex SyncBN mentioned above?
-
Hi @phongnhhn92 Here is an example: https://github.com/PyTorchLightning/pytorch-lightning/blob/114af8ba9fc42fcf7053fa06299fbe4aecab8a06/pl_examples/basic_examples/sync_bn.py

By the way, I don't think the example given there is completely correct: it does not set the random seed properly. Based on my personal understanding, the seed should be set after all the processes have been created (or "spawned", if you prefer). Here, the mistake is that the random seed is set only in the main process. I am not 100% sure about my analysis, though; I am not sure whether the call at line 24 of the example can set the seed in all the processes (a Python question). Unfortunately, Lightning does not have good documentation for this (I raised issue #3460).

I believe it is using PyTorch SyncBatchNorm. Check out the source code here
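To illustrate the seeding point (a sketch, untested: `setup` runs in every DDP process, so seeding there would cover all workers, unlike module-level code in the launching script):

```python
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def setup(self, stage):
        # setup() is called in each spawned process, so the seed is
        # set everywhere, not only in the main process.
        pl.seed_everything(1234)
```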
-
Hi @DKandrew, after reading the example, I think we should define our model with regular BatchNorm, and then if we set the option sync_batchnorm=True in the Trainer, the framework will convert all those BatchNorm layers into SyncBatchNorm for us. I will test this in my code to see whether it works like that, roughly as sketched below.
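Something like this (a minimal sketch; `LitModel` is a placeholder and the training-loop methods are omitted):

```python
import torch.nn as nn
from pytorch_lightning import LightningModule, Trainer

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        # Plain BatchNorm here; with sync_batchnorm=True, Lightning
        # should convert it to SyncBatchNorm before DDP wrapping.
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16))

    def forward(self, x):
        return self.net(x)

trainer = Trainer(gpus=2, distributed_backend="ddp", sync_batchnorm=True)
```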
-
Hi @phongnhhn92, from my personal experience there is not much difference between Apex and PyTorch SyncBatchNorm, and I vaguely remember that the Apex developers work closely with PyTorch's, so their implementations may be fundamentally the same (don't quote me; take this with a grain of salt). I have used nn.SyncBatchNorm for a while on semantic segmentation tasks and haven't encountered any issues so far; my network output is decent, so I would say the PyTorch one is safe to use.
-
Does pytorch-lightning support synchronized batch normalization (SyncBN) when training with DDP? If so, how can it be used?
If not, Apex has implemented SyncBN, and one can use it with native PyTorch and Apex by:
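For example (a sketch of the standard conversion calls; `model` stands for any `nn.Module` containing BatchNorm layers):

```python
import torch
from apex.parallel import convert_syncbn_model

# Native PyTorch conversion to synchronized BatchNorm:
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

# or Apex's equivalent conversion helper:
model = convert_syncbn_model(model)
```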
How can they be used under the pytorch-lightning scheme?
SyncBN makes a big difference when training a model with DDP, and it would be great to know how to use it in pytorch-lightning.
Thanks!