Open
Description
Module has an accGradInput method, which accumulates the gradient into a tensor, so we need to zero that tensor at the beginning of each iteration.
Zeroing is not necessary if we only accumulate once in each iteration. We can provide a mode flag: if the user sets that flag, the gradient overwrites the tensor instead of being accumulated into it, which saves the zeroGrad time during training.
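
To make the proposal concrete, here is a minimal sketch in plain Python, not the library's actual code. The class `ToyModule`, the `overwrite` flag, and the method names `acc_grad` and `zero_grad` are hypothetical stand-ins for the module, the proposed mode flag, the accumulation method, and zeroGrad. It only illustrates that the two modes should give the same gradient when there is a single accumulation per iteration.

```python
class ToyModule:
    """Hypothetical stand-in for a module that owns a gradient buffer."""

    def __init__(self, size, overwrite=False):
        self.grad = [0.0] * size
        self.overwrite = overwrite  # hypothetical mode flag from the proposal

    def zero_grad(self):
        # Clears the buffer; this is the per-iteration cost the flag avoids.
        for i in range(len(self.grad)):
            self.grad[i] = 0.0

    def acc_grad(self, new_grad):
        if self.overwrite:
            # Overwrite mode: replace the buffer contents, no zeroing needed.
            self.grad[:] = new_grad
        else:
            # Default mode: add into the buffer, which must be zeroed first.
            for i, g in enumerate(new_grad):
                self.grad[i] += g


def train(module, grads_per_iter):
    """Run one accumulation per iteration and return the final gradient."""
    for g in grads_per_iter:
        if not module.overwrite:
            module.zero_grad()
        module.acc_grad(g)
    return list(module.grad)
```

With one accumulation per iteration, `train(ToyModule(2), grads)` and `train(ToyModule(2, overwrite=True), grads)` return the same result. The flag is only safe under that condition: if a module accumulates more than once per iteration (for example, when its parameters are shared), overwrite mode would keep only the last contribution.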