This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Hi!
I'm training a network with two separate heads (something like HydraNet).
How should I deal with non-scalar losses? With the standard PyTorch backward process I just feed `backward()` with the "vector" in the Jacobian-vector product, i.e. the gradients w.r.t. each element of the corresponding tensors:
```python
loss_seq = [loss_head_1, loss_head_2]
grad_seq = [torch.tensor(1.0).cuda(device) for _ in range(len(loss_seq))]
torch.autograd.backward(loss_seq, grad_seq)
```
Is it possible to handle this scenario with `higher`? What should I pass to `diffopt.step()`? Is it enough to invoke `diffopt.step(loss_seq)`?
Thanks in advance for your help!
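For reference, passing a unit gradient for each scalar loss via `torch.autograd.backward` is the same as backpropagating the summed loss. A minimal self-contained check (the toy trunk/head modules here are illustrative, not from the original post; the `.cuda(device)` call is dropped so it runs on CPU):

```python
import torch

# Hypothetical tiny two-head model sharing a trunk.
trunk = torch.nn.Linear(4, 4)
head_1 = torch.nn.Linear(4, 1)
head_2 = torch.nn.Linear(4, 1)

x = torch.randn(8, 4)
features = trunk(x)
loss_head_1 = head_1(features).mean()
loss_head_2 = head_2(features).mean()

# Unit "vectors" for the Jacobian-vector product, as in the post.
loss_seq = [loss_head_1, loss_head_2]
grad_seq = [torch.tensor(1.0) for _ in loss_seq]
torch.autograd.backward(loss_seq, grad_seq, retain_graph=True)
grads_separate = [p.grad.clone() for p in trunk.parameters()]

# Equivalent: backpropagate the summed scalar loss.
for p in trunk.parameters():
    p.grad = None
(loss_head_1 + loss_head_2).backward()
grads_summed = [p.grad for p in trunk.parameters()]

# Shared-trunk gradients match in both formulations.
for a, b in zip(grads_separate, grads_summed):
    assert torch.allclose(a, b)
```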
If you have multiple loss terms, you can simply add them together to obtain a single scalar and then call `diffopt.step(...)` on it. This is equivalent to backpropagating through each loss term separately. Just note that the gradients for the shared modules in the model will be aggregated, which is the default PyTorch behavior.