This repository has been archived by the owner on Oct 31, 2023. It is now read-only.
Hi!
I'm training a network with two separate heads (something like HydraNet).
How should I deal with non-scalar losses? With the standard PyTorch backward process I just feed `backward()` with the "vector" in the Jacobian-vector product, i.e. the gradients w.r.t. each element of the corresponding tensors:
```python
loss_seq = [loss_head_1, loss_head_2]
grad_seq = [torch.tensor(1.0).cuda(device) for _ in range(len(loss_seq))]
torch.autograd.backward(loss_seq, grad_seq)
```
Is it possible to handle this scenario with `higher`? What should I pass to `diffopt.step()`? Is it enough to invoke `diffopt.step(loss_seq)`?
Thanks in advance for your help!
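For reference, passing a unit gradient for each scalar loss via `torch.autograd.backward` is the same as backpropagating the summed loss. A minimal self-contained check (the toy trunk/head modules here are illustrative, not from the original post; the `.cuda(device)` call is dropped so it runs on CPU):

```python
import torch

# Hypothetical tiny two-head model sharing a trunk.
trunk = torch.nn.Linear(4, 4)
head_1 = torch.nn.Linear(4, 1)
head_2 = torch.nn.Linear(4, 1)

x = torch.randn(8, 4)
features = trunk(x)
loss_head_1 = head_1(features).mean()
loss_head_2 = head_2(features).mean()

# Unit "vectors" for the Jacobian-vector product, as in the post.
loss_seq = [loss_head_1, loss_head_2]
grad_seq = [torch.tensor(1.0) for _ in loss_seq]
torch.autograd.backward(loss_seq, grad_seq, retain_graph=True)
grads_separate = [p.grad.clone() for p in trunk.parameters()]

# Equivalent: backpropagate the summed scalar loss.
for p in trunk.parameters():
    p.grad = None
(loss_head_1 + loss_head_2).backward()
grads_summed = [p.grad for p in trunk.parameters()]

# Shared-trunk gradients match in both formulations.
for a, b in zip(grads_separate, grads_summed):
    assert torch.allclose(a, b)
```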
If you have multiple loss terms, you can simply add them together to obtain a single scalar and then call `diffopt.step(...)` on it. This is equivalent to backpropagating through each loss term separately. Just note that the gradients for the shared modules in the model will be aggregated, which is the default PyTorch behavior.