-
ArcFace contains parameters that need to be optimized. The parameters are used to convert embeddings into class predictions. You can use the same optimizer for your model and for the loss function if you want:

```python
optimizer = torch.optim.SGD(list(model.parameters()) + list(loss_fn.parameters()), lr=0.01)

# then during training:
loss = loss_fn(embeddings, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

ArcFace is a classification loss. A typical classification loss is applied to class logits, so the part of ArcFace that creates those logits from the embeddings (its internal weight matrix) also has to be updated by an optimizer.
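For context, a minimal sketch (not from this thread) of why those parameters exist: in a plain classification setup the layer that maps embeddings to class logits is a separate `nn.Linear` module, so it is naturally covered by the model optimizer, whereas with `pytorch_metric_learning.losses.ArcFaceLoss` the equivalent class-weight matrix lives inside the loss object. The embedding size, class count, and toy embedder below are illustrative assumptions.

```python
# Sketch only: contrast a plain classification head with an ArcFace-style loss
# whose class-weight matrix is a parameter of the loss object itself.
import torch
from torch import nn
from pytorch_metric_learning import losses

embedding_size, num_classes = 128, 10           # illustrative sizes, not from the thread
model = nn.Sequential(nn.Flatten(), nn.Linear(784, embedding_size))  # toy embedder

# Typical classification: the logit-producing weights are a separate module,
# so they are easy to include in the model's optimizer.
classifier = nn.Linear(embedding_size, num_classes)
ce_optimizer = torch.optim.SGD(
    list(model.parameters()) + list(classifier.parameters()), lr=0.01
)

# ArcFace: the embedding-to-logits weights live inside the loss function,
# so loss_fn.parameters() must be handed to an optimizer explicitly.
loss_fn = losses.ArcFaceLoss(num_classes=num_classes, embedding_size=embedding_size)
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(loss_fn.parameters()), lr=0.01
)

# Alternatively, keep a dedicated optimizer just for the loss parameters:
loss_optimizer = torch.optim.SGD(loss_fn.parameters(), lr=0.01)
```

If you do use a dedicated optimizer for the loss, remember to call `zero_grad()` and `step()` on both optimizers inside the training loop.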
-
Hi, I wanted to understand why we require a separate optimizer for ArcFace specifically (and maybe other losses, if the reason is generic), and whether there is code in the repo that shows the use of a separate optimizer with these losses.