Fix mean reduction in cross entropy loss
The mean reduction should reduce the s_loss to a scalar. Also, I'm not sure why division was being used here instead of multiplication by the mask, but I changed it to multiplication.
davidbrandfonbrener authored Mar 24, 2024 · 1 parent 8949bd8 · commit a57f380
Showing 1 changed file with 1 addition and 1 deletion: olmo/train.py
```diff
@@ -105,7 +105,7 @@ def cross_entropy_loss(

     z_squared = logits.logsumexp(-1).pow(2)
     if reduction == "mean":
-        z_squared = z_squared / (labels != ignore_index).mean()
+        z_squared = (z_squared * (labels != ignore_index)).mean()
     elif reduction == "sum":
         z_squared = (z_squared * (labels != ignore_index)).sum()
```
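To see why the change matters, here is a minimal NumPy stand-in for the tensor ops above (toy shapes and values, not from OLMo; `-100` is assumed as the conventional `ignore_index`). Dividing by the mask's mean only rescales each element, so the "mean" branch never reduced `z_squared` to a scalar; multiplying by the mask and then taking `.mean()` both zeroes the ignored positions and collapses to one number.

```python
import numpy as np

# Toy batch of per-token logits and labels; -100 marks padding (assumed
# ignore_index, matching the common PyTorch convention).
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.1, 0.1],
                   [3.0, 3.0, 3.0]])
labels = np.array([0, -100, 2])
ignore_index = -100

# Per-token z^2 = logsumexp(logits)^2, mirroring
# logits.logsumexp(-1).pow(2) in the diff.
z = np.log(np.exp(logits).sum(axis=-1))
z_squared = z ** 2

mask = labels != ignore_index

# Old (buggy) "mean": dividing by the mask's mean rescales every element
# but leaves z_squared a per-token vector, not a scalar.
buggy = z_squared / mask.mean()
assert buggy.shape == z_squared.shape  # not reduced

# Fixed "mean": zero out ignored positions, then average to a scalar
# (mean over all positions, including the masked ones).
fixed = (z_squared * mask).mean()
assert np.ndim(fixed) == 0
```

Note the fixed version averages over all positions, masked ones included, which matches the existing `"sum"` branch (`(z_squared * mask).sum()`) up to the `1/N` factor.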
