Commit 66f8859

Revert "mul: remove opmath cast sequence (#9663)" (#9701)
Commit 2a9138a removed `.use_opmathtype_for_compute()` from the element-wise `mul` operation. This breaks the mixed-precision accumulation behavior expected by the Neuron compiler, which traces/compiles on CPU and later executes the binary on Neuron hardware, causing accuracy degradation in transformer models that use mixed-precision compilation.

Reverts: commit 2a9138a; the other changes are the result of a rebase from r2.9
Fixes: Model accuracy failures with mixed-precision accumulation (#9699)
1 parent be33668 commit 66f8859
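
For context, a minimal sketch of how the restored upcast shows up in the traced HLO. This is illustrative only, assuming a torch_xla build that includes this revert; the HLO inspection call is the same one used by the tests removed below.

import torch
import torch_xla

# Two bfloat16 tensors on the XLA device.
a = torch.rand(5, 5, dtype=torch.bfloat16).to('xla')
b = torch.rand(5, 5, dtype=torch.bfloat16).to('xla')
c = a * b

# With .use_opmathtype_for_compute() restored, the bf16 inputs are
# expected to be upcast to the opmath type (f32) for the multiply,
# so 'f32' should appear in the HLO. This is the inverse of the
# removed test_bfloat16_mul_not_upcast assertion.
hlo_text = torch_xla._XLAC._get_xla_tensors_hlo([c])
assert 'f32' in hlo_text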

File tree

2 files changed (+1 −16 lines)


test/test_operations_hlo.py

Lines changed: 0 additions & 16 deletions
@@ -67,22 +67,6 @@ def test_dropout_by_u8_mask(self):
     hlo_text = torch_xla._XLAC._get_xla_tensors_hlo([b])
     assert 'u8' in hlo_text
 
-  def test_bfloat16_mul_not_upcast(self):
-    a = torch.rand(5, 5, dtype=torch.bfloat16).to('xla')
-    b = torch.rand(5, 5, dtype=torch.bfloat16).to('xla')
-    c = a * b
-    hlo_text = torch_xla._XLAC._get_xla_tensors_hlo([c])
-    # Check that the output is not upcasted to float32
-    assert 'f32' not in hlo_text
-
-  def test_bfloat16_float32_mul_upcast(self):
-    a = torch.rand(5, 5, dtype=torch.bfloat16).to('xla')
-    b = torch.rand(5, 5, dtype=torch.float32).to('xla')
-    c = a * b
-    hlo_text = torch_xla._XLAC._get_xla_tensors_hlo([c])
-    # Check that the output is upcasted to float32
-    assert 'f32' in hlo_text
-
 
 if __name__ == '__main__':
   torch.set_default_dtype(torch.float32)

torch_xla/csrc/aten_xla_type.cpp

Lines changed: 1 addition & 0 deletions
@@ -2535,6 +2535,7 @@ at::Tensor XLANativeFunctions::mul(const at::Tensor& self,
       .add_input(self)
      .add_input(other)
       .cast_inputs_to_common_dtype()
+      .use_opmathtype_for_compute()
       .run();
 }

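For reference, the numerical semantics of the opmath compute path for bfloat16 can be approximated in eager PyTorch. A minimal sketch, assuming only that the opmath type for bfloat16 is float32 (as in ATen's opmath_type); this is not code from the commit.

import torch

a = torch.rand(5, 5, dtype=torch.bfloat16)
b = torch.rand(5, 5, dtype=torch.bfloat16)

# Opmath-style compute: upcast both operands to float32, multiply at
# full precision, then cast the result back to bfloat16.
opmath_result = (a.float() * b.float()).to(torch.bfloat16)

# Direct bf16 multiply for comparison: rounding happens at bf16 precision.
direct_result = a * b

For a single multiply the two results typically match; the distinction matters when the compiler relies on the visible f32 compute sequence in the traced HLO to drive mixed-precision accumulation across many operations, which is the failure mode the commit message describes.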