Forward XLATensorImpl::is_contiguous_custom to TensorImpl #8032
base: master
Conversation
test failure seems real?

yea
@@ -1068,7 +1068,7 @@ def test_backward_optimization_barrier(self):
     hlo = torch_xla._XLAC._get_xla_tensors_hlo([model.fc2.weight.grad])
     self.assertIn(
-        '%opt-barrier.37 = (f32[1,64]{0,1}, f32[1]{0}, f32[2,64]{1,0}) opt-barrier((f32[1,64]{0,1}, f32[1]{0}, f32[2,64]{1,0}) %tuple.36)',
+        '%opt-barrier.38 = (f32[1,64]{1,0}, f32[1]{0}, f32[2,64]{1,0}) opt-barrier((f32[1,64]{1,0}, f32[1]{0}, f32[2,64]{1,0}) %tuple.37)',
This change was needed to fix a CI failure in pytorch/pytorch#135237. @JackCaoG let me know if you think this should not be happening.
The changes in this file are needed because PyTorch seems to run these tests with Python 3.8, where the subscript operator is not allowed on built-in types in annotations (subscripting built-in generics only became available in Python 3.9).
Hopefully pytorch/pytorch#135278 can be merged so we don't have this issue anymore in the future...
@JackCaoG do we want to add this PR to the 2.5 release? (Knowing it has some upstream dependencies to consider - @ysiraichi please reference the dependencies for clarity.)

No, I don't want to add features to the 2.5 release at this point.
// If functionalization is disabled, the tensors' metadata isn't updated
// w.r.t. the output of meta functions. Therefore, we fall back to the old
// behavior of always returning true.
if (runtime::sys_util::GetEnvBool("XLA_DISABLE_FUNCTIONALIZATION", false)) {
Maybe just add an IsFunctionalizationDisabled() helper (using a static bool to store the value of this env var) in https://github.com/pytorch/xla/blob/ea8c47a345a29c6a1b1bdf4ee38a9159b07a980f/torch_xla/csrc/xla_graph_executor.h or in tensor.h, instead of reading it from the env var every time. Something like the sketch below.
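A minimal sketch of what that helper could look like, reusing the runtime::sys_util::GetEnvBool call from the snippet above; the name and placement follow the reviewer's suggestion and are not from the actual patch:

```cpp
// Hypothetical helper (e.g. in xla_graph_executor.h or tensor.h).
// Caches XLA_DISABLE_FUNCTIONALIZATION in a function-local static so the
// environment variable is read only once per process.
inline bool IsFunctionalizationDisabled() {
  static const bool disabled =
      runtime::sys_util::GetEnvBool("XLA_DISABLE_FUNCTIONALIZATION", false);
  return disabled;
}
```

Call sites would then check IsFunctionalizationDisabled() instead of querying the environment on every contiguity check.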
I don't think the approach in this PR will be enough for propagating the metadata. I think I will have to modify the
It would be good to measure those overheads. My experience is that most of the C++ ops are pretty fast, except for creating large objects and calculating hashes. It might not be too bad in this case either.
The reason I'm asking this is #7923. Basically, what I'm saying is that we will have to call the meta functions of every operation that goes through the functional fallback function (e.g.

@JackCaoG what do you think?
I am already concerned about the Python meta function overhead we have today, so ideally I don't want to introduce more. Do you mind giving it a try and seeing how much tracing time overhead it introduces?
@ysiraichi do we know what else remains for this PR to become ready for merging?
Before merging this branch, we first need to merge its PyTorch counterpart pytorch/pytorch#135498. In order to merge that one, we need to propagate the Tensor metadata for all operations: manually written kernels and codegen'd ones. Note that this might break previous assumptions that XLA tensors
This PR fixes #7998. Instead of always returning true, we forward this call to the base class TensorImpl::is_contiguous_custom(). The reason is that after pytorch/pytorch#135498 is merged, XLA tensors' metadata might stop reflecting the actual XLA storage, which means that the tensors' strides might not always be contiguous. Whenever that happens, the tensor.is_contiguous() call should be consistent with the tensors' strides.

cc @miladm @JackCaoG @alanwaketan
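A rough sketch of what the forwarding could look like, combined with the XLA_DISABLE_FUNCTIONALIZATION fallback shown earlier; the signature is assumed to match c10::TensorImpl::is_contiguous_custom and may differ from the actual patch:

```cpp
// Sketch only: forward the contiguity query to the base class instead of
// unconditionally returning true, except when functionalization is disabled.
bool XLATensorImpl::is_contiguous_custom(at::MemoryFormat memory_format) const {
  // With functionalization disabled, the tensor metadata isn't updated from
  // the meta functions, so keep the old behavior of always reporting
  // contiguous.
  if (runtime::sys_util::GetEnvBool("XLA_DISABLE_FUNCTIONALIZATION", false)) {
    return true;
  }
  // Otherwise defer to TensorImpl, which answers based on the recorded
  // sizes and strides.
  return c10::TensorImpl::is_contiguous_custom(memory_format);
}
```

This way, tensor.is_contiguous() stays in sync with whatever strides the metadata propagation records, rather than claiming contiguity unconditionally.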