0-dimensional tensor not composable #770
It would be possible to change the behavior of getitem to distribute the single item to all processes. Typically, when the shape is reduced to a single value along a dimension, the split dimension is reduced by one. Perhaps in the case that it is reduced to a single value, the value could be distributed to split=None. What do you think @ClaudiaComito?
Thanks @fschlimb and @coquelin77, indeed I already implemented this in #758. @fschlimb, please do let me know if you have more examples that don't work on the indexing branch; it would be great to have this issue settled. Thanks a lot!
@ClaudiaComito Would it be possible to do the data replication for 0d tensors on-demand/lazily?
@fschlimb I've been racking my brain about this (@Markus-Goetz this is what I wanted to chat about this morning). It is very inconvenient to broadcast the data back and forth every time we slice an element. But operations on a DNDarray expect its metadata and data placement to be consistent across processes. We could add a way to postpone the broadcast, so that after slicing the data initially reside on one process only, and the communication only happens when the sliced element is actually used. So, something like:
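For instance, assuming a = ht.ones((5, 6), split=0) on 2 processes, a sketch of the local shapes right after slicing could be:

>>> a[0].lshape
[0/2] (6,)
[1/2] (0,)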
(no data on ranks > 0) or
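(again just a sketch)

>>> a[0].lshape
[0/2] (6,)
[1/2] (6,)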
after the first operation has triggered the Bcast. Thoughts? @coquelin77 @Markus-Goetz, also @ben-bou
@ClaudiaComito Yes, that's what I meant.
Not sure I follow the examples. Could you please elaborate? I think in my example above the 0d array should be ...
I am currently working on an experiment which would be based on the latter mechanism (converting to scalar). For that, however, I need to know which rank owns the value. I know this is not necessarily a common use case, but maybe it would be good to have both: the automatic broadcast or conversion, as well as the ability to retrieve the owner of the value.
I don't understand how delaying the Bcast ("replication") until the first evaluation would be advantageous. I think this would only be the case if there is no replication to all processes happening, but rather the empty processes can "request" the data whenever they need it. How would this be implemented, would the root process have to spawn a "listener" to answer the requests? How can the root process guarantee that the data has not been changed locally? Alternatively, and I guess this is what is meant here, this case could be seen as an extremely unbalanced DNDarray, and then be redistributed when needed (most notably in the ...). The encountered problems could probably be circumvented by introducing the ...
Hi @fschlimb, I meant the MPI Bcast, sorry for the confusion. I think we're mixing things up a bit, so let me take one step back. The metadata of a DNDarray have to be consistent with its global shape and the local shapes of the process-local tensors, among other things. If I slice one element off along the split axis, I also "squeeze" that axis, so I lose the split dimension:

>>> a = ht.ones((5, 6), split=0)
>>> a.split
0
>>> a.lshape
[0/2] (3, 6)
[1/2] (2, 6)
>>> a[0].lshape # this step involves MPI_Bcast from rank 0!
[0/2] (6,)
[1/2] (6,)
>>> a[0].split
None

The communication step (MPI_Bcast) is what I understood you'd like to have a lazy implementation of. Are we on the same page up to here? Your example above works under #758 because the data of the sliced element are broadcast to all processes.
@ben-bou Yes, the replication might not be needed. It's a common case when developers use explicit loops and explicit array indexing. Imagine a loop like this:
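A hypothetical sketch of such a loop (assuming a is a DNDarray with split=0):

>>> s = 0.0
>>> for i in range(a.shape[0]):
...     s = s + a[i]   # each a[i] is a 0d tensor owned by a single rank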
Any trivial split will require that over time all data points exist on all ranks. At any given point in time, though, a full replication of all concurrently used 0d arrays is not desired. Of course I do not expect HeAT to perform well on this; I am trying to experiment with an optimizer to address such issues. Conceptually, there is no need to have a "listener", since every use of a DNDarray is 'collective', so in theory the owner and the consumer both know that the data is needed by the consumer.
@ClaudiaComito Yes, I know.
Yep, sorry, I mentioned the broadcasting semantics only to clarify that, depending on how the 0d array is used in a specific situation, different implementations might be considered. One of the options would be to physically expand the 0d tensor according to numpy's broadcasting rules; MPI_Bcast is another, and send/recv a third.
Maybe in the general case. Thanks for the explanation, I now see the requirements on generality and consistency within HeAT better. Right now I am really focused on 0d tensors (scalars). Let me continue my dry-prototyping (see above) so I better understand how my stuff/ask would fit into HeAT overall. I'll get back to you when I can formulate my request more clearly. Can you suggest a good way to determine the 'home' of the value in a 0d tensor (if not replicated, e.g. on the main branch)?
Nice!
@fschlimb Thanks, I also understand your problem better now. Have you tried create_lshape_map()?
Yes, I had tried that; what I get is this (on all ranks):
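Presumably something like the following, since for a 0d DNDarray the lshape map has one row per process and zero columns (a sketch for 2 processes; the exact dtype may differ):

>>> a[0].create_lshape_map()
tensor([], size=(2, 0), dtype=torch.int64)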
How does this tell me where the value is?
Ouch. Just out of curiosity, could you try the same on this branch? Just trying to figure out if it's a problem I fixed already, or a new one. Thanks a lot for bringing this topic up!
Yep, same result.
Actually, create_lshape_map() returns the correct result, as long as the sliced array keeps at least one dimension. As an example:
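A sketch on 2 processes, slicing with a range so that the dimension is kept:

>>> a = ht.arange(5, split=0)
>>> b = a[0:1]                 # shape (1,), still split=0
>>> b.create_lshape_map()
tensor([[1],
        [0]])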
In this case you would know that the sliced element is on rank 0, and rank 1 is "empty". But indeed, if you have a scalar, you can't recover the rank via the lshape map; we would have to add this info to the metadata. For the time being you can calculate it though, check out ...
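As a rough idea of such a calculation, a hypothetical helper (not an existing Heat function) that accumulates the lshape map of the original, still-split array along its split axis:

>>> import torch
>>> def owner_rank(x, i):
...     # cumulative end index per rank along the split axis
...     ends = torch.cumsum(x.create_lshape_map()[:, x.split], dim=0)
...     return int(torch.searchsorted(ends, torch.tensor(i), right=True))
...
>>> a = ht.arange(5, split=0)
>>> owner_rank(a, 0), owner_rank(a, 4)   # on 2 processes
(0, 1)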
I think that this would be possible, but it would require a special bit of trickery. I think that the best way to do it would be to hide this from the user. The easiest would be to wait until it is called. In my head, I'm picturing this as a class which would call the Bcast only when the value is actually needed.
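A rough sketch of what such a class could look like (hypothetical, written against mpi4py's pickle-based bcast rather than Heat's internals):

from mpi4py import MPI

class LazyScalar:
    # Sketch: keep a 0d value on its owner rank and replicate it on first access.
    def __init__(self, local_value, owner, comm=MPI.COMM_WORLD):
        self._value = local_value      # only meaningful on `owner`
        self._owner = owner
        self._comm = comm
        self._replicated = False

    def item(self):
        if not self._replicated:
            # collective call: every rank participates, only `owner` provides data
            self._value = self._comm.bcast(self._value, root=self._owner)
            self._replicated = True
        return self._value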
I have addressed the ...
I modified the behavior to show a ... Also, I think that we should distribute the result if we know that the output shape is a constant, regardless of the key type for the getitem.
@ClaudiaComito I've changed a few things in the getitem function. Now, if there is only a single element as the result, it will bcast it for both ints and slices. However, I did not do this for advanced indexing. EDIT: after a small amount of testing, it looks like advanced indexing does what it's supposed to already.
A simple program like
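(for instance, a hypothetical sketch, run with 2 MPI processes)

import heat as ht

a = ht.arange(10, split=0)
b = a[0] + a[9]    # two 0d slices living on different ranks
print(b)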
gives an error message. Re-balancing the 0-dimensional tensor a[0] as suggested in a warning is neither useful nor possible (same error). I know I can cast it to a scalar, but that's not very convenient if I want to implement a generic feature.
What is needed to make the above work?