Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: patch NDArrayOperatorsMixin #2534

Merged
merged 8 commits into from
Jun 21, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Jun 20, 2023

Fixes #2533 by vendoring our own version of NDArrayOperatorsMixin

@codecov
Copy link

codecov bot commented Jun 20, 2023

Codecov Report

Merging #2534 (fe81bf2) into main (f818083) will increase coverage by 0.04%.
The diff coverage is 95.83%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/_util.py 90.00% <ø> (-1.08%) ⬇️
src/awkward/_operators.py 94.82% <94.82%> (ø)
src/awkward/_connect/numexpr.py 90.41% <100.00%> (+0.13%) ⬆️
src/awkward/_connect/numpy.py 94.53% <100.00%> (+0.03%) ⬆️
src/awkward/_connect/pyarrow.py 91.22% <100.00%> (+0.01%) ⬆️
src/awkward/_nplikes/numpy.py 65.45% <100.00%> (ø)
src/awkward/_nplikes/typetracer.py 76.65% <100.00%> (+0.37%) ⬆️
src/awkward/highlevel.py 76.82% <100.00%> (ø)
src/awkward/numba.py 97.01% <100.00%> (+0.02%) ⬆️

@agoose77 agoose77 temporarily deployed to docs-preview June 20, 2023 15:07 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview June 20, 2023 15:42 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would NumPy introduce __slots__ on a mixin? (Mixins are supposed to not be in charge of what data a class has, and therefore not define the __init__ and attributes. Defining __slots__ forces an exact set of data attributes, and therefore make the class not a mixin. #2533 is exactly the reason why mixins are okay for multiple inheritance and classes that define data are not.)

Since we have our own definition of NDArrayOperatorsMixin,

try:
NDArrayOperatorsMixin = numpy.lib.mixins.NDArrayOperatorsMixin
except AttributeError:
from numpy.core import umath as um
def _disables_array_ufunc(obj):
try:
return obj.__array_ufunc__ is None
except AttributeError:
return False
def _binary_method(ufunc, name):
def func(self, other):
if _disables_array_ufunc(other):
return NotImplemented
return ufunc(self, other)
func.__name__ = f"__{name}__"
return func
def _reflected_binary_method(ufunc, name):
def func(self, other):
if _disables_array_ufunc(other):
return NotImplemented
return ufunc(other, self)
func.__name__ = f"__r{name}__"
return func
def _inplace_binary_method(ufunc, name):
def func(self, other):
return ufunc(self, other, out=(self,))
func.__name__ = f"__i{name}__"
return func
def _numeric_methods(ufunc, name):
return (
_binary_method(ufunc, name),
_reflected_binary_method(ufunc, name),
_inplace_binary_method(ufunc, name),
)
def _unary_method(ufunc, name):
def func(self):
return ufunc(self)
func.__name__ = f"__{name}__"
return func
class NDArrayOperatorsMixin:
__lt__ = _binary_method(um.less, "lt")
__le__ = _binary_method(um.less_equal, "le")
__eq__ = _binary_method(um.equal, "eq")
__ne__ = _binary_method(um.not_equal, "ne")
__gt__ = _binary_method(um.greater, "gt")
__ge__ = _binary_method(um.greater_equal, "ge")
__add__, __radd__, __iadd__ = _numeric_methods(um.add, "add")
__sub__, __rsub__, __isub__ = _numeric_methods(um.subtract, "sub")
__mul__, __rmul__, __imul__ = _numeric_methods(um.multiply, "mul")
__matmul__, __rmatmul__, __imatmul__ = _numeric_methods(um.matmul, "matmul")
__truediv__, __rtruediv__, __itruediv__ = _numeric_methods(
um.true_divide, "truediv"
)
__floordiv__, __rfloordiv__, __ifloordiv__ = _numeric_methods(
um.floor_divide, "floordiv"
)
__mod__, __rmod__, __imod__ = _numeric_methods(um.remainder, "mod")
if hasattr(um, "divmod"):
__divmod__ = _binary_method(um.divmod, "divmod")
__rdivmod__ = _reflected_binary_method(um.divmod, "divmod")
__pow__, __rpow__, __ipow__ = _numeric_methods(um.power, "pow")
__lshift__, __rlshift__, __ilshift__ = _numeric_methods(um.left_shift, "lshift")
__rshift__, __rrshift__, __irshift__ = _numeric_methods(
um.right_shift, "rshift"
)
__and__, __rand__, __iand__ = _numeric_methods(um.bitwise_and, "and")
__xor__, __rxor__, __ixor__ = _numeric_methods(um.bitwise_xor, "xor")
__or__, __ror__, __ior__ = _numeric_methods(um.bitwise_or, "or")
__neg__ = _unary_method(um.negative, "neg")
if hasattr(um, "positive"):
__pos__ = _unary_method(um.positive, "pos")
__abs__ = _unary_method(um.absolute, "abs")
__invert__ = _unary_method(um.invert, "invert")

(though it was removed in #2115), I'd prefer to just use our definition and therefore control it. It means that we'll need to be on top of any new additional unary/binary operators (like @), but we don't have to be on the defensive about upstream technicalities like this.

@douglasdavis
Copy link
Contributor

numpy issue numpy/numpy#22876 and PR numpy/numpy#23113

@jpivarski
Copy link
Member

Oh, so my reading of this is that

  • NumPy only introduced __slots__ = ()
  • mixins without __slots__ are not mixable into classes with __slots__ (worse: Python just ignores the subclassed __slots__)
  • a superclass/mixin with __slots__ = () is unrestrictive about what slots downstream classes define.

By that logic, the NumPy developers concluded that there could be no downside to it. In #2533, we seem to have discovered a downside: __class__ can't be assigned to by other classes that don't define __slots__ because they have a different memory model.

Maybe a different solution to this is to define __slots__ for ak.Array and ak.Record? That would put them in the same memory model as NDArrayOperatorsMixin-with-__slots__. If we do that, would that require downstream libraries from Awkward (i.e. Coffea and Vector) to also define __slots__ in their behavior classes?

We don't want an infectious implementation requirement to spread to all downstream libraries that use Awkward, especially if the error message is cryptic, like

TypeError: __class__ assignment: 'MyAwesomeBehavior' object layout differs from 'NDArrayOperatorsMixin'

If defining __slots__ for ak.Array and ak.Record infects downstream libraries with the need to define __slots__ on their behaviors, then let's define our own NDArrayOperatorsMixin without __slots__. (At worst, downstream libraries will try to use __slots__ and Python will ignore their performance hints.)

If defining __slots__ for ak.Array and ak.Record has no implications for downstream libraries, let's define __slots__ for ak.Array and ak.Record.

@agoose77
Copy link
Collaborator Author

The way that the layout checker works is that it has a notion of type-supertype "compatibility", and it attempts to find a compatible common base.

To generate our mixin classes for both records and arrays, we do something like

class ArrayImpl(MixinType, ak.Array):
    ...

The logic checking type-supertype compatibility for multiple inheritance walks down the tp_base (or __base__ in Python) slot, and checks whether it is "compatible" with the type. If the child has a __dict__ and the parent does not, this is False.

Let's imagine that we add slots to ak.Array and the generated classes for records and arrays. If the user doesn't define __slots__, we'll end up with a dict, and the compatibility check fails. Conversely, without __slots__, the base for the mixin (ArrayImpl) is object, whilst for ak.Array the base is NDArrayOperatorsMixin, which also fails.

Of the two approaches, it's probably safer to assume that the user doesn't define __slots__ for these mixin classes, because doing so would currently fail:

import awkward as ak


behavior = {}


class Point:
    __slots__ = ()


class MyPoint(Point, ak.Array):
    ...


behavior["*", "Point"] = MyPoint

ak.operations.zip(
    {
        "x": ak.Array([1, 1]),
        "y": ak.Array([1, 1]),
    },
    with_name="Point",
    behavior=behavior,
)

For the same reasons outlined above: MyPoint.__base__ is Point, which is the final "compatible base" for MyPoint, and is not the same base as that of ak.Array.

I think, therefore, the safest immediate step is to drop the slots, by using our own mixin. For our ecosystem, I think there's lesser benefit to __slots__ usage; we don't expect high-frequency usage of array attributes to dominate over the array kernels. I might be wrong, though.

@agoose77 agoose77 requested a review from jpivarski June 20, 2023 21:32
@agoose77 agoose77 temporarily deployed to docs-preview June 20, 2023 21:43 — with GitHub Actions Inactive
@agoose77 agoose77 temporarily deployed to docs-preview June 20, 2023 23:13 — with GitHub Actions Inactive
@jpivarski
Copy link
Member

I think, therefore, the safest immediate step is to drop the slots, by using our own mixin. For our ecosystem, I think there's lesser benefit to __slots__ usage; we don't expect high-frequency usage of array attributes to dominate over the array kernels. I might be wrong, though.

Let's go with that. I was also thinking that __slots__ aren't going to help much for ak.Array, considering that it contains so many large, nested, tree-like objects.

@agoose77
Copy link
Collaborator Author

I've taken your above comment as approval for merging!

@agoose77 agoose77 merged commit 11e61b6 into main Jun 21, 2023
@agoose77 agoose77 deleted the agoose77/fix-patch-nd-operators-mixin branch June 21, 2023 09:03
agoose77 added a commit that referenced this pull request Jul 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__class__ assignment fails with mixin_class decorator
3 participants