Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array tags are immutable views, not editable ndarrays #152

Open
Tracked by #156
MestreLion opened this issue Oct 29, 2021 · 9 comments
Open
Tracked by #156

Array tags are immutable views, not editable ndarrays #152

MestreLion opened this issue Oct 29, 2021 · 9 comments

Comments

@MestreLion
Copy link
Contributor

Any particular reason for creating them as read-only views? This forces clients to write the whole array to change a single value.

The conversion from ndarray t its view is done in this line and the following one.

@MestreLion
Copy link
Contributor Author

Speaking of arrays, any particular reason for return int.__new__(self.wrapper, super().__getitem__(index)) in this line instead of simply return self.wrapper(super().__getitem__(index)) ?

@MestreLion
Copy link
Contributor Author

MestreLion commented Oct 29, 2021

After a couple hours reading the official docs on sub-classing ndarray, I guess there are 2 different approaches to allow changing contents:

  • In Array.__new__, call super().__new__(cls, ...) instead of np.asarray(...).view(cls), as show in this example, then populate the values of the instance before returning it. I.e.: instead of a view, return a "normal" ndarray
  • Keep it as a view, and add a __setitem__ that invokes self.base.__setitem__, so it operates on the view's data source rather then on the (immutable) view, as (unintentionally) show in this other example

Both approaches would have to be carefully studied for eventual gotchas and corner cases, and there might have other (or better) approaches

@vberlier
Copy link
Owner

Hey! Thanks a lot for being so active lately! I'm busy with school stuff atm but I'm looking forward to dive into all this, hopefully next week or the week after. In the meantime keep'em coming :D

@MestreLion
Copy link
Contributor Author

No problem, take your time... I'll keep sending PRs and issues

As for this issue, take a look at the workaround currently needed to set items in an Array:

def apply_pixels(target: Map, pixels: t.Iterable[MapPixel]) -> None:
    arr = target.data['colors']
    # NBT Arrays might be implemented by the NBT backend as read-only ndarray views
    # If so, set the pixels in a (writeable) copy, then write back preserving original type
    cls = None
    if isinstance(arr, np.ndarray) and not arr.flags.writeable:
        cls = arr.__class__
        arr = arr.copy()
    for pixel in pixels:
        arr[pixel[0]] = pixel[1]
    if cls:
        target.data['colors'] = cls(arr)

(actual code from my map-deduper project)

@vberlier vberlier mentioned this issue Nov 2, 2021
9 tasks
@vberlier
Copy link
Owner

vberlier commented Nov 3, 2021

Yeah I remember looking into how to subclass ndarray and I think that's why I ended up with the view approach. But I agree that it's a problem. Before diving deeper into numpy stuff though, I kind of want to see what's possible with pure python lists in combination with mypyc (as mentioned #156). The array tags pretty much only use numpy for packing/unpacking and handling endianness efficiently, so there's a chance that mypyc can achieve comparable performance for this specific use-case. This would also mean that nbtlib could let go of numpy and any C extension for people using it in constrained environments by gracefully falling back to the pure python source used by mypyc.

@MestreLion
Copy link
Contributor Author

Good idea. If you go this route, use the Python's native bytearray for.. well, ByteArray ;-)

And maybe a simple Struct on bytearrays could also efficiently cover the case for Int and Long Arrays

@vberlier
Copy link
Owner

vberlier commented Nov 3, 2021

Originally byte array tags used the builtin bytearray but it was a bit slower than numpy due to the separate byteswap required when dealing with big-endian nbt. Also it was a bit more confusing to maintain because the int array tags needed to be implemented differently. But yeah I'm gonna revisit all that and benchmark.

@MestreLion
Copy link
Contributor Author

MestreLion commented Nov 3, 2021

Btw, I'm perfectly fine with ndarray for all Arrays, as long as:

  • It's mutable like any list
  • It inherits from MutableSequence, like list. I think it currently doesn't. Which is convenient for me, as currently I always want to handle them separately anyway (in pretty prints and NBT Explorer-like trees), but it's also wrong. Hey, those poor arrays are mutable sequences after all. They're just fixed length. If I want to segregate them, It's my prejudice on them for being so damn long when printing :-)

Wait... no. Being fixed length might change everything. They lack .append() and .extend(). Are those in MutableSequence's API? Anyway, let the ABC be dictated by whatever the raw NBT data qualifies for. And the underlying structure should reflect that ABC, be it a wrapper on ndarray or something else.

@ch-yx
Copy link

ch-yx commented Aug 21, 2023

Speaking of arrays, any particular reason for return int.__new__(self.wrapper, super().__getitem__(index)) in this line instead of simply return self.wrapper(super().__getitem__(index)) ?

i did some searching. and i found this.
#38 (comment)

it use int.__new__ because wrapper(x) would check if x OutOfRange or not.
(we dont need to test if the number get from the Array is in the range)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants