DiskArrays.jl vs BlockArrays.jl #23

Open
Luapulu opened this issue Nov 6, 2020 · 1 comment


Luapulu commented Nov 6, 2020

Hey,

this is probably old news for you, but I just discovered BlockArrays.jl. If I understand correctly, BlockArrays.jl implements many, if not all, of the features we need for DiskArrays.jl.

Does this make DiskArrays.jl obsolete?

Sorry if I'm going through the same thought process you already covered long ago, but maybe this can help.

meggart (Owner) commented Nov 9, 2020

I think DiskArrays and BlockArrays started with very different goals in mind, although in the end they arrived at partly similar designs.

Originally, DiskArrays' mission was only to get indexing right for a variety of packages that implement AbstractArray behavior for arrays that are mapped to disk (see this thread: https://discourse.julialang.org/t/taking-the-array-indexing-interface-seriously/32035). So we started out by unifying the handling of trailing/missing indices so that not every package had to implement its own complicated getindex/setindex. Actually, this should not be a problem, because the AbstractArray interface should take care of it, but unfortunately the interface assumes low-latency random access to the array, which is not true for these arrays.
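To illustrate what "trailing/missing indices" means here, a minimal sketch in plain Julia follows. It is not DiskArrays' actual implementation; `normalize_indices` is a hypothetical name, it only handles integer indices, and it ignores single-index linear indexing. It just brings an index tuple to exactly one entry per array dimension, following the Base rule that extra trailing indices must be 1 and omitted trailing dimensions must have length 1.

```julia
# Hypothetical helper (not part of DiskArrays): normalize an index tuple so it
# has exactly N entries for an array of size `sz`.
function normalize_indices(sz::NTuple{N,Int}, idx::Tuple) where {N}
    if length(idx) > N
        # Extra trailing indices are only valid if they are all 1.
        all(==(1), idx[N+1:end]) ||
            throw(ArgumentError("extra trailing indices must be 1"))
        return idx[1:N]
    elseif length(idx) < N
        # Omitted trailing dimensions are only valid if they have length 1.
        all(==(1), sz[length(idx)+1:end]) ||
            throw(ArgumentError("omitted trailing dimensions must have length 1"))
        return (idx..., ntuple(_ -> 1, N - length(idx))...)
    end
    return idx
end
```

With something like this in one place, a disk-backed array type would only need to implement a read for exactly N indices, e.g. `normalize_indices((4, 6), (2, 3, 1))` gives `(2, 3)`.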

Note that up to this point there was no notion of chunks in the package. Only when we wanted to go a bit further and implement some mapreduce/broadcast behavior did it become necessary to think about chunks. Here I would have loved to use some kind of interface that an array could implement (e.g. what I proposed in ChunkedArrayBase), but I did not find anything that was general enough and had out-of-core data in mind. I experimented with BlockArrays, but found that its scope was quite different from what we do in DiskArrays.
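As a rough sketch of why chunks matter for mapreduce, the following plain-Julia example reduces an array one chunk at a time, so that each chunk is read with a single bulk access instead of element by element. The names `chunk_ranges` and `chunked_mapreduce` are hypothetical and this is not the DiskArrays implementation; it also assumes `init` is a neutral element for `op`.

```julia
# Hypothetical helper: for every chunk, the index range it covers in each dimension.
function chunk_ranges(sz::NTuple{N,Int}, chunksize::NTuple{N,Int}) where {N}
    per_dim = map(sz, chunksize) do n, c
        [i:min(i + c - 1, n) for i in 1:c:n]
    end
    return Iterators.product(per_dim...)
end

# Reduce chunk by chunk; `init` is assumed to be a neutral element for `op`.
function chunked_mapreduce(f, op, A::AbstractArray, chunksize; init)
    acc = init
    for r in chunk_ranges(size(A), chunksize)
        block = A[r...]  # one bulk read per chunk
        acc = op(acc, mapreduce(f, op, block; init=init))
    end
    return acc
end

A = reshape(1:24, 4, 6)
total = chunked_mapreduce(abs2, +, A, (2, 4); init=0)
```

For an in-memory array this gives the same result as `sum(abs2, A)`; the point is only that the access pattern follows the chunk layout, which is what a disk-backed array needs.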

Note that the last time I looked at BlockArrays, it could not really deal with out-of-core data and was more or less still assuming fast random access, either through caching or through holding arrays in memory. I understood its main purpose to be supporting very sparse arrays by defining only the blocks that actually hold some data (banded matrices). It would be good to know if it has since been extended to support our use cases, but I would still be sceptical.

So I would be very surprised if we could simply make an HDF5Dataset an AbstractBlockArray and get fast indexing, broadcasting, etc., but maybe it would work. However, if you want to try to reuse some of the types, like BlockIndex, and some of the iteration over Blocks, I think that would be a good idea.
