DiskArrays.jl vs BlockArrays.jl #23

Luapulu · 2020-11-06T20:00:35Z

Hey,

this is probably old news for you, but I just discovered BlockArrays.jl. If I understand correctly, BlockArrays.jl implements a lot of, if not all, the features we need for DiskArrays.jl.

Does this make DiskArrays.jl obsolete?

Sorry if I'm going through the same thought process you already covered long ago, but maybe this can help.

meggart · 2020-11-09T20:32:35Z

I think DiskArrays and BlockArrays started with very different goals in mind, although in the end they arrived at partly similar designs.

Originally, the only mission of DiskArrays was to get indexing right for a variety of packages that implement AbstractArray behavior for arrays that are mapped to disk (see this thread https://discourse.julialang.org/t/taking-the-array-indexing-interface-seriously/32035). So we set out to unify the handling of trailing/missing indices, so that not every package had to implement its own complicated getindex/setindex. In principle this should not be a problem, because the AbstractArray interface should take care of it, but unfortunately the interface assumes low-latency random access to the array, which is not true for these arrays.

Notice that at that point there was no notion of chunks in the package. Only when we wanted to go a bit further and implement some mapreduce/broadcast behavior did it become necessary to think about chunks. Here I would have loved to use some kind of interface that an array could implement (e.g. what I proposed in ChunkedArrayBase), but I did not find anything that was general enough and had out-of-core data in mind. I experimented with BlockArrays, but found that its scope was quite different from what we do in DiskArrays.

Note that the last time I looked at BlockArrays, it could not really deal with out-of-core data and was more or less still assuming fast random access, either through caching or by holding arrays in memory. My understanding was that its main purpose was to support very sparse arrays by only defining the blocks that actually hold data (e.g. banded matrices). It would be good to know whether it has since been extended to support our use cases, but I would still be sceptical.

So I would be very surprised if we could simply make an HDF5Dataset an AbstractBlockArray and get fast indexing, broadcasting etc., but maybe it would work. However, if you want to try reusing some of the types like BlockIndex and some of the iteration over Blocks, I think that would be a good idea.
