Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stack overflow for concatenation of large number of arrays #97

Open
baggepinnen opened this issue Mar 6, 2020 · 7 comments
Open

Stack overflow for concatenation of large number of arrays #97

baggepinnen opened this issue Mar 6, 2020 · 7 comments

Comments

@baggepinnen
Copy link

The current way of constructing Vcat objects, by splatting/slurping, causes stack overflow when concatenating a large number of arrays since special methods of these functions are compiled for the particular number of input arguments.

My usecase is LazyWAVFiles.jl which treats a folder of WAV files as one large array, such a folder may contain quite a lot of files...

@baggepinnen
Copy link
Author

I found a way to create the lazy array such that the args tuple got the type NTuple using

ApplyArray{T, N, typeof(vcat), NTuple{length(arrays),L}}(vcat, NTuple{length(arrays),L}(ntuple(i->arrays[i],length(arrays))))

@dlfivefifty
Copy link
Member

Would you be able to make a PR?

@baggepinnen
Copy link
Author

I could. Would the desired signature be something like

Vcat(::AbstractArray{AbstractArray})

containing essentially

ApplyArray{T, N, typeof(vcat), NTuple{length(arrays),L}}(vcat, NTuple{length(arrays),L}(ntuple(i->arrays[i],length(arrays))))

?

@dlfivefifty
Copy link
Member

Vcat(A) is short for ApplyArray(vcat,A), and ApplyArray(f,A) is a lazy version of f(A).

So your proposed signature is not good as it should return something equivalent to vcat(::AbstractArray{<:AbstractArray}), that is have the following behaviour:

julia> vcat([[1,2],[3,4]])
2-element Array{Array{Int64,1},1}:
 [1, 2]
 [3, 4]

Actually, I don't think you want Vcat as if you have a large number of arrays you don't want to work with tuples. The best alternative is probably mortar from BlockArrays.jl:

julia> mortar([[1,2],[3,4]])
2-blocked 4-element BlockArray{Int64,1}:
 1
 23
 4

Note this is already lazy (in the sense that it doesn't copy its arguments).

@baggepinnen
Copy link
Author

Interesting, I was not aware of mortar. My use of LazyArrays to this purpose is working fine though and with the workaround above I have good performance even for very many arrays. I guess we can close this issue, and maybe add a line to the docs hinting at mortar for future users of large numbers of arrays.

@dlfivefifty
Copy link
Member

Perhaps you could add your work around to overload:

Vcat(A::AbstractArray...)

?

@baggepinnen
Copy link
Author

Methods using varargs are also troublesome if the number of arrays is large since the compiler will hit some limit for the number of arguments to a function. Overloading it does not help, the splatting of the vector of vectors must be avoided.
I guess if anyone runs into this issue, they will find this discussion and can make use of the workaround above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants