Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generalize groupby and mapmany types #54

Closed
wants to merge 3 commits into from
Closed

generalize groupby and mapmany types #54

wants to merge 3 commits into from

Conversation

aplavin
Copy link
Collaborator

@aplavin aplavin commented Jul 1, 2022

Use similar() to create proper array types instead of plain Vectors.

Also, a (minor?) fix to group().

@aplavin aplavin requested a review from andyferris July 2, 2022 13:14
Copy link
Member

@andyferris andyferris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little unsure about this.

Many types of arrays exist that support similar but not push! or append!. The similar interface guarantees that setindex! works (otherwise it would be useless) and base uses emptymutable to create arrays that can be resized (again, because otherwise it would be useless; note occassionally you do want to use empty in a "purely functional" style but never similar)

Is there some problem we are trying to solve with similar? Is it that some array types in the wild don't support emptymutable when they should?

@aplavin
Copy link
Collaborator Author

aplavin commented Jul 31, 2022

Many types of arrays exist that support similar but not push! or append!.

Are there actually types that return non-pushable arrays from similar? For example, both Base ranges and StaticArrays SVectors turn into plain Vectors:

julia> similar(1:5, 1)
1-element Vector{Int64}:
 0

julia> similar(SVector(1, 2), 1)
1-element Vector{Int64}:
 4

Is there some problem we are trying to solve with similar?

An example is shown in tests: with this PR, group and mapmany on StructArrays return StructArrays. Before, the result was a plain Vector.

Is it that some array types in the wild don't support emptymutable when they should?

emptymutable doesn't seem documented at all, and barely any package define methods for it (according to juliahub). So, emptymutable returns a plain Vector for basically all array types.
For me, it looks like some internal function for Base, unlike similar.

@andyferris
Copy link
Member

For example, both Base ranges and StaticArrays SVectors turn into plain Vectors

There's mitigating facts for those. Ranges don't support setindex! and cannot be the output of similar (or emptymutable). For SVector we can't tell if the size is statically-known integer or a runtime integer, and we don't want to pessimize generic code that uses similar but has never thought about StaticArrays by assuming the size is statically known (and therefore making the runtime case way slower than it needs to be). Of course a static array could never support emptymutable.

with this PR, group and mapmany on StructArrays return StructArrays

See JuliaArrays/StructArrays.jl#238

emptymutable doesn't seem documented at all, and barely any package define methods for it (according to juliahub). So, emptymutable returns a plain Vector for basically all array types.
For me, it looks like some internal function for Base, unlike similar.

I think this is a fair observation from the outside but I can't see that this was the intention. I raised JuliaLang/julia#46230 - perhaps we can look at improving the situation.

@aplavin
Copy link
Collaborator Author

aplavin commented Jul 31, 2022

Still, are there any actual types whose similar output cannot be pushed to?

emptymutable would be more useful if it also allowed something else, something that cannot be just as easily done with similar.
A pain point currently is that there's no way to create a compatible empty array from a type: both similar and emptymutable take values, not types.
Due to lack of such a function,

  • reduce(vcat, Vector{Int}[]) fails, while it could just return Int[].
  • stack() proposed for inclusion in Julia returns different types for empty and non-empty inputs: Add stack(array_of_arrays) JuliaLang/julia#43334 (comment)
  • and actually mapmany in this PR also fails on empty input, just like reduce(vcat): it cannot construct similar without an example

Do you think something like emptymutable(T) is possible to achieve together with making it public API?

@andyferris
Copy link
Member

andyferris commented Jul 31, 2022

Still, are there any actual types whose similar output cannot be pushed to?

Off the top of my head, DistributedArrays.jl DArray has this property. I've created more than one like this myself. The easiest example to imagine is a C-style array - no double indirection like C++ std::vector or Julia's Vector to allow for resizing. (I'm hoping one day we'll have simpler Buffer in core and we can build a non-resizable array with no indirection out of that.)

A pain point currently is that there's no way to create a compatible empty array from a type

Yeah I agree with all of that. I'd like to evolve these. I took a stab at this problem in Dictionaries.jl - look for similar_type and empty_type. Feedback and contributions welcome. If the concept has merit, it would be good to look at improving Base.

@andyferris andyferris closed this Jul 31, 2022
@andyferris andyferris reopened this Aug 3, 2022
@aplavin aplavin closed this by deleting the head repository Sep 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants