Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcasting with Set returns an arbitrarily-ordered array #33282

Open
tkf opened this issue Sep 16, 2019 · 20 comments
Open

Broadcasting with Set returns an arbitrarily-ordered array #33282

tkf opened this issue Sep 16, 2019 · 20 comments
Labels
broadcast Applying a function over a collection collections Data structures holding multiple items, e.g. sets

Comments

@tkf
Copy link
Member

tkf commented Sep 16, 2019

Currently (Julia 1.4-DEV), broadcasting with Set produces a result that relies on iteration order of Set which is not meaningful:

julia> Set([1,2]) .+ Set([3,4])
2-element Array{Int64,1}:
 6
 4

This is due to the fallback to collect in broadcastable definition:

broadcastable(x) = collect(x)

A sensible definition may be to use set arithmetic (ref: JuliaMath/IntervalSets.jl#55) such that, for example,

op.(set1, set2) == Set([op(x, y) for x in set1, y in set2])

holds. However, given that "join" on indices/keys is proposed as a generalized form of broadcasting #25904 (and special handling for singleton axes), it is not clear how taking "product" for broadcasting with Sets can fit in a bigger picture.

So, as a short-term solution, I propose to forbid broadcasting with Sets which can easily be done by defining broadcastable(::AbstractSet), as done with Dicts:

broadcastable(::Union{AbstractDict, NamedTuple}) = throw(ArgumentError("broadcasting over dictionaries and `NamedTuple`s is reserved"))

This, of course, requires core devs to decide that this can be treated as a "minor change."

As a longer-term solution, does it make sense to use set arithmetic (which, I suppose, implementable using the style mechanism)? Is there a guiding principle in which it is natural to have "product" for Sets and "join" for associative data structures (including Arrays)?

@dlfivefifty
Copy link
Contributor

I’m in favour of adding a SetStyle for broadcasting sets as set definition, for the reason it’s useful.

As a counterpoint, we sort of already have this with adjoints in the sense if x and y are vectors then

z in f.(x,y’)

if and only if z = f(x[k],y[j]).

@tkf
Copy link
Member Author

tkf commented Sep 16, 2019

As a counterpoint, we sort of already have this with adjoints

I would still treat this a join combined with a special treatment (= broadcasting) of a singleton axis. I added "(and special handling for singleton axes)" in the OP to clarify my point.

@tkf
Copy link
Member Author

tkf commented Sep 16, 2019

If we were to use set arithmetic for broadcasting with sets, I'd argue we should forbid mixing it with arrays. It may be tempting to "promote" arrays or any iterators as sets. However, it would produce different results depending on how you associate operators. For example, in

x :: Set
y :: Vector
z :: Vector

a = x .+ y .+ z  # equivalent to (x .+ y) .+ z
b = x .+ identity(y .+ z)

a and b would have different results. This is very odd given that + is associative.

@fredrikekre fredrikekre added broadcast Applying a function over a collection collections Data structures holding multiple items, e.g. sets labels Sep 17, 2019
@dlfivefifty
Copy link
Contributor

I would propose the following:

  1. f.(x::Set) should return a Set, so the following is wrong:
julia> x = Set([1,2,3])
Set([2, 3, 1])

julia> exp.(x) # Should return `Set([exp(2),exp(3),exp(1)])`
3-element Array{Float64,1}:
  7.38905609893065 
 20.085536923187668
  2.718281828459045
  1. f.(x::Set, 2) should also return a Set
  2. f.(x::Set, y::Set), f.(x::Set, y::Vector) etc. should probably just error out

So the rough implementation would be:

struct SetStyle <: BroadcastStyle end
BroadcastStyle(::Type{<:AbstractSet}) = SetStyle()
BroadcastStyle(::SetStyle, ::DefaultArrayStyle{0}) = SetStyle()
BroadcastStyle(::SetStyle, ::SetStyle) =error("Cannot broadcast sets together")
BroadcastStyle(::SetStyle, ::AbstractArrayStyle) = error("Cannot broadcast sets and arrays")

copy(b::Broadcast{SetStyle}) = Set(broadcast(b.f, collect.(b.args)))

@dlfivefifty
Copy link
Contributor

If we really want to argue that f.(x::Set, y::Set) should be defined, we could say that a set behaves like a singleton/scalar and by the definition of broadcast

Singleton and missing dimensions
are expanded to match the extents of the other arguments by virtually
repeating the value

which would essentially be equivalent to set functions. But I think that may be pushing things....

@nalimilan
Copy link
Member

See also previous attempt at making this an error at #28491. Not sure why it wasn't merged at the time, since map was deprecated (#26980).

Cc: @mbauman

@tkf
Copy link
Member Author

tkf commented Sep 17, 2019

@nalimilan Thank you. Reviving #28491 seems to be the easiest short-term option.

@dlfivefifty
Copy link
Contributor

The decision to deprecate map seems odd to me, the discussion in #26359 didn't really clarify the problem.

Based on those issues and PRs however the current behaviour is desired. Unfortunately its not very compatible with uncountable sets a la IntervalSets.jl , and its not clear what the analogue of map or broadcast should be....

@tkf
Copy link
Member Author

tkf commented Sep 17, 2019

we could say that a set behaves like a singleton/scalar and by the definition of broadcast

@dlfivefifty I'm thinking that we don't have to follow the current definition strictly if there is a way to generalize the definition to have useful broadcasting with sets. Taking product is certainly useful and probably OK provided that mixing sets with arrays and dicts is an error but I'd be interested in what others think about this.

@dlfivefifty
Copy link
Contributor

I suppose the issue is that there is a desire for broadcast(f, a, ...) to return the same thing as broadcast(f, collect(a), ...). With set products that wouldn't be true. But for other cases I would say it is true, for a certain point of view: e.g., x in broadcast(f, Set([1,2,3]), ...) if and only if x in broadcast(f, [1,2,3], ...) so the two "sets" are the same.

@vtjnash
Copy link
Member

vtjnash commented Sep 17, 2019

iteration order of Set which is not meaningful

It's not defined, but it is meaningful, and so it's not "broken". For example, here's a fairly useless one-liner for doing random things to the values in a dictionary by iterating 3 sets:

julia> D = Dict(1 => 2, 3 => 4, 5 => 6);
julia> Dict(keys(D) .=> tuple.(values(D), values(D) .^ 2))
Dict{Int64,Tuple{Int64,Int64}} with 3 entries:
  3 => (4, 16)
  5 => (6, 36)
  1 => (2, 4)

This also doesn't consider OrderedSet, where the order is defined.

not sure why it wasn't merged at the time, since map was deprecated

map was deprecated because it was doing a special case to make it return a Set which gave surprising results due to the failure to preserve order and count (#26359) and meant associativity was broken for Set. broadcast has currently generally fallen back to iteration (c.f. decision on Ref usage vs. scalarization. Emphasis on currently, since there are open issues proposing to use indexing more heavily.)

f.(x::Set) should return a Set

This is what map used to do, which is why that method was deprecated and not the other.

@tkf
Copy link
Member Author

tkf commented Sep 17, 2019

@vtjnash I think your observation is that current broadcasting definition is useful when the order has some meaning as in the case of KeySet? My main concern is when there is no meaning in the iteration order, as in Set.

@dlfivefifty
Copy link
Contributor

dlfivefifty commented Sep 18, 2019

I agree with @tkf. Broadcasting should only work between ordered collections (including KeySet), where a Set is unordered.

More broadly, it might be good to sort out what AbstractSet means. It seems that people are taking sets to be collections which store only one entry, but I would disagree and think the defining feature is implementing in. So I'd make the following definitions:

  1. A set is any type that implements in. This includes Set, Interval, and AbstractArray
  2. An enumerable set is any type with a countable number of entries which can be iterated over, but where the order may not be defined. This includes Count, Set, and AbstractArray (including infinite arrays) but not Interval.
  3. An ordered set is any enumerable set where the ordering is defined. This does not include Set but includes KeySet, AbstractArray, and Count.
  4. An indexible set is an ordered set which implements getindex. This includes AbstractArray, Broadcasted, but not KeySet.
  5. A shaped set is an indexible set with axes. This includes AbstractArray and probably Broadcasted.

So broadcasting is well-defined for shaped sets. If we assume unshaped ordered sets default to "vector-like", then we can extend the definition for any ordered sets. This leaves sets which are not ordered (Interval and Set). This should probably be an error.

Edit: forgot a key one:

\6. A uniquely enumerable set is an enumerable set where each element appears exactly once. This includes Set, KeySet, Count, and Range, but not Array.

@mbauman mbauman changed the title Broadcasting with Set is broken; how do we fix it? Broadcasting with Set returns an ordered array Sep 18, 2019
@mbauman mbauman changed the title Broadcasting with Set returns an ordered array Broadcasting with Set returns an arbitrarily-ordered array Sep 18, 2019
@tkf
Copy link
Member Author

tkf commented Sep 19, 2019

I think definition 6 is closer to sets in math although they doesn't have to be enumerable. Definition 1 seems to be closer to what is called multiset or bag.

Definition 4 seems to be equivalent to what it's called associative data structure. Arraylike suggested in #32940 seems to cover definition 5.

But I guess you are trying to formalize the capabilities of containers? By maybe using traits? For example does adding Uncountable <: IteratorSize help formalizing definition 2? Adding traits for ordering and indexability seems useful too. The end result would be to error out in broadcastable if IteratorSize does not return HasLength or HasShape. (However, using infinite size iterator like Iterators.take((@lazy 2 .^ iter_all_primes() .- 1), 10) can be useful if we have non-materializing broadcast #19198)

This leaves sets which are not ordered (Interval and Set). This should probably be an error.

I think I agree. But as a brainstorming, I think one possibility is to use:

But this probably is "too dynamic" for people reading the code. I wish there was a generic customizable lifting syntax... (see also Status of question mark syntax for missing values - Development - JuliaLang)

@dlfivefifty
Copy link
Contributor

I think definition 6 is closer to sets in math

This is definitely not true (see Wikipedia). Definition 1 is the closest, in that a set only has the operation in. All other set operations (union, intersection) are derived from the in operation. For example, we can easily derive a lazy union/intersection type that works for any types defined by Definition 1:

struct LazyUnion
  a
  b
end

in(x, d::LazyUnion) = x in d.a || x in d.b

Therefore LazyUnion satisfies Definition 1 provided a and b satisfy Definitino 1.

(The axiom of choice comes into play whether there is also a "choice" function but that's off topic.)

product-based broadcasting if all collections (except numbers) are not indexable ...But this probably is "too dynamic" for people reading the code

Agreed, though I would argue that its well-defined and consistent provided there is only one one set. Actually, I'd be happy if base just throws a "method not found" and then there could be another package SetBroadcasting.jl that implements a SetStyle for those who want set-like broadcasting.

@tkf
Copy link
Member Author

tkf commented Sep 19, 2019

Hmm... it still sounds problematic to me because, if an object has more properties than set, it can distinguish itself from the other by using such properties (ordering, multiplicity, etc.). For example, if you treat an Array as a set, using Base.== would be problematic because it does not follow the axiom of extensionality; i.e., arrays can use shape, ordering, and multiplicity to distinguish one from the other. I suppose you can say that the equality defined in terms of in should be used when treating arrays as sets. But this is equivalent to saying that arrays can be converted to sets which is not a very specific property and I think reduces the usefulness of the notion of set.

@tkf
Copy link
Member Author

tkf commented Sep 19, 2019

Actually, I'd be happy if base just throws a "method not found" and then there could be another package SetBroadcasting.jl that implements a SetStyle for those who want set-like broadcasting.

I agree that this is nice to have. But I prefer to do this without type-piracy, for example, using syntax like

setstyle.(set1 .+ set2)

where setstyle is a "magic" function that sets the broadcasting style.

Note that this requires us to use an approach different from #28491 which uses broadcastable(::AbstractSet) = throw(...). We need to add extra code to allow users to construct Broadcasted object and throw an error at (say) copyto! stage. This approach may be useful for #25904 as well (people can experiment with different implementation outside Base).

@cossio
Copy link
Contributor

cossio commented Dec 6, 2021

Bump. I would like to rekindle the discussion here. I recently had a bug related to this issue.
I think we need a better solution here.

@Moelf
Copy link
Contributor

Moelf commented Dec 6, 2021

So, as a short-term solution, I propose to forbid broadcasting with Sets which can easily be done by defining broadcastable(::AbstractSet), as done with Dicts:

is there objection to this? I don't think the current behavior is reliable so hopefully no code relies on the behavior (need PkgEval in any case)

@cossio
Copy link
Contributor

cossio commented Dec 6, 2021

I'd be fine with things like Set([1,2]) .+ Set([1,2]) and [1,2] .+ Set([1,2]) being errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection collections Data structures holding multiple items, e.g. sets
Projects
None yet
Development

No branches or pull requests

7 participants