Expand documentation, add discussion on counterintuitive behavior (#188)
* Standardizing markdown sections in README

converting ### to ##

* splitting README material into documenter sections

* add StaticArray example

* fix typo

* mentioning on-the-fly construction of StructArray entries in overview.md

* discussing mutability for counterintuitive behaviors

* adding counterintuitive behavior docs to make.jl

* adding an extra initialization section

* setting "Overview" as the default doc homepage

moving index.md to reference.md, moving overview.md to index.md (and deleting overview.md)

* Apply suggestions from code review

Co-authored-by: Pietro Vertechi <[email protected]>

* removing make.jl TODOs

* Update docs/src/counterintuitive.md

Co-authored-by: Pietro Vertechi <[email protected]>

Co-authored-by: Jesse Chan <[email protected]>
Co-authored-by: Pietro Vertechi <[email protected]>
3 people authored Aug 27, 2021
1 parent e0b70ac commit 0a0032c
Showing 7 changed files with 524 additions and 43 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -43,7 +43,7 @@ julia> StructArray([1+im, 3-2im])
3 - 2im
```

-### Collection and initialization
+## Collection and initialization

One can also create a `StructArray` from an iterable of structs without creating an intermediate `Array`:

@@ -76,7 +76,7 @@ julia> rand!(s)
0.92407+0.929336im 0.267358+0.804478im
```

-### Using custom array types
+## Using custom array types

StructArrays supports using custom array types. It is always possible to pass field arrays of a custom type. The "custom array of structs to struct of custom arrays" transformation will use the `similar` method of the custom array type. This can be useful when working on the GPU for example:

@@ -153,7 +153,7 @@ julia> push!(t, (a = 3, b = "z"))
(a = 3, b = "z")
```

-### Lazy row iteration
+## Lazy row iteration

StructArrays also provides a `LazyRow` wrapper for lazy row iteration. `LazyRow(t, i)` does not materialize the i-th row but returns a lazy wrapper around it on which `getproperty` does the correct thing. This is useful when the row has many fields only some of which are necessary. It also allows changing columns in place.
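To illustrate what a lazy row does, here is a minimal plain-Julia sketch, assuming the table is stored as a `NamedTuple` of column vectors. `ToyLazyRow` is a hypothetical stand-in, not the `LazyRow` type that StructArrays provides: it keeps only a reference to the columns and a row index, reads a field by indexing just that column, and writes a field straight back into the column.

```julia
# Hypothetical stand-in for a lazy row: it stores the columns and a row
# index, and never materializes the whole row.
struct ToyLazyRow{C<:NamedTuple}
    columns::C
    index::Int
end

# Reading `r.name` indexes only the requested column.
function Base.getproperty(r::ToyLazyRow, name::Symbol)
    col = getfield(r, :columns)[name]
    return col[getfield(r, :index)]
end

# Writing `r.name = v` stores `v` directly in the underlying column.
function Base.setproperty!(r::ToyLazyRow, name::Symbol, v)
    col = getfield(r, :columns)[name]
    col[getfield(r, :index)] = v
    return v
end

cols = (a = [1, 2, 3], b = ["x", "y", "z"])
row = ToyLazyRow(cols, 2)
row.a          # 2, read from cols.a without touching cols.b
row.a = 20     # modifies cols.a in place
```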

9 changes: 8 additions & 1 deletion docs/make.jl
@@ -6,7 +6,14 @@ using StructArrays
makedocs(
    sitename = "StructArrays",
    format = Documenter.HTML(prettyurls = get(ENV, "CI", nothing) == "true"),
-    modules = [StructArrays]
+    modules = [StructArrays],
+    pages = [
+        "Overview"=>"index.md",
+        "Example usage"=>"examples.md",
+        "Some counterintuitive behaviors"=>"counterintuitive.md",
+        "Advanced techniques"=>"advanced.md",
+        "Index"=>"reference.md",
+    ]
)

# Documenter can also automatically deploy documentation to gh-pages.
136 changes: 136 additions & 0 deletions docs/src/advanced.md
@@ -0,0 +1,136 @@
# Advanced techniques

## Structures with non-standard data layout

StructArrays supports structures with custom data layout. The user is required to overload `staticschema` in order to define the custom layout, `component` to access fields of the custom layout, and `createinstance(T, fields...)` to create an instance of type `T` from its custom fields `fields`. In other words, given `x::T`, `createinstance(T, (component(x, f) for f in fieldnames(staticschema(T)))...)` should successfully return an instance of type `T`.

Here is an example of a type `MyType` whose custom fields are its field `data` together with the fields of its field `rest` (a named tuple):

```julia
using StructArrays

struct MyType{T, NT<:NamedTuple}
    data::T
    rest::NT
end

MyType(x; kwargs...) = MyType(x, values(kwargs))

function StructArrays.staticschema(::Type{MyType{T, NamedTuple{names, types}}}) where {T, names, types}
    return NamedTuple{(:data, names...), Base.tuple_type_cons(T, types)}
end

function StructArrays.component(m::MyType, key::Symbol)
    return key === :data ? getfield(m, 1) : getfield(getfield(m, 2), key)
end

# generate an instance of MyType type
function StructArrays.createinstance(::Type{MyType{T, NT}}, x, args...) where {T, NT}
    return MyType(x, NT(args))
end

s = [MyType(rand(), a=1, b=2) for i in 1:10]
StructArray(s)
```

In the above example, `MyType` was composed of `data` of type `Float64` and `rest` of type `NamedTuple`. In many practical cases involving custom types, it is hard for StructArrays to automatically widen the types when they are heterogeneous. The following example demonstrates a widening method for that scenario.

```julia
using Tables

# add a source of custom type data
struct Location{U}
    x::U
    y::U
end
struct Region{V}
    area::V
end

s1 = MyType(Location(1, 0), place = "Delhi", rainfall = 200)
s2 = MyType(Location(2.5, 1.9), place = "Mumbai", rainfall = 1010)
s3 = MyType(Region([Location(1, 0), Location(2.5, 1.9)]), place = "North India", rainfall = missing)

s = [s1, s2, s3]
# Now if we try to do StructArray(s)
# we will get an error

function meta_table(iter)
    cols = Tables.columntable(iter)
    meta_table(first(cols), Base.tail(cols))
end

function meta_table(data, rest::NT) where NT<:NamedTuple
    F = MyType{eltype(data), StructArrays.eltypes(NT)}
    return StructArray{F}(; data=data, rest...)
end

meta_table(s)
```

The above strategy has been tested and implemented in [GeometryBasics.jl](https://github.com/JuliaGeometry/GeometryBasics.jl).
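The requirement stated at the start of this section is a round-trip law: splitting an instance into its custom components and rebuilding it must give back the original value. The sketch below checks that law in plain Julia so it runs without StructArrays; `Tagged`, `schema_names`, `comp`, and `make` are hypothetical stand-ins mirroring `MyType`, `staticschema`, `component`, and `createinstance`, not the package's API.

```julia
struct Tagged{T, NT<:NamedTuple}
    data::T
    rest::NT
end

# Hypothetical stand-in for `staticschema`: the custom field names are
# `:data` followed by the names of the `rest` named tuple.
schema_names(::Type{Tagged{T, NamedTuple{names, types}}}) where {T, names, types} =
    (:data, names...)

# Hypothetical stand-in for `component`: read one custom field.
comp(m::Tagged, key::Symbol) =
    key === :data ? getfield(m, :data) : getfield(getfield(m, :rest), key)

# Hypothetical stand-in for `createinstance`: rebuild from custom fields.
make(::Type{Tagged{T, NT}}, x, args...) where {T, NT} = Tagged(x, NT(args))

# The round-trip law: rebuild `x` from its custom components.
roundtrip(x::T) where {T} = make(T, (comp(x, f) for f in schema_names(T))...)

m = Tagged(0.5, (a = 1, b = 2))
roundtrip(m) === m   # true
```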

## Mutate-or-widen style accumulation

StructArrays provides a function `StructArrays.append!!(dest, src)` (unexported) for "mutate-or-widen" style accumulation. This function can be used via [`BangBang.append!!`](https://juliafolds.github.io/BangBang.jl/dev/#BangBang.append!!) and [`BangBang.push!!`](https://juliafolds.github.io/BangBang.jl/dev/#BangBang.push!!) as well.

`StructArrays.append!!` works like `append!(dest, src)` if `dest` can contain all element types in the `src` iterator; i.e., it _mutates_ `dest` in place:

```julia
julia> dest = StructVector((a=[1], b=[2]))
1-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
(a = 1, b = 2)

julia> StructArrays.append!!(dest, [(a = 3, b = 4)])
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,Int64}}:
(a = 1, b = 2)
(a = 3, b = 4)

julia> ans === dest
true
```

Unlike `append!`, `append!!` can also _widen_ the element type of the `dest` array:

```julia
julia> StructArrays.append!!(dest, [(a = missing, b = 6)])
3-element StructArray(::Array{Union{Missing, Int64},1}, ::Array{Int64,1}) with eltype NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}:
NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((1, 2))
NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((3, 4))
NamedTuple{(:a, :b),Tuple{Union{Missing, Int64},Int64}}((missing, 6))

julia> ans === dest
false
```

Since the original array `dest` cannot hold the input, a new array is created (`ans !== dest`).

Combined with [function barriers](https://docs.julialang.org/en/latest/manual/performance-tips/#kernel-functions-1), `append!!` is a useful building block for implementing `collect`-like functions.
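The mutate-or-widen idea can be sketched for a plain `Vector` in a few lines. `widening_append!!` below is a hypothetical illustration, not how StructArrays implements `append!!`: it pushes elements in place while they fit the current element type `T`, and on the first element `x` that does not fit it copies everything into a new vector with element type `Union{T, typeof(x)}` and continues there. It assumes `src` can be iterated more than once (for example, an array), because the remaining elements are re-read with `Iterators.drop`.

```julia
# Hypothetical sketch of mutate-or-widen accumulation for plain vectors.
function widening_append!!(dest::Vector{T}, src) where {T}
    for (i, x) in enumerate(src)
        if x isa T
            push!(dest, x)             # fits: mutate `dest` in place
        else
            S = Union{T, typeof(x)}    # widen just enough to hold `x`
            wider = Vector{S}(undef, 0)
            append!(wider, dest)
            push!(wider, x)
            # Continue with the remaining elements in the widened vector
            # (assumes `src` can be iterated again, e.g. an array).
            return widening_append!!(wider, Iterators.drop(src, i))
        end
    end
    return dest
end

dest = [1]
r1 = widening_append!!(dest, [3])           # mutated: r1 === dest
r2 = widening_append!!(dest, [missing, 6])  # widened: r2 !== dest
```

As with `append!!`, callers must always use the returned array. The recursive call on the widened vector plays the role of a function barrier: it dispatches to a method compiled for the new element type, so each loop runs with a concretely typed `dest`.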

## Using StructArrays in CUDA kernels

It is possible to combine StructArrays with [CUDAnative](https://github.com/JuliaGPU/CUDAnative.jl), in order to create CUDA kernels that work on StructArrays directly on the GPU. Make sure you are familiar with the CUDAnative documentation (esp. kernels with plain `CuArray`s) before experimenting with kernels based on `StructArray`s.

```julia
using CUDAnative, CuArrays, StructArrays
d = StructArray(a = rand(100), b = rand(100))

# move to GPU
dd = replace_storage(CuArray, d)
de = similar(dd)

# a simple kernel, to copy the content of `dd` onto `de`
function kernel!(dest, src)
    i = (blockIdx().x-1)*blockDim().x + threadIdx().x
    if i <= length(dest)
        dest[i] = src[i]
    end
    return nothing
end

threads = 1024
blocks = cld(length(dd),threads)

@cuda threads=threads blocks=blocks kernel!(de, dd)
```
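The index arithmetic in `kernel!` can be checked on the CPU without a GPU. The sketch below (the function name `emulate_launch` is hypothetical) emulates the launch by looping over 1-based block and thread indices, standing in for `blockIdx().x` and `threadIdx().x`, and counts how often each array index is visited. With `blocks = cld(n, threads)`, every index in `1:n` is hit exactly once, and the `i <= length(dest)` guard discards the surplus threads in the last block.

```julia
# Emulate a 1-D kernel launch on the CPU and count hits per index.
function emulate_launch(n::Int, threads::Int)
    blocks = cld(n, threads)                  # same block count as above
    hits = zeros(Int, n)
    for block in 1:blocks, thread in 1:threads
        i = (block - 1) * threads + thread    # mirrors the kernel's formula
        if i <= n                             # bounds guard from `kernel!`
            hits[i] += 1
        end
    end
    return hits
end

all(==(1), emulate_launch(100, 1024))    # true: 1 block, 924 idle threads
all(==(1), emulate_launch(2500, 1024))   # true: 3 blocks
```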

70 changes: 70 additions & 0 deletions docs/src/counterintuitive.md
@@ -0,0 +1,70 @@
# Some counterintuitive behaviors

StructArrays doesn't explicitly store any structs; rather, it materializes a struct element on the fly when `getindex` is called. This is typically very efficient; for example, if all the struct fields are `isbits`, then materializing a new struct does not allocate. However, this can lead to counterintuitive behavior when modifying entries of a StructArray.

## Modifying the field of a struct element

```julia
julia> using StructArrays

julia> mutable struct Foo{T}
           a::T
           b::T
       end

julia> x = StructArray([Foo(1,2) for i = 1:5]);

julia> x[1].a = 10;

julia> x # remains unchanged
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Foo{Int64}:
Foo{Int64}(1, 2)
Foo{Int64}(1, 2)
Foo{Int64}(1, 2)
Foo{Int64}(1, 2)
Foo{Int64}(1, 2)
```
The assignment `x[1].a = 10` first calls `getindex(x, 1)`, then sets property `a` of the accessed element. However, since StructArrays constructs `Foo(x.a[1], x.b[1])` on the fly when accessing `x[1]`, setting `x[1].a = 10` modifies the materialized struct rather than the StructArray `x`.

Note that one can modify a field of a StructArray entry via `x.a[1] = 10` (the order of `getproperty` and `getindex` matters). As an added benefit, this does not require the struct `Foo` to be mutable, as it modifies the underlying component array `x.a` directly.
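The mechanism can be reproduced without StructArrays. In the sketch below, `TwoColumn` is a hypothetical, minimal struct-of-arrays container for `Foo`: its `getindex` builds a fresh `Foo` from the two columns, so mutating the returned struct leaves the columns untouched, whereas indexing a column directly writes through.

```julia
mutable struct Foo{T}
    a::T
    b::T
end

# Hypothetical minimal struct-of-arrays container: one column per field.
struct TwoColumn{T} <: AbstractVector{Foo{T}}
    a::Vector{T}
    b::Vector{T}
end
Base.size(x::TwoColumn) = size(x.a)
# Materialize a new `Foo` on every access.
Base.getindex(x::TwoColumn, i::Int) = Foo(x.a[i], x.b[i])
# Unpack the fields of `v` into the columns.
function Base.setindex!(x::TwoColumn, v::Foo, i::Int)
    x.a[i] = v.a
    x.b[i] = v.b
    return x
end

x = TwoColumn([1, 1, 1], [2, 2, 2])
x[1].a = 10    # mutates a temporary `Foo`; x.a is still [1, 1, 1]
x.a[1] = 10    # writes into the column; x.a is now [10, 1, 1]
```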

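For mutable structs, it is possible to write code that works for both regular `Array`s and `StructArray`s by writing the modified element back explicitly:
```julia
y = x[1]; y.a = 10; x[1] = y
```

`y = x[1]` materializes a new `Foo` element (for a regular `Array`, it is the stored element itself), `y.a = 10` modifies its field `a`, and `x[1] = y` writes the modified struct back. For a StructArray, this last step unpacks `a` and `b` from `y` and assigns the entries of the component arrays, `x.a[1] = y.a` and `x.b[1] = y.b`. The shorter form `x[1] = x[1].a = 10` does not work: in Julia an assignment expression evaluates to its right-hand side, so `x[1].a = 10` returns `10`, not the modified struct.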

## Broadcasted assignment for array entries

Broadcasted in-place assignment can also behave counterintuitively for StructArrays.
```julia
julia> using StructArrays, StaticArrays

julia> mutable struct Bar{T} <: FieldVector{2,T}
           a::T
           b::T
       end

julia> x = StructArray([Bar(1,2) for i = 1:5])
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Bar{Int64}:
[1, 2]
[1, 2]
[1, 2]
[1, 2]
[1, 2]

julia> x[1] .= 1
2-element Bar{Int64} with indices SOneTo(2):
1
1

julia> x
5-element StructArray(::Vector{Int64}, ::Vector{Int64}) with eltype Bar{Int64}:
[1, 2]
[1, 2]
[1, 2]
[1, 2]
[1, 2]
```
Because `x[1] .= 1` first materializes a new `Bar` struct via `getindex`, the broadcasted assignment modifies this temporary struct rather than the StructArray `x`. Note, however, that `x[1] = x[1] .= 1` works: `.=` returns its destination (the modified materialized struct), which is then assigned to the first entry of `x`.
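The same effect can be reproduced in plain Julia. The sketch below uses a hypothetical mutable `Pair2 <: AbstractVector{Int}` in place of the `FieldVector`-based `Bar` (so no package is needed) and a hypothetical two-column container, `Pair2Columns`, whose `getindex` materializes a fresh `Pair2`. With a scalar index, `x[1] .= 0` broadcasts into the element returned by `getindex`; since `.=` returns its destination, `x[1] = x[1] .= 0` stores the modified copy back.

```julia
# Hypothetical mutable two-element vector standing in for `Bar`.
mutable struct Pair2 <: AbstractVector{Int}
    a::Int
    b::Int
end
Base.size(::Pair2) = (2,)
function Base.getindex(p::Pair2, i::Int)
    i == 1 && return p.a
    i == 2 && return p.b
    throw(BoundsError(p, i))
end
function Base.setindex!(p::Pair2, v, i::Int)
    if i == 1
        p.a = v
    elseif i == 2
        p.b = v
    else
        throw(BoundsError(p, i))
    end
    return p
end

# Hypothetical container that materializes a fresh `Pair2` on each access.
struct Pair2Columns <: AbstractVector{Pair2}
    a::Vector{Int}
    b::Vector{Int}
end
Base.size(x::Pair2Columns) = size(x.a)
Base.getindex(x::Pair2Columns, i::Int) = Pair2(x.a[i], x.b[i])
function Base.setindex!(x::Pair2Columns, v::Pair2, i::Int)
    x.a[i] = v.a
    x.b[i] = v.b
    return x
end

x = Pair2Columns([1, 1], [2, 2])
x[1] .= 0            # broadcasts into a temporary; x is unchanged
x[1] = x[1] .= 0     # `.=` returns the modified temporary, which is stored
```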

## Mutable struct types

Each of these counterintuitive behaviors occurs when using StructArrays with mutable elements. However, since the component arrays of a StructArray are generally mutable even if its entries are immutable, a StructArray with immutable elements will in many cases behave identically to (but be more efficient than) a StructArray with mutable elements. Thus, it is recommended to use immutable structs with StructArrays whenever possible.
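One practical consequence, shown here with a plain `Vector` and a hypothetical immutable `Point` type: with immutable elements, the mistaken `v[1].a = 10` raises an error instead of silently modifying a temporary copy, and the update has to be written as a whole-element assignment, which behaves the same way for `Array`s and StructArrays.

```julia
struct Point{T}   # immutable
    a::T
    b::T
end

v = [Point(1, 2), Point(3, 4)]

# Setting a field of an immutable element is an error, so the pitfall
# described in the previous sections cannot go unnoticed.
caught = try
    v[1].a = 10
    false
catch
    true
end

# Replace the whole element instead; the same line works on a StructArray.
v[1] = Point(10, v[1].b)
```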
87 changes: 87 additions & 0 deletions docs/src/examples.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
## Example usage to store complex numbers

```julia
julia> using StructArrays, Random

julia> Random.seed!(4);

julia> s = StructArray{ComplexF64}((rand(2,2), rand(2,2)))
2×2 StructArray(::Array{Float64,2}, ::Array{Float64,2}) with eltype Complex{Float64}:
0.680079+0.625239im 0.92407+0.267358im
0.874437+0.737254im 0.929336+0.804478im

julia> s[1, 1]
0.680079235935741 + 0.6252391193298537im

julia> s.re
2×2 Array{Float64,2}:
0.680079 0.92407
0.874437 0.929336

julia> StructArrays.components(s) # obtain all field arrays as a named tuple
(re = [0.680079 0.92407; 0.874437 0.929336], im = [0.625239 0.267358; 0.737254 0.804478])
```

Note that the same approach can be used directly from an `Array` of complex numbers:

```julia
julia> StructArray([1+im, 3-2im])
2-element StructArray(::Array{Int64,1}, ::Array{Int64,1}) with eltype Complex{Int64}:
1 + 1im
3 - 2im
```
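Conceptually, a `StructArray` of complex numbers keeps the real and imaginary parts (the fields `re` and `im` of `Complex`) in two separate arrays and rebuilds each `Complex` value on access. A plain-Julia sketch of that split, using only `fieldnames`, `real`, `imag`, and `complex` from Base; the helper `element` is hypothetical:

```julia
z = [1 + 1im, 3 - 2im]

# `Complex` has two fields, which become the two component arrays.
fieldnames(eltype(z))        # (:re, :im)

# Split the array of structs into one array per field ...
re_part = real.(z)           # [1, 3]
im_part = imag.(z)           # [1, -2]

# ... and rebuild an element on demand, as indexing a StructArray does.
element(i) = complex(re_part[i], im_part[i])
element(2)                   # 3 - 2im
```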

## Example usage to store a data table

```julia
julia> t = StructArray((a = [1, 2], b = ["x", "y"]))
2-element StructArray(::Array{Int64,1}, ::Array{String,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,String}}:
(a = 1, b = "x")
(a = 2, b = "y")

julia> t[1]
(a = 1, b = "x")

julia> t.a
2-element Array{Int64,1}:
1
2

julia> push!(t, (a = 3, b = "z"))
3-element StructArray(::Array{Int64,1}, ::Array{String,1}) with eltype NamedTuple{(:a, :b),Tuple{Int64,String}}:
(a = 1, b = "x")
(a = 2, b = "y")
(a = 3, b = "z")
```

## Example usage with StaticArray elements

```julia
julia> using StructArrays, StaticArrays

julia> x = StructArray([SVector{2}(1,2) for i = 1:5])
5-element StructArray(::Vector{Tuple{Int64, Int64}}) with eltype SVector{2, Int64}:
[1, 2]
[1, 2]
[1, 2]
[1, 2]
[1, 2]

julia> A = StructArray([SMatrix{2,2}([1 2;3 4]) for i = 1:5])
5-element StructArray(::Vector{NTuple{4, Int64}}) with eltype SMatrix{2, 2, Int64, 4}:
[1 2; 3 4]
[1 2; 3 4]
[1 2; 3 4]
[1 2; 3 4]
[1 2; 3 4]

julia> B = StructArray([SArray{Tuple{2,2,2}}(reshape(1:8,2,2,2)) for i = 1:5]); B[1]
2×2×2 SArray{Tuple{2, 2, 2}, Int64, 3, 8} with indices SOneTo(2)×SOneTo(2)×SOneTo(2):
[:, :, 1] =
1 3
2 4

[:, :, 2] =
5 7
6 8
```