Update documentation
danlooo committed May 22, 2024
1 parent b28c558 commit a58dcac
Showing 16 changed files with 525 additions and 427 deletions.
1 change: 1 addition & 0 deletions docs/Project.toml
@@ -1,6 +1,7 @@
[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
Bonito = "824d6782-a2ef-11e9-3a09-e5662e0c26f8"
CFTime = "179af706-886a-5703-950a-314cd64e0468"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
14 changes: 11 additions & 3 deletions docs/src/.vitepress/config.mts
@@ -35,13 +35,19 @@ export default defineConfig({
},
nav: [
{ text: 'Home', link: '/' },
{ text: 'Getting Started', link: '/getting_started' },
{ text: 'Get Started', link: '/get_started' },
{
text: 'User Guide',
items: [
{ text: 'Read and Write', link: '/UserGuide/read_and_write' },
{ text: 'Read', link: '/UserGuide/read' },
{ text: 'Create', link: '/UserGuide/create' },
{ text: 'Write', link: '/UserGuide/write' },
{ text: 'Subset', link: '/UserGuide/subset' },
{ text: 'Compute', link: '/UserGuide/compute' },
{ text: 'FAQ', link: '/UserGuide/faq' },
{ text: 'Group', link: '/UserGuide/group' },
{ text: 'Combine', link: '/UserGuide/combine' },
{ text: 'Chunk', link: '/UserGuide/chunk' },
{ text: 'FAQ', link: '/UserGuide/faq' }
]
},
{
@@ -78,11 +84,13 @@ export default defineConfig({
text: 'User Guide',
items: [
{ text: 'Types', link: '/UserGuide/types' },
{ text: 'Create', link: '/UserGuide/create' },
{ text: 'Read', link: '/UserGuide/read' },
{ text: 'Write', link: '/UserGuide/write' },
{ text: 'Subset', link: '/UserGuide/subset' },
{ text: 'Compute', link: '/UserGuide/compute' },
{ text: 'Group', link: '/UserGuide/group' },
{ text: 'Combine', link: '/UserGuide/combine' },
{ text: 'Chunk', link: '/UserGuide/chunk' },
{ text: 'FAQ', link: '/UserGuide/faq' }
]
91 changes: 90 additions & 1 deletion docs/src/UserGuide/chunk.md
@@ -1 +1,90 @@
# Chunk YAXArrays

> [!IMPORTANT]
> Thinking about chunking is important when it comes to analyzing your data, because in most situations the data will not fit into memory, so fast read access is crucial for your workflows. For example, for geospatial data, decide whether you mostly need fast access along time or along space, and choose the chunks accordingly.

To determine the chunk size of the array representation on disk,
call the `setchunks` function prior to saving.
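
To make the time-versus-space trade-off concrete, the plain-Julia sketch below counts how many chunks a rectangular read has to load. It does not call YAXArrays; the helper `chunks_touched`, the assumed 100 × 100 × 1000 (lon × lat × time) array size and both chunk shapes are hypothetical values chosen for illustration, and regular chunks are assumed.

```julia
# Hypothetical helper: number of chunks intersected by a rectangular read.
# `ranges` holds the index range read along each dimension,
# `chunksize` the chunk length along each dimension (regular chunks assumed).
function chunks_touched(ranges, chunksize)
    prod(length(fld(first(r) - 1, c):fld(last(r) - 1, c)) for (r, c) in zip(ranges, chunksize))
end

# An assumed lon × lat × time array of size 100 × 100 × 1000
pixel_timeseries = (5:5, 5:5, 1:1000)  # all time steps at one grid cell
single_map = (1:100, 1:100, 1:1)       # all grid cells at one time step

space_chunks = (100, 100, 1)  # one chunk per time step
time_chunks = (1, 1, 1000)    # one chunk per grid cell

chunks_touched(pixel_timeseries, space_chunks)  # 1000 chunks
chunks_touched(single_map, space_chunks)        # 1 chunk
chunks_touched(pixel_timeseries, time_chunks)   # 1 chunk
chunks_touched(single_map, time_chunks)         # 10000 chunks
```

Neither shape is best for both reads. An intermediate shape such as `(10, 10, 100)` would touch 10 chunks for the time series and 100 chunks for the map.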

## Chunking YAXArrays

````@example chunks
using YAXArrays, Zarr
a = YAXArray(rand(10,20))
a_chunked = setchunks(a, (5,10))
a_chunked.chunks
````
The saved file is also split into chunks:

````@example chunks
f = tempname()
savecube(a_chunked, f, backend=:zarr)
Cube(f).chunks
````
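
For the 10×20 array above, chunk size `(5, 10)` yields 2 × 2 = 4 chunks. The plain-Julia sketch below (not a YAXArrays call; `chunk_ranges` and `chunk_of` are hypothetical helpers) lists the index ranges each chunk covers, assuming regular chunks where only the last chunk along a dimension may be shorter.

```julia
# Hypothetical helper: index ranges of the chunks along one dimension
# of length `n` with chunk length `c`; the last chunk may be shorter.
chunk_ranges(n, c) = [i:min(i + c - 1, n) for i in 1:c:n]

# Hypothetical helper: 1-based index of the chunk that contains element index `i`
chunk_of(i, c) = fld(i - 1, c) + 1

# All chunks of a 10×20 array with chunk size (5, 10), as (rows, columns) tuples
all_chunks = [(r1, r2) for r1 in chunk_ranges(10, 5), r2 in chunk_ranges(20, 10)]
```

Element `(7, 15)`, for instance, lies in chunk `(2, 2)`, which covers rows `6:10` and columns `11:20`; with chunked storage formats such as Zarr, reading that element typically means reading this whole 5×10 block.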

Alternatively, chunk sizes can be given by dimension name, so the following results in the same chunks:

````@example chunks
a_chunked = setchunks(a, (Dim_2=10, Dim_1=5))
a_chunked.chunks
````

## Chunking Datasets
`setchunks` can also be applied to a `Dataset`.

### Set Chunks by Axis

Set the chunk size for each axis occurring in a `Dataset`. This will be applied to all variables in the dataset:

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5)))
dschunked = setchunks(ds, Dict("Dim_1"=>5, "Dim_2"=>10, "Dim_3"=>2))
Cube(dschunked).chunks
````

Save the chunked dataset:

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````

### Set chunking by Variable

The following sets the chunk size for each variable separately
and results in exactly the same chunking as the example above:

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5)))
dschunked = setchunks(ds,(x = (5,10), y = Dict("Dim_1"=>5), z = (Dim_1 = 5, Dim_2 = 10, Dim_3 = 2)))
Cube(dschunked).chunks
````

Save the chunked dataset:

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````
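
Why both approaches give the same result can be sketched in plain Julia: an axis-based specification is resolved per variable by looking up each of that variable's dimension names. This is only a conceptual sketch, not how YAXArrays implements `setchunks`; `chunks_for` and the variable-to-dimension mapping are hypothetical.

```julia
# Hypothetical helper: resolve axis-based chunk sizes for one variable
# by looking up each of its dimension names, in the variable's own order
chunks_for(dimnames, axis_chunks) = Tuple(axis_chunks[d] for d in dimnames)

axis_chunks = Dict("Dim_1" => 5, "Dim_2" => 10, "Dim_3" => 2)

# Dimension names of the variables x, y and z from the examples above
variable_dims = Dict(
    :x => ("Dim_1", "Dim_2"),
    :y => ("Dim_1",),
    :z => ("Dim_1", "Dim_2", "Dim_3"),
)

resolved = Dict(name => chunks_for(dims, axis_chunks) for (name, dims) in variable_dims)
```

The resolved sizes, `(5, 10)` for `x`, `(5,)` for `y` and `(5, 10, 2)` for `z`, are exactly the per-variable sizes passed in the previous example.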

### Set chunking for all variables

The following code snippet sets the output chunks for all arrays at once. It only works when all member variables of the dataset have the same shape.

````@example chunks
using YAXArrays, Zarr
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10,20)), z = YAXArray(rand(10,20)))
dschunked = setchunks(ds,(5,10))
Cube(dschunked).chunks
````

Save the chunked dataset:

````@example chunks
f = tempname()
savedataset(dschunked, path=f, driver=:zarr)
````

Suggestions on how to improve or add to these examples are welcome.
33 changes: 33 additions & 0 deletions docs/src/UserGuide/combine.md
@@ -0,0 +1,33 @@
# Combine YAXArrays

Data is often scattered across multiple files and corresponding arrays, e.g. one file per time step.
This section describes how to combine them into a single YAXArray.

## Concatenate YAXArrays along an existing dimension

Here we use `cat` to combine two arrays consisting of data from the first and the second half of a year into one single array containing the whole year.
We glue the arrays along the first dimension using `dims = 1`.
The resulting array `whole_year` still has one dimension, i.e. time, but with 12 instead of 6 elements.

````@example cat
using YAXArrays
first_half = YAXArray((Dim{:time}(1:6),), rand(6))
second_half = YAXArray((Dim{:time}(7:12),), rand(6))
whole_year = cat(first_half, second_half, dims = 1)
````
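
On plain Julia arrays, concatenating along an existing dimension simply appends the values, and conceptually the time coordinates of the result are the two coordinate vectors appended in the same order. The sketch below uses only Base Julia with made-up values and assumes, as in the example, that the two halves do not overlap in time.

```julia
# Values and time coordinates of two consecutive half-years (illustrative data)
first_values, first_time = rand(6), collect(1:6)
second_values, second_time = rand(6), collect(7:12)

# Concatenate along the first (and only) dimension
whole_values = cat(first_values, second_values; dims=1)
whole_time = vcat(first_time, second_time)

# The result only forms a proper time axis if the second block starts after the first ends
@assert last(first_time) < first(second_time)
@assert issorted(whole_time)
```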

## Combine YAXArrays along a new dimension

Here we use `concatenatecubes` to combine two arrays of different variables that share the same time dimension.
The resulting array `combined` has an additional dimension `variable` indicating from which array each element originates.

````@example concatenatecubes
using YAXArrays
temperature = YAXArray((Dim{:time}(1:6),), rand(6))
precipitation = YAXArray((Dim{:time}(1:6),), rand(6))
cubes = [temperature,precipitation]
var_axis = Dim{:variable}(["temp", "prep"])
combined = concatenatecubes(cubes, var_axis)
````
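
With plain Julia arrays, combining along a new dimension corresponds to concatenating along an additional trailing dimension, while a separate label vector plays the role of `var_axis`. This is a Base-Julia analogue with made-up values only, not what `concatenatecubes` does internally.

```julia
temperature = rand(6)
precipitation = rand(6)
labels = ["temp", "prep"]

# Stack the two 6-element vectors as columns: 6 time steps × 2 variables
combined = cat(temperature, precipitation; dims=2)

# Select a variable by its label instead of its position
prep = combined[:, findfirst(==("prep"), labels)]
```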
88 changes: 87 additions & 1 deletion docs/src/UserGuide/compute.md
@@ -1,16 +1,102 @@
# Compute YAXArrays

This section describes how to create new YAXArrays by performing operations on them.

- Use [arithmetics](#Arithmetics) to add numbers to, or multiply numbers with, each element of an array
- Use [map](#map) to apply a more complex function to every element of an array
- Use [mapslices](#mapslices) to reduce a dimension, e.g. to get the mean over all time steps
- Use [mapCube](#mapCube) to apply complex functions on an array that may change any of its dimensions


Let's start by creating an example dataset:

````@example compute
using YAXArrays
using Dates
axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a = YAXArray(axlist, data, properties)
````

## Modify elements of a YAXArray

````@example compute
a[1,2,3]
````

````@example compute
a[1,2,3] = 42
````

````@example compute
a[1,2,3]
````

::: warning

Some arrays, e.g. those stored in a cloud object storage, are immutable, making any modification of the data impossible.

:::


## Arithmetics

Add a value to all elements of an array and save it as a new array:

````@example compute
a2 = a .+ 5
````

````@example compute
a2[1,2,3] == a[1,2,3] + 5
````
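
The dot syntax is Julia's broadcasting, so the same pattern covers multiplication and combined expressions. A plain-array sketch (no YAXArrays involved, random data of the same shape as above):

```julia
data = rand(30, 10, 15)

shifted = data .+ 5      # add 5 to every element, returns a new array
scaled = 2 .* data .+ 1  # multiply and add, fused into a single pass over the data
```

In both cases `data` itself is left unchanged.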

## map

Apply a function to every element of an array individually:

````@example compute
offset = 5
map(a) do x
(x + offset) / 2 * 3
end
````

This keeps all dimensions unchanged.
Each element of the array is processed individually, so the function cannot access neighboring elements.
If you need them, use `mapslices` or `mapCube` instead.

The call returns almost immediately, because `map` applies the function lazily.
The actual computation is performed only on demand, e.g. when elements are explicitly requested or further computations are performed.
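
The idea of lazy evaluation can be illustrated with Base Julia's `Iterators.map`, which also defers calling the function until values are requested. This is an analogy only: it does not show how YAXArrays implements its lazy `map`, and the call counter `ncalls` exists purely for illustration.

```julia
ncalls = Ref(0)  # counts how often the function is actually evaluated
offset = 5

function scale_offset(x)
    ncalls[] += 1
    return (x + offset) / 2 * 3
end

data = rand(30, 10, 15)
lazy = Iterators.map(scale_offset, data)  # builds a lazy iterator; nothing is evaluated yet
before = ncalls[]                         # 0
v = first(lazy)                           # evaluates scale_offset for the first element only
after_first = ncalls[]                    # 1
result = collect(lazy)                    # evaluates scale_offset for all 4500 elements
```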


## mapslices

Reduce the time dimension by calculating the average value of all time points:

````@example compute
import Statistics: mean
mapslices(mean, a, dims="time")
````

There is no time dimension left, because averaging over all time steps leaves only one value per grid cell.
We can also calculate spatial means, resulting in one value per time step:

````@example compute
import Statistics: mean
mapslices(mean, a, dims=("lat", "lon"))
````
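
Base Julia's `mapslices` works on plain arrays too, but it keeps each reduced dimension with length 1, so `dropdims` is needed to remove it. A sketch with the same 30 × 10 × 15 (time × lon × lat) shape as above, using only Base and the `Statistics` standard library:

```julia
using Statistics: mean

data = rand(30, 10, 15)  # time × lon × lat

# Temporal mean: reduce dimension 1, then drop the remaining length-1 dimension
time_mean = dropdims(mapslices(mean, data; dims=1); dims=1)  # 10 × 15

# Spatial mean: reduce dimensions 2 and 3, one value per time step
space_mean = dropdims(mapslices(mean, data; dims=(2, 3)); dims=(2, 3))  # 30-element vector
```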

## mapCube



## Distributed Computation

parallel
35 changes: 35 additions & 0 deletions docs/src/UserGuide/create.md
@@ -0,0 +1,35 @@
# Create YAXArrays and Datasets

## Create a YAXArray

We can create a new YAXArray by filling the values directly:

````@example create
using YAXArrays
a1 = YAXArray(rand(10, 20, 5))
````

We can also specify the dimensions with custom names, which makes accessing them easier:

````@example create
using Dates
axlist = (
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")),
Dim{:lon}(range(1, 10, length=10)),
Dim{:lat}(range(1, 5, length=15)),
)
data2 = rand(30, 10, 15)
properties = Dict(:origin => "user guide")
a2 = YAXArray(axlist, data2, properties)
````
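
The size of `data2` must match the lengths of the axes in `axlist`: daily steps from 2022-01-01 to 2022-01-30 give 30 time points, plus 10 longitudes and 15 latitudes. A quick plain-Julia check of this correspondence (only `Dates` is needed; the `*_axis` and `expected_size` names are local to this sketch):

```julia
using Dates

time_axis = Date("2022-01-01"):Day(1):Date("2022-01-30")
lon_axis = range(1, 10, length=10)
lat_axis = range(1, 5, length=15)

# The data array has one dimension per axis, in the same order
expected_size = (length(time_axis), length(lon_axis), length(lat_axis))
data2 = rand(expected_size...)
```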

## Create a Dataset

````@example create
data3 = rand(30, 10, 15)
a3 = YAXArray(axlist, data3, properties)
arrays = Dict(:a2 => a2, :a3 => a3)
ds = Dataset(; properties, arrays...)
````
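
The call above relies on two pieces of Julia keyword syntax: `properties` alone is shorthand for `properties = properties`, and `arrays...` splats each `Symbol => array` pair of the dictionary into its own keyword argument. A minimal sketch with a hypothetical stand-in function (`kwargs_demo` is not part of YAXArrays):

```julia
# Hypothetical function with the same keyword layout as the `Dataset` call above
function kwargs_demo(; properties=Dict{Symbol,Any}(), arrays...)
    # `arrays` collects all remaining keyword arguments as name => value pairs
    return properties, sort(collect(keys(arrays)))
end

properties = Dict(:origin => "user guide")
arrays = Dict(:a2 => [1, 2], :a3 => [3, 4])

props, varnames = kwargs_demo(; properties, arrays...)
```

Here `varnames` is `[:a2, :a3]`, i.e. each dictionary entry arrived as a separately named keyword argument.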
75 changes: 0 additions & 75 deletions docs/src/UserGuide/etc/applyfunctions.md

This file was deleted.
