Showing 16 changed files with 525 additions and 427 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,90 @@ | ||
# Chunk YAXArrays | ||
|
||
> [!IMPORTANT] | ||
> Chunking matters when analyzing your data: in most situations the data will not fit into memory, so fast read access is crucial for your workflows. For geospatial data, for example, consider whether you need fast access along time or along space, and choose the chunks accordingly. | ||
To determine the chunk size of the array representation on disk, | ||
call the `setchunks` function prior to saving. | ||
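
To make the time-versus-space trade-off concrete, the following is a minimal sketch in plain Julia that counts how many chunks a read has to touch. The helper `chunks_touched` and the array shape `(time = 365, lon = 100, lat = 100)` are hypothetical and only serve as an illustration; they are not part of YAXArrays.

```julia
# Hypothetical helper (not part of YAXArrays): count the chunks a read touches,
# given the requested index range and the chunk size for every dimension.
# Assumes 1-based indices and chunks aligned at index 1.
function chunks_touched(ranges, chunksize)
    prod(cld(last(r), c) - fld(first(r) - 1, c) for (r, c) in zip(ranges, chunksize))
end

# Assumed array shape: (time = 365, lon = 100, lat = 100)
time_chunks  = (365, 10, 10)   # long in time, small in space
space_chunks = (1, 100, 100)   # one full map per time step

pixel_series = (1:365, 5:5, 5:5)      # full time series of one pixel
single_map   = (1:1, 1:100, 1:100)    # full map of one time step

series_with_time_chunks  = chunks_touched(pixel_series, time_chunks)
series_with_space_chunks = chunks_touched(pixel_series, space_chunks)
map_with_time_chunks     = chunks_touched(single_map, time_chunks)
map_with_space_chunks    = chunks_touched(single_map, space_chunks)
```

Under these assumptions, the layout that serves a pixel time series with a single chunk needs 100 chunks for one map, while the map-oriented layout needs 365 chunks for the time series.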
|
||
## Chunking YAXArrays | ||
|
||
````@example chunks | ||
using YAXArrays, Zarr | ||
a = YAXArray(rand(10,20)) | ||
a_chunked = setchunks(a, (5,10)) | ||
a_chunked.chunks | ||
```` | ||
The saved file is also split into chunks. | ||
|
||
````@example chunks | ||
f = tempname() | ||
savecube(a_chunked, f, backend=:zarr) | ||
Cube(f).chunks | ||
```` | ||
|
||
Alternatively, chunk sizes can be given by dimension name, so the following results in the same chunks: | ||
|
||
````@example chunks | ||
a_chunked = setchunks(a, (Dim_2=10, Dim_1=5)) | ||
a_chunked.chunks | ||
```` | ||
|
||
## Chunking Datasets | ||
`setchunks` can also be applied to a `Dataset`. | ||
|
||
### Set Chunks by Axis | ||
|
||
Set the chunk size for each axis occurring in a `Dataset`. This will be applied to all variables in the dataset: | ||
|
||
````@example chunks | ||
using YAXArrays, Zarr | ||
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5))) | ||
dschunked = setchunks(ds, Dict("Dim_1"=>5, "Dim_2"=>10, "Dim_3"=>2)) | ||
Cube(dschunked).chunks | ||
```` | ||
|
||
Saving... | ||
|
||
````@example chunks | ||
f = tempname() | ||
savedataset(dschunked, path=f, driver=:zarr) | ||
```` | ||
|
||
### Set chunking by Variable | ||
|
||
The following sets the chunk size for each variable separately | ||
and results in exactly the same chunking as the example above: | ||
|
||
````@example chunks | ||
using YAXArrays, Zarr | ||
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10)), z = YAXArray(rand(10,20,5))) | ||
dschunked = setchunks(ds,(x = (5,10), y = Dict("Dim_1"=>5), z = (Dim_1 = 5, Dim_2 = 10, Dim_3 = 2))) | ||
Cube(dschunked).chunks | ||
```` | ||
|
||
Saving... | ||
|
||
````@example chunks | ||
f = tempname() | ||
savedataset(dschunked, path=f, driver=:zarr) | ||
```` | ||
|
||
### Set chunking for all variables | ||
|
||
The following code snippet sets the same output chunks for all arrays. It only works when all member variables of the dataset have the same shape. | ||
|
||
````@example chunks | ||
using YAXArrays, Zarr | ||
ds = Dataset(x = YAXArray(rand(10,20)), y = YAXArray(rand(10,20)), z = YAXArray(rand(10,20))) | ||
dschunked = setchunks(ds,(5,10)) | ||
Cube(dschunked).chunks | ||
```` | ||
|
||
Saving... | ||
|
||
````@example chunks | ||
f = tempname() | ||
savedataset(dschunked, path=f, driver=:zarr) | ||
```` | ||
|
||
Suggestions on how to improve or add to these examples are welcome. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# Combine YAXArrays | ||
|
||
Data is often scattered across multiple files and corresponding arrays, e.g. one file per time step. | ||
This section describes how to combine them into a single YAXArray. | ||
|
||
## Concatenate YAXArrays along an existing dimension | ||
|
||
Here we use `cat` to combine two arrays, holding data from the first and the second half of a year, into a single array containing the whole year. | ||
We glue the arrays along the first dimension using `dims = 1`. | ||
The resulting array `whole_year` still has one dimension, i.e. time, but with 12 instead of 6 elements. | ||
|
||
````@example cat | ||
using YAXArrays | ||
first_half = YAXArray((Dim{:time}(1:6),), rand(6)) | ||
second_half = YAXArray((Dim{:time}(7:12),), rand(6)) | ||
whole_year = cat(first_half, second_half, dims = 1) | ||
```` | ||
|
||
## Combine YAXArrays along a new dimension | ||
|
||
Here we use `concatenatecubes` to combine two arrays of different variables that share the same time dimension. | ||
The resulting array `combined` has an additional dimension `variable` indicating from which array each element value originates. | ||
|
||
````@example concatenatecubes | ||
using YAXArrays | ||
temperature = YAXArray((Dim{:time}(1:6),), rand(6)) | ||
precipitation = YAXArray((Dim{:time}(1:6),), rand(6)) | ||
cubes = [temperature,precipitation] | ||
var_axis = Dim{:variable}(["temp", "prep"]) | ||
combined = concatenatecubes(cubes, var_axis) | ||
```` |
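
The difference between gluing along an existing dimension and stacking along a new one can also be seen with plain Julia arrays. This is only an analogy using `Base.cat` on vectors, not the YAXArrays implementation; the variable names mirror the examples above, and plain arrays carry no named axes.

```julia
# Analogy with plain Julia arrays (no YAXArrays involved).

# Along the existing (first) dimension: two vectors of 6 become one vector of 12.
first_half = collect(1:6)
second_half = collect(7:12)
whole_year = cat(first_half, second_half, dims = 1)

# Along a new (second) dimension: two vectors of 6 become a 6×2 matrix,
# with one column per variable.
temperature = rand(6)
precipitation = rand(6)
combined = cat(temperature, precipitation, dims = 2)

size(whole_year), size(combined)
```

What `concatenatecubes` adds on top of this is the named axis (here `variable`) that labels each slice of the new dimension.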
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,102 @@ | ||
# Compute YAXArrays | ||
|
||
This section describes how to create new YAXArrays by performing operations on them. | ||
|
||
- Use [arithmetics](#Arithmetics) to add a number to, or multiply, each element of an array | ||
- Use [map](#map) to apply more complex functions to every element of an array | ||
- Use [mapslices](#mapslices) to reduce a dimension, e.g. to get the mean over all time steps | ||
- Use [mapCube](#mapCube) to apply complex functions on an array that may change any dimensions | ||
|
||
|
||
Let's start by creating an example dataset: | ||
|
||
````@example compute | ||
using YAXArrays | ||
using Dates | ||
axlist = ( | ||
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")), | ||
Dim{:lon}(range(1, 10, length=10)), | ||
Dim{:lat}(range(1, 5, length=15)), | ||
) | ||
data = rand(30, 10, 15) | ||
properties = Dict(:origin => "user guide") | ||
a = YAXArray(axlist, data, properties) | ||
```` | ||
|
||
## Modify elements of a YAXArray | ||
|
||
````@example compute | ||
a[1,2,3] | ||
```` | ||
|
||
````@example compute | ||
a[1,2,3] = 42 | ||
```` | ||
|
||
````@example compute | ||
a[1,2,3] | ||
```` | ||
|
||
::: warning | ||
|
||
Some arrays, e.g. those saved in cloud object storage, are immutable, making any modification of the data impossible. | ||
|
||
::: | ||
|
||
|
||
## Arithmetics | ||
|
||
Add a value to all elements of an array and save it as a new array: | ||
|
||
````@example compute | ||
a2 = a .+ 5 | ||
```` | ||
|
||
````@example compute | ||
a2[1,2,3] == a[1,2,3] + 5 | ||
```` | ||
|
||
## map | ||
|
||
Apply a function to every element of an array individually: | ||
|
||
````@example compute | ||
offset = 5 | ||
map(a) do x | ||
(x + offset) / 2 * 3 | ||
end | ||
```` | ||
|
||
This keeps all dimensions unchanged. | ||
Note that each element of the array is processed individually, so neighboring elements cannot be accessed here. | ||
To work with neighboring elements, use `mapslices` or `mapCube` instead. | ||
|
||
The code runs very fast because `map` applies the function lazily. | ||
The actual computation is performed only on demand, e.g. when elements are explicitly requested or further computations are performed. | ||
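
Lazy evaluation can be illustrated with plain Julia iterators. The sketch below uses `Iterators.map` from Base as an analogy only; it does not show how YAXArrays implements laziness internally, and the call counter `calls` is a hypothetical helper added to make the deferred computation visible.

```julia
# Analogy in plain Julia: Iterators.map builds a lazy iterator,
# so the function is not called until the values are requested.
calls = Ref(0)                      # hypothetical counter for illustration
function transform(x)
    calls[] += 1                    # record every actual evaluation
    return (x + 5) / 2 * 3
end

lazy = Iterators.map(transform, [1.0, 2.0, 3.0])
calls_before = calls[]              # nothing has been computed yet

result = collect(lazy)              # requesting the values triggers the work
calls_after = calls[]
```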
|
||
|
||
## mapslices | ||
|
||
Reduce the time dimension by calculating the average value of all time points: | ||
|
||
````@example compute | ||
import Statistics: mean | ||
mapslices(mean, a, dims="Time") | ||
```` | ||
There is no time dimension left, because there is only one value left after averaging all time steps. | ||
We can also calculate spatial means resulting in one value per time step: | ||
|
||
````@example compute | ||
import Statistics: mean | ||
mapslices(mean, a, dims=("lat", "lon")) | ||
```` | ||
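
For comparison, the same two reductions can be sketched on a plain Julia array with `mean` and `dropdims`. This is an analogy under the assumption of a small array with dimension order `(time, lon, lat)`; plain arrays address dimensions by position rather than by name.

```julia
using Statistics

# Small plain array with assumed dimension order (time, lon, lat).
data = reshape(collect(1.0:24.0), 2, 3, 4)

# Average over time (dimension 1): one value per (lon, lat) pair.
time_mean = dropdims(mean(data, dims = 1), dims = 1)

# Average over space (dimensions 2 and 3): one value per time step.
spatial_mean = dropdims(mean(data, dims = (2, 3)), dims = (2, 3))

size(time_mean), size(spatial_mean)
```

In both cases the reduced dimensions disappear from the result, which matches the behavior described above for the time dimension.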
|
||
## mapCube | ||
|
||
|
||
|
||
## Distributed Computation | ||
|
||
parallel |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Create YAXArrays and Datasets | ||
|
||
## Create a YAXArray | ||
|
||
We can create a new YAXArray by filling the values directly: | ||
|
||
````@example create | ||
using YAXArrays | ||
a1 = YAXArray(rand(10, 20, 5)) | ||
```` | ||
|
||
We can also specify the dimensions with custom names enabling easier access: | ||
|
||
````@example create | ||
using Dates | ||
axlist = ( | ||
Dim{:time}(Date("2022-01-01"):Day(1):Date("2022-01-30")), | ||
Dim{:lon}(range(1, 10, length=10)), | ||
Dim{:lat}(range(1, 5, length=15)), | ||
) | ||
data2 = rand(30, 10, 15) | ||
properties = Dict(:origin => "user guide") | ||
a2 = YAXArray(axlist, data2, properties) | ||
```` | ||
|
||
## Create a Dataset | ||
|
||
````@example create | ||
data3 = rand(30, 10, 15) | ||
a3 = YAXArray(axlist, data3, properties) | ||
arrays = Dict(:a2 => a2, :a3 => a3) | ||
ds = Dataset(; properties, arrays...) | ||
```` |
This file was deleted.