-
Notifications
You must be signed in to change notification settings - Fork 17
/
07_stars.Rmd
1024 lines (724 loc) · 39 KB
/
07_stars.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
# Spatiotemporal Raster Data Handling with `stars` {#stars-basics}
```{r chap_7_setup, include = FALSE}
library(knitr)
knitr::opts_chunk$set(
echo = TRUE,
cache = FALSE,
comment = NA,
message = FALSE,
warning = FALSE,
tidy = FALSE,
cache.lazy = FALSE
)
library(here)
opts_knit$set(root.dir = here())
```
```{r , eval = FALSE, echo = FALSE}
setwd(here())
```
```{r, include = FALSE, warning=FALSE, cache = FALSE}
#--- load packages ---#
library(data.table)
library(stringr)
library(raster)
library(terra)
library(ggplot2)
library(cdlTools)
library(sf)
library(gdalcubes)
library(tidyverse)
library(lubridate)
library(tidyr)
library(viridis)
library(tictoc)
library(stars)
library(tmap)
```
## Before you start {.unnumbered}
In this Chapter, we introduce the `stars` package [@stars] for raster data handling. It can be particularly useful for those who use spatiotemporal raster data often (like daily PRISM and Daymet data) because it brings a framework that provides a consistent treatment of raster data with temporal dimensions. Specifically, `stars` objects can have a time dimension in addition to the spatial 2D dimensions (longitude and latitude), where the time dimension can take `Date` values.[^stars-1] This can be handy for several reasons as you will see below (e.g., filtering the data by date).
[^stars-1]: and other `Date` classes like `POSIXct`
Another advantage of the `stars` package is its compatibility with `sf` objects as the lead developer of the two packages is the same person. Therefore, unlike the `terra` package approach, we do not need any tedious conversions between `sf` and `SpatVector`. The `stars` package also allows `dplyr`-like data operations using functions like `filter()`, `mutate()` (see section \@ref(dplyr-op)).
In Chapters \@ref(raster-basics) and \@ref(int-RV), we used the `raster` and `terra` packages to handle raster data and interact raster data with vector data. If you do not feel any inconvenience with the approach, you do not need to read on. Also, note that the `stars` package was not written to replace either `raster` or `terra` packages. [Here](https://github.com/r-spatial/stars/wiki/How-%60raster%60-functions-map-to-%60stars%60-functions) is a good summary of how `raster` functions map to `stars` functions. As you can see, there are many functions that are available to the `raster` packages that cannot be implemented by the `stars` package. However, I must say the functionality of the `stars` package is rather complete at least for most economists, and it is definitely possible to use just the `stars` package for all the raster data work in most cases.[^stars-2]
[^stars-2]: at least on the surface
Finally, this book does not cover the use of `stars_proxy` for big data that does not fit in your memory, which may be useful for some of you. [This](https://r-spatial.github.io/stars/articles/stars2.html) provides an introduction to `stars_proxy` for those interested. This book also does not cover **irregular** raster cells (e.g., curvelinear grids). Interested readers are referred to [here](https://r-spatial.github.io/stars/articles/stars4.html).
### Direction for replication {.unnumbered}
**Datasets**
All the datasets that you need to import are available [here](https://www.dropbox.com/sh/l8fmabfjjcuzu5u/aad75x8b_f3yvaq6ca5ap5oza?dl=0). In this chapter, the path to files is set relative to my own working directory (which is hidden). To run the codes without having to mess with paths to the files, follow these steps:
- set a folder (any folder) as the working directory using `setwd()`\
- create a folder called "Data" inside the folder designated as the working directory (if you have created a "Data" folder previously, skip this step)
- download the pertinent datasets from [here](https://www.dropbox.com/sh/l8fmabfjjcuzu5u/aad75x8b_f3yvaq6ca5ap5oza?dl=0) and put them in the "Data" folder
**Packages**
- Run the following code to install or load (if already installed) the `pacman` package, and then install or load (if already installed) the listed package inside the `pacman::p_load()` function.
```{r Chap7_packages}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(
stars, # spatiotemporal data handling
sf, # vector data handling
tidyverse, # data wrangling
cubelyr, # handle raster data
tmap, # make maps
mapview, # make maps
exactextractr, # fast raster data extraction
lubridate, # handle dates
prism # download PRISM data
)
```
- Run the following code to define the theme for map:
```{r define_theme_Chap7, cache = F}
theme_set(theme_bw())
theme_for_map <- theme(
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.line = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_line(color = "transparent"),
panel.grid.minor = element_line(color = "transparent"),
panel.background = element_blank(),
plot.background = element_rect(fill = "transparent", color = "transparent")
)
```
## Understanding the structure of a `stars` object {#stars-structure}
```{r layers-map-gen, echo = F, results = "hide", cache = F}
library(viridis)
theme_for_map <- theme(
axis.ticks = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
axis.line = element_blank(),
panel.border = element_blank(),
legend.position = "bottom",
panel.grid.major = element_line(color = "transparent"),
panel.grid.minor = element_line(color = "transparent"),
panel.background = element_blank(),
plot.background = element_rect(fill = "transparent", color = "transparent")
)
vec_lr <- c(1, 0)
vec_sn <- c(0.6, 0.3)
nrow <- 20
ncol <- 20
row_col_data <- expand.grid(row = 1:nrow, col = 1:ncol) %>%
data.table()
make_sf_grid <- function(i) {
row_temp <- row_col_data[i, row]
col_temp <- row_col_data[i, col]
origin <- c(0, 0) + (col_temp - 1) * vec_lr + (row_temp - 1) * vec_sn
temp_mat <- rbind(
origin,
origin + vec_lr,
origin + vec_lr + vec_sn,
origin + vec_sn,
origin
) %>%
list() %>%
st_polygon() %>%
st_sfc() %>%
st_sf()
}
all_polygons <- lapply(1:nrow(row_col_data), make_sf_grid) %>%
do.call("rbind", .)
g_layers <- ggplot() +
scale_fill_viridis_c(name = "tmax") +
xlim(-6, NA) +
theme_for_map
tmax_m8_y09_stars <- read_stars("Data/PRISM_tmax_y2009_m8.tif")
y_shift <- 3
for (i in 1:10) {
tmep_prism_sf <- tmax_m8_y09_stars["PRISM_tmax_y2009_m8.tif", 80:(80 + ncol - 1), 80:(80 + nrow - 1), i]
temp_values <- pull(tmep_prism_sf, PRISM_tmax_y2009_m8.tif) %>% as.vector()
temp_sf_to_add <- mutate(all_polygons, value = temp_values)
temp_sf_to_add <- st_set_geometry(
temp_sf_to_add,
st_geometry(temp_sf_to_add) + c(0, y_shift) * (i - 1)
)
g_layers <- g_layers +
geom_sf(data = temp_sf_to_add, aes(fill = value))
}
for (i in 1:10) {
g_layers <- g_layers +
annotate("text", x = -4, y = (i - 1) * y_shift, label = paste0("8/", 10 + i, "/2009"), size = 4)
}
g_layers <- g_layers +
annotate("text", x = nrow / 2, y = -1, label = "x", size = 6) +
annotate("text", x = ncol + 8, y = 3, label = "y", size = 6)
# saveRDS(g_layers, "Data/stars_layers.rds")
```
```{r prepare_data_and_viz, eval = F, echo = F}
#--- tmax ---#
tmax_m8_y09_small <-
read_stars("Data/PRISM_tmax_y2009_m8.tif") %>%
.[, 80:(80 + ncol - 1), 80:(80 + nrow - 1), 11:20] %>%
st_set_dimensions(3, values = 1:10, names = "band")
#* Note that offset does not get updated when selecting a portion of a stars obejct. Here, we write it to tif and then read it to get the offset right
write_stars(tmax_m8_y09_small, "Data/PRISM_tmax_y2009_m8_small.tif")
tmax_m8_y09_small <-
read_stars("Data/PRISM_tmax_y2009_m8_small.tif") %>%
setNames("tmax")
#--- ppt ---#
ppt_m8_y09_small <-
read_stars("Data/PRISM_ppt_y2009_m8.tif") %>%
.[, 80:(80 + ncol - 1), 80:(80 + nrow - 1), 11:20] %>%
st_set_dimensions(3, values = 1:10, names = "band")
write_stars(ppt_m8_y09_small, "Data/PRISM_ppt_y2009_m8_small.tif")
ppt_m8_y09_small <-
read_stars("Data/PRISM_ppt_y2009_m8_small.tif") %>%
setNames("ppt")
#--- combine ---#
prcp_tmax_PRISM_m8_y09_small <-
c(ppt_m8_y09_small, tmax_m8_y09_small) %>%
st_set_dimensions(
3,
values = seq(ymd("2009/08/11"), ymd("2009/08/20"), by = "days"),
names = "date"
)
saveRDS(prcp_tmax_PRISM_m8_y09_small, "Data/prcp_tmax_PRISM_m8_y09_small.rds")
```
Let's import a `stars` object of daily PRISM precipitation and tmax saved as an R dataset.
```{r read_prism_stars, cache = F}
#--- read PRISM prcp and tmax data ---#
(
prcp_tmax_PRISM_m8_y09 <- readRDS("Data/prcp_tmax_PRISM_m8_y09_small.rds")
)
```
This `stars` object has two attributes: `ppt` (precipitation) and `tmax` (maximum temperature). They are like variables in a regular `data.frame`. They hold information we are interested in using for our analysis.
This `stars` object has three dimensions: `x` (longitude), `y` (latitude), and `date`. Each dimension has `from`, `to`, `offset`, `delta`, `refsys`, `point`, and `values`. Here are their definitions (except `point`[^stars-3]):
[^stars-3]: It is not clear what it really means. I have never had to pay attention to this parameter. So, its definition is not explained here. If you insist on learning what it is the best resource is probably [this](https://r-spatial.github.io/stars/articles/stars4.html)
- `from`: beginning index
- `to`: ending index
- `offset`: starting value
- `delta`: step value
- `refsys`: GCS or CRS for `x` and `y` (and can be `Date` for a date dimension)
- `values`: values of the dimension when objects are not regular
In order to understand the dimensions of `stars` objects, let's first take a look at the figure below (Figure \@ref(fig:two-d-map)), which visualizes the tmax values on the 2D `x`-`y` surfaces of the `stars` objects at 10th `date` value (the spatial distribution of tmax at a particular date).
<br>
```{r get_dim_1, echo = F, cache = F}
dim_prcp_tmin <- st_dimensions(prcp_tmax_PRISM_m8_y09)
```
```{r two-d-map, echo = F, cache = F, fig.cap = "Map of tmax values on August 10, 2009: 20 by 20 matrix of cells"}
# library(mapview)
# library(leaflet)
temp_data <- prcp_tmax_PRISM_m8_y09["tmax", , , 10]
# x_mean <- st_get_dimension_values(temp_data, which = "x") %>% mean
# y_mean <- st_get_dimension_values(temp_data, which = "y") %>% mean
# map_view <- mapview(temp_data)
# map_view@map %>% setView(x_mean, y_mean, zoom = 7.5)
sf_point <-
st_point(c(dim_prcp_tmin$x$offset, dim_prcp_tmin$y$offset)) %>%
st_sfc() %>%
st_set_crs(st_crs(temp_data))
ggplot() +
geom_stars(data = temp_data) +
geom_sf(data = sf_point, color = "red", shape = 1, size = 5) +
scale_fill_viridis_c(name = "tmax") +
theme_for_map
```
<br>
You can consider the 2D x-y surface as a matrix, where the location of a cell is defined by the row number and column number. Since the `from` for `x` is `r dim_prcp_tmin$x$from` and `to` for `x` is `r dim_prcp_tmin$x$to`, we have `r dim_prcp_tmin$x$to - dim_prcp_tmin$x$from + 1` columns. Similarly, since the `from` for `y` is `r dim_prcp_tmin$y$from` and `to` for `y` is `r dim_prcp_tmin$y$to`, we have `r dim_prcp_tmin$y$to - dim_prcp_tmin$y$from + 1` rows. The `offset` value of the `x` and `y` dimensions is the longitude and latitude of the upper-left corner point of the upper left cell of the 2D x-y surface, respectively (the red circle in Figure \@ref(fig:two-d-map)).[^stars-4] As `refsys` indicates, they are in NAD83 GCS. The longitude of the upper-left corner point of all the cells in the $j$th column (from the left) of the 2D x-y surface is `r dim_prcp_tmin$x$offset` + $(j-1)\times$ `r dim_prcp_tmin$x$delta`, where `r dim_prcp_tmin$x$offset` is `offset` for `x` and `r dim_prcp_tmin$x$delta` is `delta` for `x`. Similarly, the latitude of the upper-left corner point of all the cells in the $i$th row (from the top) of the 2D x-y surface is `r dim_prcp_tmin$y$offset` +$(i-1)\times$ `r dim_prcp_tmin$y$delta`, where `r dim_prcp_tmin$y$offset` is `offset` for `y` and `r dim_prcp_tmin$y$delta` is `delta` for `y`.
[^stars-4]: This changes depending on the `delta` of `x` and `y`. If both of the `delta` for `x` and `y` are positive, then the `offset` value of the `x` and `y` dimensions are the longitude and latitude of the upper-left corner point of the lower-left cell of the 2D x-y surface.
The dimension characteristics of `x` and `y` are shared by all the layers across the date dimension, and a particular combination of `x` and `y` indexes refers to exactly the same location on the earth in all the layers across dates (of course). In the `date` dimension, we have `r dim_prcp_tmin$date$to - dim_prcp_tmin$date$from + 1` date values since the `from` for `date` is `r dim_prcp_tmin$date$from` and `to` for `date` is `r dim_prcp_tmin$date$to`. The `refsys` of the `date` dimension is `Date`. Since the `offset` is `r dim_prcp_tmin$date$offset` and `delta` is `r dim_prcp_tmin$date$delta`, $k$th layer represents tmax values for August $11+k-1$, 2009.
Putting all this information together, we have 20 by 20 x-y surfaces stacked over the `date` dimension (10 layers), thus making up a 20 by 20 by 10 three-dimensional array (or cube) as shown in the figure below (Figure \@ref(fig:map-layer-stars-structure)).
```{r map-layer-stars-structure, fig.cap = "Visual illustration of stars data structure", echo = F, cache = F}
g_layers
```
Remember that `prcp_tmax_PRISM_m8_y09` also has another attribute (`ppt`) which is structured exactly the same way as `tmax`. So, `prcp_tmax_PRISM_m8_y09` basically has four dimensions: attribute, `x`, `y`, and `date`.
It is not guaranteed that all the dimensions are regularly spaced or timed. For an irregular dimension, dimension values themselves are stored in `values` instead of using indexes, `offset`, and `delta` to find dimension values. For example, if you observe satellite data with 6-day gaps sometimes and 7-day gaps other times, then the date dimension would be irregular. We will see a made-up example of irregular time dimension in Chapter \@ref(stars-set-time).
## Some basic operations on `stars` objects
### Create a new attribute and drop a attribute
You can add a new attribute to a `stars` object just like you add a variable to a `data.frame`.
```{r}
#--- define a new variable which is tmax - 20 ---#
prcp_tmax_PRISM_m8_y09$tmax_less_20 <- prcp_tmax_PRISM_m8_y09$tmax - 20
```
As you can see below, `prcp_tmax_PRISM_m8_y09` now has `tmax_less_20` as an attribute.
```{r}
prcp_tmax_PRISM_m8_y09
```
You can drop an attribute by assining `NULL` to the attribute you want to drop.
```{r}
prcp_tmax_PRISM_m8_y09$tmax_less_20 <- NULL
prcp_tmax_PRISM_m8_y09
```
### Subset a `stars` object by index
In order to access the value of an attribute (say `ppt`) at a particular location at a particular time from the `stars` object (`prcp_tmax_PRISM_m8_y09`), you need to tell R that you are interested in the `ppt` attribute and specify the corresponding index of `x`, `y`, and `date`. Here, is how you get the `ppt` value of (3, 4) cell at date = 10.
```{r extract_tmax_subset}
prcp_tmax_PRISM_m8_y09["ppt", 3, 4, 10]
```
This is very much similar to accessing a value from a matrix except that we have more dimensions. The first argument selects the attribute of interest, 2nd `x`, 3rd `y`, and 4th `date`.
Of course, you can subset a `stars` object to access the value of multiple cells like this:
```{r extract_tmax_more}
prcp_tmax_PRISM_m8_y09["ppt", 3:6, 3:4, 5:10]
```
### Set attribute names
When you read a raster dataset from a GeoTIFF file, the attribute name is the file name by default. So, you will often encounter cases where you want to change the attribute name. You can set the name of attributes using `setNames()`.
```{r set_names}
prcp_tmax_PRISM_m8_y09_dif_names <- setNames(prcp_tmax_PRISM_m8_y09, c("precipitation", "maximum_temp"))
```
Note that the attribute names of `prcp_tmax_PRISM_m8_y09` have not changed after this operation (just like `mutate()` or other `dplyr` functions do):
```{r names_same}
prcp_tmax_PRISM_m8_y09
```
If you want to reflect changes in the variable names while keeping the same object name, you need to assign the output of `setNames()` to the object as follows:
```{r name_keep, eval = F}
# the codes in the rest of the chapter use "ppt" and "tmax" as the variables names, not these ones
prcp_tmax_PRISM_m8_y09 <- setNames(prcp_tmax_PRISM_m8_y09, c("precipitation", "maximum_temp"))
```
We will be using this function in Chapter \@ref(merge-two).
### Get the coordinate reference system
You can get the CRS of a `stars` object using `st_crs()`, which is actually the same function name that we used to extract the CRS of an `sf` object.
```{r get_crs}
st_crs(prcp_tmax_PRISM_m8_y09)
```
You will use this to change the projection of vector datasets when you interact them:
- crop the `stars` raster dataset to the spatial extent of `sf` objects
- extract values from the `stars` raster dataset for `sf` objects
as we will see in Chapter \@ref(int-RV). Notice also that we used exactly the same function name (`st_crs()`) to get the CRS of `sf` objects (see Chapter \@ref(vector-basics)).
### Get the dimension characteristics and values
You can access these dimension values using `st_dimensions()`.
```{r get_dim}
#--- get dimension characteristics ---#
(
dim_prcp_tmin <- st_dimensions(prcp_tmax_PRISM_m8_y09)
)
```
The following shows what we can get from this object.
```{r }
str(dim_prcp_tmin)
```
For example, you can get `offset` of `x` as follows:
```{r }
dim_prcp_tmin$x$offset
```
You can extract dimension values using `st_get_dimension_values()`. For example, to get values of `date`,
```{r get-values-date}
#--- get date values ---#
st_get_dimension_values(prcp_tmax_PRISM_m8_y09, "date")
```
This can be handy as you will see in Chapter \@ref(daymet-poly). Later in Chapter \@ref(stars-set-time), we will learn how to set dimensions using `st_set_dimensions()`.
### Attributes to dimensions, and vice versa
You can make attributes to dimensions using `merge()`.
```{r merge-att-dim}
(
prcp_tmax_four <- merge(prcp_tmax_PRISM_m8_y09)
)
```
As you can see, the new `stars` object has an additional dimension called `attributes`, which represents attributes and has two dimension values: the first for `ppt` and second for `tmax`. Now, if you want to access `ppt`, you can do the following:
```{r access-ppt}
prcp_tmax_four[, , , , "ppt"]
```
We can do this because the merge kept the attribute names as dimension values as you can see in `values`.
You can revert it back to the original state using `split()`. Since we want the fourth dimension to dissolve,
```{r split}
split(prcp_tmax_four, 4)
```
## Quick visualization for exploration
You can use `plot()` to have a quick static map and `mapview()` or the `tmap` package for interactive views.
### quick static map
```{r static-map}
plot(prcp_tmax_PRISM_m8_y09["tmax", , , ])
```
It only plots the first attribute if the `stars` object has multiple attributes:
```{r static-map-one}
plot(prcp_tmax_PRISM_m8_y09)
```
, which is identical with this:
```{r static-map-identical}
plot(prcp_tmax_PRISM_m8_y09["ppt", , , ])
```
### interactive map
You can use the `tmap` package. You can apply `tmap_leaflet()` to a static `tmap` object to make it an interactive map. The `tm_facets(as.layers = TRUE)` option stacks all the layers in a single map.
```{r tmap-int}
#--- make it interactive ---#
tmap_leaflet(
tm_shape(prcp_tmax_PRISM_m8_y09["tmax", , , ]) +
tm_raster() +
tm_facets(as.layers = TRUE)
)
```
## Reading and writing raster data {#read-write-stars}
There are so many formats in which raster data is stored. Some of the common ones include GeoTIFF, netCDF, GRIB. All the available GDAL drivers for reading and writing raster data can be found by the following code:[^stars-5]
[^stars-5]: What drivers are available varies depending on your particular installation of GDAL version. But, this is not something you need to worry about. It is very unlikely that you are reading or writing raster data in the format that is not available.
```{r get_drivers, eval = F}
sf::st_drivers(what = "raster")
```
The output of the above function is put into a table below.
```{r get_drivers_table, echo = F}
library(DT)
sf::st_drivers(what = "raster") %>% datatable()
```
### Reading raster data
You can use `read_stars()` to read a raster data file. It is very unlikely that the raster file you are trying to read is not in one of the supported formats.
For example, you can read a GeoTIFF file as follows:
```{r read_stars_tif}
(
ppt_m1_y09_stars <- read_stars("Data/PRISM_ppt_y2009_m1.tif")
)
```
This one imports a raw PRISM data stored as a BIL file.
```{r read-prism}
(
prism_tmax_20180701 <- read_stars("Data/PRISM_tmax_stable_4kmD2_20180701_bil/PRISM_tmax_stable_4kmD2_20180701_bil.bil")
)
```
You can import multiple raster data files into one `stars` object by simply supplying a vector of file names:
```{r read-prism-files}
files <-
c(
"Data/PRISM_tmax_stable_4kmD2_20180701_bil/PRISM_tmax_stable_4kmD2_20180701_bil.bil",
"Data/PRISM_tmax_stable_4kmD2_20180702_bil/PRISM_tmax_stable_4kmD2_20180702_bil.bil"
)
(
prism_tmax_201807_two <- read_stars(files)
)
```
As you can see, each file becomes an attribute.[^stars-6] It is more convenient to have them as layers stacked along the third dimension. To do that you can add `along = 3` option as follows:
[^stars-6]: You can use
```{r read-stars-along}
(
prism_tmax_201807_two <- read_stars(files, along = 3)
)
```
------------------------------------------------------------------------
Note that while the GeoTIFF format (and many other formats) can store multi-band (multi-layer) raster data allowing for an additional dimension beyond `x` and `y`, it does not store the values of the dimension like the `date` dimension we saw in `prcp_tmax_PRISM_m8_y09`. So, when you read a multi-layer raster data saved as a GeoTIFF, the third dimension of the resulting `stars` object will always be called `band` without any explicit time information. On the other hand, netCDF files are capable of storing the time dimension values. So, when you read a netCDF file with valid time dimension, you will have time dimension when it is read.
```{r read_ncdf}
(
read_ncdf(system.file("nc/bcsd_obs_1999.nc", package = "stars"))
)
```
The `refsys` of the time dimension is `POSIXct`, which is one of the date classes.
### Writing a `stars` object to a file
You can write a `stars` object to a file using `write_stars()` using one of the GDAL drivers.
Let's save `prcp_tmax_PRISM_m8_y09["tmax",,,]`, which has `date` dimension whose `refsys` is `Date`.
```{r write_stars, eval = F}
write_stars(prcp_tmax_PRISM_m8_y09["tmax", , , ], "Data/tmax_m8_y09_from_stars.tif")
```
Let's read the file we just saved.
```{r read_stars_saved}
read_stars("Data/tmax_m8_y09_from_stars.tif")
```
Notice that the third dimension is now called band and all the date information is lost. The loss of information happened when we saved `prcp_tmax_PRISM_m8_y09["tmax",,,]` as a GeoTIFF file. One easy way to avoid this problem is to just save a `stars` object as an R dataset.
```{r save_as_rds, eval = F}
#--- save it as an rds file ---#
saveRDS(prcp_tmax_PRISM_m8_y09["tmax", , , ], "Data/tmax_m8_y09_from_stars.rds")
```
```{r read_rds}
#--- read it back ---#
readRDS("Data/tmax_m8_y09_from_stars.rds")
```
As you can see, date information is retained. So, if you are the only one who uses this data or all of your team members use R, then this is a nice solution to the problem.[^stars-7] At the moment, it is not possible to use `write_stars()` to write to a netCDF file that supports the third dimension as time. However, this may not be the case for a long time (See the discussion [here](https://github.com/r-spatial/stars/issues/8)).
[^stars-7]: But, you would be giving up the opportunity to use `stars_proxy` by doing so.
## Setting the time dimension manually {#stars-set-time}
For this section, we will use PRISM precipitation data for U.S. for January, 2009.
```{r starting_object}
(
ppt_m1_y09_stars <- read_stars("Data/PRISM_ppt_y2009_m1.tif")
)
```
As you can see, when you read a GeoTIFF file, the third dimension will always be called `band` because GeoTIFF format does not support dimension values for the third dimension.[^stars-8]
[^stars-8]: In Chapter \@ref(stars-structure), we read an `rds`, which is a `stars` object with dimension name set manually by me.
You can use `st_set_dimension()` to set the third dimension (called `band`) as the time dimension using `Date` object. This can be convenient when you would like to filter the data by date using `filter()` as we will see later.
For `ppt_m1_y09_stars`, precipitation is observed on a daily basis from January 1, 2009 to January 31, 2009, where the band value of **x** corresponds to January **x**, 2009. So, we can first create a vector of dates as follows (If you are not familiar with `Dates` and the `lubridate` pacakge, [this](http://uc-r.github.io/dates/) is a good resource to learn them.):
```{r create_dates}
#--- starting date ---#
start_date <- ymd("2009-01-01")
#--- ending date ---#
end_date <- ymd("2009-01-31")
#--- sequence of dates ---#
dates_ls_m1 <- seq(start_date, end_date, "days")
```
We can then use `st_set_dimensions()` to change the third dimension to the dimension of date.
```{ eval = F}
st_set_dimensions(
stars object,
dimension,
values = dimension values,
names = name of the dimension
)
```
```{r dimension_set}
(
ppt_m1_y09_stars <-
st_set_dimensions(
ppt_m1_y09_stars,
3,
values = dates_ls_m1,
names = "date"
)
)
```
As you can see, the third dimension has become date. The value of offset for the date dimension has become `2009-01-01` meaning the starting date value is now `2009-01-01`. Further, the value of delta is now 1 days, so date dimension of **x** corresponds to `2009-01-01` + **x** - 1 = `2009-01-x`.
Note that the date dimension does not have to be regularly spaced. For example, you may have satellite images available for your area with a 5-day interval sometimes and a 6-day interval other times. This is perfectly fine. As an illustration, I will create a wrong sequence of dates for this data with a 2-day gap in the middle and assign them to the date dimension to see what happens.
```{r wrong_date_seq}
#--- 2009-01-23 removed and 2009-02-01 added ---#
(
dates_ls_wrong <- c(seq(start_date, end_date, "days")[-23], ymd("2009-02-01"))
)
```
Now assign these date values to `ppt_m1_y09_stars`:
```{r dimension_set_wrong}
#--- set date values ---#
(
ppt_m1_y09_stars_wrong <-
st_set_dimensions(
ppt_m1_y09_stars,
3,
values = dates_ls_wrong,
names = "date"
)
)
```
Since the step between the date values is no longer $1$ day for the entire sequence, the value of `delta` is now `NA`. However, notice that the value of date is no longer NULL. Since the date is not regular, you cannot represent date using three values (`from`, `to`, and `delta`) any more, and date values for each observation have to be stored now.
Finally, note that just applying `st_set_dimensions()` to a `stars` object does not change the dimension of the `stars` object (just like `setNames()` as we discussed above).
```{r check_stars}
ppt_m1_y09_stars
```
As you can see, the date dimension has not been altered. You need to assign the results of `st_set_dimensions()` to a `stars` object to see the changes in the dimension reflected just like we did above with the right date values.
## `dplyr`-like operations {#dplyr-op}
You can use the `dplyr` language to do basic data operations on `stars` objects.
### `filter()`
The `filter()` function allows you to subset data by dimension values: x, y, and band (here date).
------------------------------------------------------------------------
**spatial filtering**
```{r filter_ex}
library(tidyverse)
#--- longitude greater than -100 ---#
filter(ppt_m1_y09_stars, x > -100) %>% plot()
#--- latitude less than 40 ---#
filter(ppt_m1_y09_stars, y < 40) %>% plot()
```
------------------------------------------------------------------------
**temporal filtering**
Finally, since the date dimension is in `Date`, you can use `Date` math to filter the data.[^stars-9]
[^stars-9]: Of course, this is possible only because we have assigned date values to the band dimension above.
```{r filter_ex_date}
#--- dates after 2009-01-15 ---#
filter(ppt_m1_y09_stars, date > ymd("2009-01-21")) %>% plot()
```
------------------------------------------------------------------------
**filter by attribute?**
Just in case you are wondering. You cannot filter by attribute.
```{r error = TRUE}
filter(ppt_m1_y09_stars, ppt > 20)
```
### `select()`
The `select()` function lets you pick certain attributes.
```{r select_vars}
select(prcp_tmax_PRISM_m8_y09, ppt)
```
### `mutate()`
You can mutate attributes using the `mutate()` function. For example, this can be useful to calculate NDVI in a `stars` object that has Red and NIR (spectral reflectance measurements in the red and near-infrared regions) as attributes. Here, we just simply convert the unit of precipitation from mm to inches.
```{r mm_inches}
#--- mm to inches ---#
mutate(prcp_tmax_PRISM_m8_y09, ppt = ppt * 0.0393701)
```
### `pull()`
You can extract attribute values using `pull()`.
```{r pull}
#--- tmax values of the 1st date layer ---#
pull(prcp_tmax_PRISM_m8_y09["tmax", , , 1], "tmax")
```
## Merging `stars` objects using `c()` and `st_mosaic()`
### Merging `stars` objects along the third dimension (band)
Here we learn how to merge multiple `stars` objects that have
- the **same** attributes
- the **same** spatial extent and resolution
- **different** bands (dates here)
For example, consider merging PRISM precipitation data in January and February. Both of them have exactly the same spatial extent and resolutions and represent the same attribute (precipitation). However, they differ in the third dimension (date). So, you are trying to stack data of the same attributes along the third dimension (date) while making sure that spatial correspondence is maintained. This merge is kind of like `rbind()` that stacks multiple `data.frame`s vertically while making sure the variables are aligned correctly.
Let's import the PRISM precipitation data for February, 2009.
```{r ppt_feb}
#--- read the February ppt data ---#
(
ppt_m2_y09_stars <- read_stars("Data/PRISM_ppt_y2009_m2.tif")
)
```
Note here that the third dimension of `ppt_m2_y09_stars` has not been changed to date.
Now, let's try to merge the two:
```{r combine_r}
#--- combine the two ---#
(
ppt_m1_to_m2_y09_stars <- c(ppt_m1_y09_stars, ppt_m2_y09_stars, along = 3)
)
```
As you noticed, the second object (`ppt_m2_y09_stars`) was assumed to have the same date characteristics as the first one: the data in February is observed daily (`delta` is 1 day). This causes no problem in this instance as the February data is indeed observed daily starting from `2009-02-01`. However, be careful if you are appending the data that does not start from 1 day after (or more generally `delta` for the time dimension) the first data or the data that does not follow the same observation interval.
For this reason, it is advisable to first set the date values if it has not been set. Pretend that the February data actually starts from `2009-02-02` to `2009-03-01` to see what happens when the regular interval (`delta`) is not kept after merging.
```{r combine_caveates}
#--- starting date ---#
start_date <- ymd("2009-02-01")
#--- ending date ---#
end_date <- ymd("2009-02-28")
#--- sequence of dates ---#
dates_ls <- seq(start_date, end_date, "days")
#--- pretend the data actually starts from `2009-02-02` to `2009-03-01` ---#
(
ppt_m2_y09_stars <- st_set_dimensions(ppt_m2_y09_stars, 3, values = c(dates_ls[-1], ymd("2009-03-01")), name = "date")
)
```
If you merge the two,
```{r combine_caveates_2}
c(ppt_m1_y09_stars, ppt_m2_y09_stars, along = 3)
```
The date dimension does not have `delta` any more, and correctly so because there is a one-day gap between the end date of the first `stars` object ("2009-01-31") and the start date of the second `stars` object ("2009-02-02"). So, all the date values are now stored in `values`.
### Merging `stars` objects of different attributes {#merge-two}
Here we learn how to merge multiple `stars` objects that have
- **different** attributes
- the **same** spatial extent and resolution
- the **same** bands (dates here)
For example, consider merging PRISM precipitation and tmax data in January. Both of them have exactly the same spatial extent and resolutions and the date characteristics (starting and ending on the same dates with the same time interval). However, they differ in what they represent: precipitation and tmax. This merge is kind of like `cbind()` that combines multiple `data.frame`s of different variables while making sure the observation correspondence is correct.
Let's read the daily tmax data for January, 2009:
```{r tmax_read}
(
tmax_m1_y09_stars <- read_stars("Data/PRISM_tmax_y2009_m1.tif") %>%
#--- change the attribute name ---#
setNames("tmax")
)
```
Now, let's merge the PRISM ppt and tmax data in January, 2009.
```{r ppt_tmax_combine, error = TRUE}
c(ppt_m1_y09_stars, tmax_m1_y09_stars)
```
Oops. Well, the problem is that the third dimension of the two objects is not the same. Even though we know that the **x**th element of their third dimension represent the same thing, they look different to R's eyes. So, we first need to change the third dimension of `tmax_m1_y09_stars` to be consistent with the third dimension of `ppt_m1_y09_stars` (`dates_ls_m1` was defined in Chapter \@ref(stars-set-time)).
```{r set_dim_first_combine}
(
tmax_m1_y09_stars <- st_set_dimensions(tmax_m1_y09_stars, 3, values = dates_ls_m1, names = "date")
)
```
Now, we can merge the two.
```{r ppt_tmax_combine_2}
(
ppt_tmax_m1_y09_stars <- c(ppt_m1_y09_stars, tmax_m1_y09_stars)
)
```
As you can see, now we have another attribute called `tmax`.
### Merging `stars` objects of different spatial extents
Here we learn how to merge multiple `stars` objects that have
- the **same** attributes
- **different** spatial extent but **same** resolution[^stars-10]
- the **same** bands (dates here)
[^stars-10]: Technically, you can actually merge `stars` objects of different spatial resolutions. But, you probably should not.
Some times you have multiple separate raster datasets that have different spatial coverages and would like to combine them into one. You can do that using `st_mosaic()`.
Let's split `tmax_m1_y09_stars` into two parts:
```{r split-to-two}
tmax_1 <- filter(tmax_m1_y09_stars, x <= -100)
tmax_2 <- filter(tmax_m1_y09_stars, x > -100)
```
Here (Figure \@ref(fig:map-indiv)) is what they look like (only Jan 1, 1980):
```{r map-indiv, fig.cap = "Two spatially non-overlapping stars objects"}
g_1 <- ggplot() +
geom_stars(data = tmax_1[, , , 1]) +
theme_for_map +
theme(
legend.position = "bottom"
)
g_2 <- ggplot() +
geom_stars(data = tmax_2[, , , 1]) +
theme_for_map +
theme(
legend.position = "bottom"
)
library(patchwork)
g_1 + g_2
```
Let's combine the two using `st_mosaic()`:
```{r combine_mosaic}
tmax_combined <- st_mosaic(tmax_1, tmax_2)
```
Here is what the combined object looks like (Figure \@ref(fig:plot-combine))
```{r plot-combine, fig.cap = "Map of the stars objects combined into one"}
ggplot() +
geom_stars(data = tmax_combined[, , , 1]) +
theme_for_map
```
------------------------------------------------------------------------
It is okay to have the two `stars` objects to be combined have a spatial overlap. The following split creates two `stars` objects with a spatial overlap.
```{r split_overlap}
tmax_1 <- filter(tmax_m1_y09_stars, x <= -100)
tmax_2 <- filter(tmax_m1_y09_stars, x > -110)
```
Here (Figure \@ref(fig:map-indiv-2)) is what they look like (only Jan 1, 1980):
```{r map-indiv-2, fig.height = 8, fig.cap = "Two spatially overlapping stars objects"}
g_1 <- ggplot() +
geom_stars(data = tmax_m1_y09_stars[, , , 1], fill = NA) +
geom_stars(data = tmax_1[, , , 1]) +
theme_for_map
g_2 <- ggplot() +
geom_stars(data = tmax_m1_y09_stars[, , , 1], fill = NA) +
geom_stars(data = tmax_2[, , , 1]) +
theme_for_map
library(patchwork)
g_1 / g_2
```
As you can see below, `st_mosaic()` reconciles the spatial overlap between the two `stars` objects.
```{r combine_ovelapped}
(
tmax_combined <- st_mosaic(tmax_1, tmax_2)
)
```
Here is the plot (Figure \@ref(fig:plot-combine-overlapped)):
```{r plot-combine-overlapped, fig.cap = "Map of the spatially-overlapping stars objects combined into one"}
ggplot() +
geom_stars(data = tmax_combined[, , , 1]) +
theme_for_map
```
## Apply a function to one or more dimensions
Sometimes, you want to apply a function to one or more dimensions of a `stars` object. For example, you might want to calculate total annual precipitation for each cell from daily precipitation raster data. For illustration, let's create a single-attribute `stars` object that holds daily precipitation in August in 2009.
```{r}
(
ppt_m8_y09 <- prcp_tmax_PRISM_m8_y09["ppt"]
)
```
Let's find total precipitation for each cell in August in 2009 using `st_apply()`, which works like this.
```{}
st_apply(
X = stars object,
MARGIN = margin,
FUN = function to apply
)
```
Since we want to sum the value of attribute (ppt) for each of the cells (each cell is represented by `x-y`), we specify `MARGIN = c("x", "y")` and use `sum()` as the function.
```{r}
monthly_ppt <-
st_apply(
ppt_m8_y09,
MARGIN = c("x", "y"),
FUN = sum
)
```
```{r}
plot(monthly_ppt)
```
## Convert from and to `Raster`$^*$ or `SpatRaster` objects {#convert-to-rb}
When you need to convert a `stars` object to a `Raster`$^*$ or `SpatRaster` object, you can use the `as()` function as follows:
```{r convert_to_raster}
(
# to Raster* object
prcp_tmax_PRISM_m8_y09_rb <- as(prcp_tmax_PRISM_m8_y09, "Raster")
)
(
# to SpatRaster
prcp_tmax_PRISM_m8_y09_sr <- as(prcp_tmax_PRISM_m8_y09, "SpatRaster")
)
```
As you can see, date values in `prcp_tmax_PRISM_m8_y09` appear in the time dimension of `prcp_tmax_PRISM_m8_y09_rb` (`RasterBrick`). However, in `prcp_tmax_PRISM_m8_y09_sr` (`SpatRaster`), date values appear in the `names` dimension with `date` added as prefix to the original date values.
Note also that the conversion was done for only the `ppt` attribute. This is simply because the `raster` and `terra`package does not accommodate multiple attributes of 3-dimensional array. So, if you want a `RasterBrick` or `SpatRaster` of the tmax data, then you need to do the following:
```{r tmax_conv}
(
# to RasterBrick
tmax_PRISM_m8_y09_rb <- as(prcp_tmax_PRISM_m8_y09["tmax", , , ], "Raster")
)
(
# to SpatRaster