Example. Analysis and prediction of circulation weather types
This is a worked example of the analysis and prediction of circulation types (CTs) and weather types (WTs). First, a subset of the CMIP5 GCM dataset is clustered using sea-level pressure (psl), near-surface air temperature (tas) and specific humidity at 850 hPa (hus850) over the Iberian Peninsula, for the winter season of the period 1983-2002 (argument grid). This constitutes the analysis of circulation types from training data. Next, the clusters of the same CMIP5 subset for the future period 2081-2100 (argument newdata) are predicted with reference to the training analysis. In this way the CTs of the Iberian Peninsula in the future period can be predicted, and the change in the frequency of each CT between the two periods can be analyzed.
#Data for training
grid <- makeMultiGrid(CMIP5_Iberia_psl, CMIP5_Iberia_tas, CMIP5_Iberia_hus850)
#Data for prediction
newdata <- makeMultiGrid(CMIP5_Iberia_psl.rcp85, CMIP5_Iberia_tas.rcp85, CMIP5_Iberia_hus850.rcp85)
All the datasets used are included in the transformeR package. Now that the grid and newdata inputs are ready, the clustering analysis of the training data can be performed. This example uses the k-means algorithm with centers = 10:
clusters.training <- clusterGrid(grid = grid, type = "kmeans", centers = 10, iter.max = 10000, nstart = 10)
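As a rough illustration of what these arguments control, the sketch below runs base R's kmeans (package stats) on a small synthetic matrix, with rows standing in for days and columns for grid-point values. The matrix toy and its dimensions are made up for illustration, and it is an assumption here, not something this page states, that clusterGrid hands centers, iter.max and nstart to a k-means routine of this kind.

```r
# Toy data (hypothetical): 200 "days" x 6 "grid-point" values, not the CMIP5 grids
set.seed(1)
toy <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

# centers: number of clusters; iter.max: iteration cap; nstart: random restarts
km <- kmeans(toy, centers = 4, iter.max = 10000, nstart = 10)

# One cluster label per day and one centroid per cluster
head(km$cluster)
dim(km$centers)
table(km$cluster)
```

A larger nstart repeats the algorithm from several random initializations and keeps the best solution, which makes the partition less sensitive to the starting centroids.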
The resulting CTs are stored in the attributes wt.index and centroids. Further information about the k-means clustering is available in other attributes. The absolute frequency of each CT can be obtained with the following code:
wt.index <- attr(clusters.training, "wt.index")
table(wt.index)
# wt.index
# 1 2 3 4 5 6 7 8 9 10
# 215 221 135 241 190 89 269 117 82 246
The most frequent CT is number 7, which occurs on 269 of the 1805 days in the dataset, and the least frequent is number 9 (82 of 1805 days).
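The same counts can be expressed as relative frequencies. The sketch below copies the counts from the table above into a plain vector so that it runs without transformeR; the names counts.training and rel.training are introduced here for illustration only.

```r
# Counts copied from the training-period table above (CTs 1 to 10)
counts.training <- c(215, 221, 135, 241, 190, 89, 269, 117, 82, 246)
names(counts.training) <- 1:10

total.days <- sum(counts.training)               # 1805 days in total
rel.training <- 100 * counts.training / total.days

round(rel.training, 1)                           # percentage of days per CT
names(which.max(counts.training))                # "7"
names(which.min(counts.training))                # "9"
```

With the real objects, the same percentages should follow from 100 * prop.table(table(wt.index)).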
Once the training CTs have been computed, the CTs of the future data can be predicted. This second step is also performed with clusterGrid. The grid argument must now be a grid returned by clusterGrid; otherwise the function returns an error. The input grid newdata must be consistent with grid in lat, lon and season, and the variables of the input grids must also match; otherwise the function returns an error.
clusters.prediction <- clusterGrid(grid = clusters.training, newdata = newdata, centers = attr(clusters.training, "centers"))
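Conceptually, predicting CTs from an existing clustering can be thought of as assigning each new day to the closest training centroid. The base R sketch below illustrates that idea on synthetic matrices. It assumes Euclidean distance and nearest-centroid assignment, which this page does not confirm as the internal behaviour of clusterGrid, and the names train, future and assign_nearest are hypothetical.

```r
set.seed(42)
train  <- matrix(rnorm(300 * 5), nrow = 300)              # hypothetical training days
future <- matrix(rnorm(100 * 5, mean = 0.3), nrow = 100)  # hypothetical future days

# Step 1: cluster the training period
km <- kmeans(train, centers = 3, iter.max = 1000, nstart = 10)

# Step 2: assign each new day to the centroid at the smallest
# squared Euclidean distance (assumed assignment rule)
assign_nearest <- function(x, centers) {
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  max.col(-d2, ties.method = "first")  # column with the smallest distance
}

future.index <- assign_nearest(future, km$centers)
table(future.index)
```

Because the centroids are fixed by the training period, the cluster IDs keep the same meaning in both periods, which is what makes the frequency comparison meaningful.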
The absolute frequencies of the predicted CTs are now analyzed:
wt.index2 <- attr(clusters.prediction, "wt.index")
table(wt.index2)
# wt.index2
# 1 2 3 4 5 6 7 8 9 10
# 228 237 179 206 160 109 286 97 59 243
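The change between the two periods can also be tabulated directly. The sketch below copies both sets of counts from the tables above into plain vectors, so it runs without transformeR; the object names are introduced here for illustration only.

```r
# Counts copied from the two frequency tables above (CTs 1 to 10)
counts.training   <- c(215, 221, 135, 241, 190, 89, 269, 117, 82, 246)
counts.prediction <- c(228, 237, 179, 206, 160, 109, 286, 97, 59, 243)

change <- data.frame(
  CT         = 1:10,
  training   = counts.training,
  prediction = counts.prediction,
  diff       = counts.prediction - counts.training,
  pct        = round(100 * (counts.prediction - counts.training) / counts.training, 1)
)

# Largest absolute changes first
change[order(-abs(change$diff)), ]
```

The two totals are 1805 and 1804 days, so the absolute counts are almost directly comparable; relative frequencies would remove the remaining one-day difference.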
To give a visual impression of the outcome, the absolute frequencies of the CTs for the training and prediction data can be plotted with the following code:
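# Frequency tables for both periods (named so that base::t is not masked)
freq.training <- table(wt.index); freq.prediction <- table(wt.index2)
# Training period as vertical bars, prediction period as an overlaid line with points
plot(freq.training, ylim = c(0, 300), type = "h", col = 155, xlab = "Circulation Type ID", ylab = "Freq.")
lines(freq.prediction, type = "b", col = 60)
legend("topleft", legend = c("Training", "Prediction"), col = c(155, 60), bty = "n", pch = 20, pt.cex = 2, cex = 0.8, horiz = FALSE, inset = c(0.05, 0.05))
title(main = "Absolute Frequencies of Circulation Types")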
CT number 7 is also the most frequent in the predicted period, and its frequency increases relative to the training period (from 269 to 286 days). This CT can be extracted from the output grid clusters.prediction for further analysis by using the function subsetGrid with the cluster argument:
CT7 <- subsetGrid(grid = clusters.prediction, cluster = 7)
print(sessionInfo())
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.2 LTS
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
## other attached packages:
## [1] transformeR_1.7.1 ggplot2_3.2.0 sp_1.3-1
## loaded via a namespace (and not attached):
## [1] pkgload_1.0.2 maps_3.3.0 jsonlite_1.6 SpecsVerification_0.5-2 dotCall64_1.0-0
## [6] kohonen_3.0.8 sm_2.2-5.6 assertthat_0.2.1 latticeExtra_0.6-28 yaml_2.2.0
## [11] pillar_1.4.2 backports_1.1.4 lattice_0.20-38 glue_1.3.1 RcppEigen_0.3.3.5.0
## [16] digest_0.6.20 RColorBrewer_1.1-2 colorspace_1.4-1 htmltools_0.4.0 Matrix_1.2-17
## [21] pkgconfig_2.0.2 raster_3.0-2 padr_0.5.0 purrr_0.3.2 scales_1.0.0
## [26] brew_1.0-6 CircStats_0.2-6 dtw_1.20-1 tibble_2.1.3 proxy_0.4-23
## [31] withr_2.1.2 verification_1.42 pbapply_1.4-1 lazyeval_0.2.2 magrittr_1.5
## [36] crayon_1.3.4 easyVerification_0.4.4 evaluate_0.14 MASS_7.3-51.4 xml2_1.2.2
## [41] tools_3.6.0 data.table_1.12.2 stringr_1.4.0 munsell_0.5.0 akima_0.6-2
## [46] compiler_3.6.0 mapplots_1.5.1 rlang_0.4.0 grid_3.6.0 rstudioapi_0.10
## [51] spam_2.2-2 base64enc_0.1-3 tcltk_3.6.0 rmarkdown_2.1 vioplot_0.3.2
## [56] boot_1.3-22 testthat_2.2.1 gtable_0.3.0 codetools_0.2-16 abind_1.4-5
## [61] roxygen2_6.1.1 R6_2.4.0 zoo_1.8-6 knitr_1.28 dplyr_0.8.3
## [66] commonmark_1.7 visualizeR_1.5.0 rprojroot_1.3-2 desc_1.2.0 stringi_1.4.3
## [71] parallel_3.6.0 Rcpp_1.0.2 fields_9.8-6 tidyselect_0.2.5 xfun_0.12
transformeR - Santander MetGroup (Univ. Cantabria - CSIC)