-
Notifications
You must be signed in to change notification settings - Fork 59
analog_demo
The aim of this example is to obtain a downscaled time series for a weather station in Madrid (Spain) for the period 2001-2010 building upon the observed time series for the period 1991-2000, using the analog method implementation in downscaleR. In this example, the NCEP reanalysis data will be used both in the calibration and in the projection steps, although note that once the analogs are trained with the reanalysis, the projections could be done using for instance a GCM, like in this practice in which a seasonal forecast model output is downscaled.
All the required data for the first part of the practice are included in the built-in datasets in the downscaleR package, although note that NCEP data can be also accessed via the ECOMS User Data Gateway using the loadECOMS
interface of the ecomsUDG.Raccess package.
The example will be done on the Iberian Peninsula. The example is based on previous research in this region on the performance of different downscaling methods by Gutiérrez et al. 2013. Thus, the selection of spatial domain and predictors is based on their findings.
2. Key inputs (nomenclature in Gutiérrez et al 2013 indicated in boldface, for reference):
- Predictand: Minimum surface daily temperature (tasmin) from GSN station data in Madrid (Spain)
- Target season: Boreal winter (DJF)
- Calibration (training) period: 1991-2000
- Test period: 2001-2010
- Domain of analysis: 11ºW, 5ºE and 35ºN 46ºN (domain Z6, Fig. 1)
- Combination of predictors: sea-level pressure (psl) and mean surface temperature (tas) (Pattern P5, Table 2)
- Downscaling method: Deterministic nearest neighbour analogs (method M1a, Table 3)
To find out the directory containing the observations, simply type:
library(downscaleR)
gsn <- file.path(find.package("downscaleR"), "datasets/observations/GSN_Iberia")
stationInfo provides a quick overview of the dataset:
stationInfo(gsn)
will return:
[2014-09-01 09:58:37] Doing inventory ...
[2014-09-01 09:58:37] Done.
stationID longitude latitude altitude location WMO_Id Koppen.class
1 SP000008027 -2.0392 43.3075 251 SAN SEBASTIAN - IGUELDO 8027 Cfb
2 SP000008181 2.0697 41.2928 4 BARCELONA/AEROPUERTO 8181 Csa
3 SP000008202 -5.4981 40.9592 790 SALAMANCA AEROPUERTO 8202 BSk
4 SP000008215 -4.0103 40.7806 1894 NAVACERRADA 8215 Csb
5 SP000008280 -1.8631 38.9519 704 ALBACETE LOS LLANOS 8280 BSk
6 SP000008410 -4.8458 37.8442 90 CORDOBA AEROPUERTO 8410 Csa
In particular, we see that the Navacerrada station is the fourth in the dataset (code = "SP000008215"), and from the data inventory that minimum temperature is here termed "tmin". There are different options for loading arbitrary stations from a dataset. In this case we are interested in loading just one single station, located near Madrid and called "Navacerrada". The easiest approach is to provide an (approximated) coordinate to Madrid (-3.7E,40.42N), and the function will return the closest station to the given point. Alternatively, it is also possible to retrieve the station from its code (safer in case there are other stations in the proximity that may be confounded):
obs <- loadStationData(dataset = gsn, var = "tmin", stationID = "SP000008215", season = c(12,1,2), years = 1991:2000, tz = "GMT")
The set of predictors (a.k.a. atmospheric pattern) P5 is next loaded for the training period. A single variable for a specific spatiotemporal domain is what we term a field in downscaleR
terminology. In this case, for the same spatio-temporal domain we are loading two different variables, that can be merged together forming a multifield. A multifield can be loaded from a given dataset using the loadMultiField function, that takes care of regridding to ensure the spatial matching of the different input fields. (NOTE: it is also possible to load the different fields sepparately and then join them using the multifield constructor makeMultiField.
This is the path to the built-in NCEP dataset:
ncep <- file.path(find.package("downscaleR"), "datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml")
The data inventory provides a quick overview of the dataset characteristcs. In particular, we are interested in the variables of the atmospheric pattern P5 (mean surface temperature and sea-level pressure) whose standard names are "tas" and "psl" respectively, as indicated in the vocabulary (see help("vocabulary")
for details). These variables are termed "2T" and "SLP" in the original dataset:
ncep.inv <- dataInventory(ncep)
names(ncep.inv)
will return the names of the variables in the dataset:
[1] "Z" "T" "Q" "2T" "SLP" "pr"
The nomenclature between the standard naming convention of the vocabulary and the dataset naming is given by the dictionary, that can be accessed at this location:
dictionary.ncep <- file.path(find.package("downscaleR"), "datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.dic")
As the dictionary is in the same location than the dataset, it is enough with specifying the argument dictionary = TRUE
to make use of it (the default option), and the function will perform all the necessary transformations to return the standard variables, as defined in the vocabulary:
pred <- loadMultiField(ncep, vars = c("tas","psl"), lonLim = c(-11,5), latLim = c(35,46), season = c(12,1,2), years = 1991:2000)
# This is how the multifield looks like:
plotMeanField(pred)
In this case, we are also using NCEP as test data, but considering the test period 2001-2010. Therefore, the test multifield is loaded as previously shown, but changing accordingly the value of the years
argument:
test <- loadMultiField(ncep, vars = c("tas","psl"), lonLim = c(-11,5), latLim = c(35,46), season = c(12,1,2), years = 2001:2010)
It is a standard practice in atmospheric science to apply principal component analysis (PCA, a.k.a. EOF analysis) to the predictors, eliminating the large amount of redundant information while minimizing the loss of variability, being therefore an efficient technique for reduction of data dimensionality. Thus, instead of using the raw multifield loaded as predictor (which is also possible), we will perform a previous PCA on it. In this example, we will retain the PCs explaining the 99% of the variability of each field (although note that it is also possible to retain all PCs, or a user-defined number of them: type help("prinComp")
for details).
pca.pred <- prinComp(pred, v.exp = .99)
The main spatial modes after the PCA can be visually inspected using the plotEOF
function. For instance, for sea-level pressure these are all the EOFs obtained:
plotEOF(pca.pred, "psl")
The analog method is applied to the above datasets. The arguments follow the order observations > predictors > test data. Further additional arguments control the different options of the analog construction, but the default values are ok for this example (type help("analogs")
for more details):
output <- analogs(obs, pca.pred, test)
NOTE: Instead on the principal components results, the argument pred
can be filled with the original multifield. The use of the function would be similar:
# Do not run, just as an example
noPCA.output <- analogs(obs, pred, test)
However, note that due to the advantages of using EOFs rather than the raw fields, in the following only the results from using the principal components will be shown.
The structure of the output is similar to that of the input observations, but the attributes in the Data element indicate that this is the output of an analog downscaling, and that the test data correspond to NCEP (note that the value of the last attribute depends on the user machine, as dataset is locally stored):
attributes(output$Data)
will return the following attributes info:
$dimensions
[1] "time" "station"
$`downscaling:method`
[1] "analogs"
$`downscaling:simulation_data`
[1] "/home/user/R/i686-pc-linux-gnu-library/3.1/downscaleR/datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml"
It is not the aim of this demo to show a full validation procedure, but just to show a brief comparison of the downscaled series versus the actual series. To this aim, we load the validation data from the GSN dataset for the test period:
val <- loadStationData(dataset = gsn, var = "tmin", stationID = "SP000008215", season = c(12,1,2), years = 2001:2010, tz = "GMT")
The following lines make a basic comparison of the observed and downscaled time series for the validation period, generating the next figure:
# The test period is coerced to a POSIXlt class object to define the temporal dimension
test.period <- as.POSIXlt(val$Dates$start, tz = "GMT")
# Partition the graphic window into two
par(mfrow=c(2,1))
# Panel a
plot(test.period, output$Data, ty = 'l', ylab = "tasmin")
lines(test.period, val$Data, col = "red", lty = 2)
legend("topleft", c("Observed", "Downscaled"), lty = c(1,2), col = c(1,2))
# Panel b
plot(val$Data, output$Data, asp = 1, ylab = "Observed", xlab = "Downscaled", col = "blue")
lm1 <- lm(val$Data ~ output$Data)
abline(reg = lm1, col = "red")
r2 <- round(summary(lm1)$adj.r.squared, 2)
bias <- round(mean(output$Data - val$Data, na.rm = TRUE), 2)
mtext(paste("Rsq =", r2, " / bias = ", bias))
# Reset the graphic device
par(mfrow=c(1,1))
The result obtained can be checked againd the results in Gutiérrez et al 2013, by comparing the correlation obtained in this exercise (r^2 = 0.44, equivalent to a correlation of ~0.66), as obtained depicted in Fig. 6 (Upper left panel on the right block, corresponding to method M1a.). From these results we can expect a notable improvement of model performance in spring, and much more in autumn, for which the highest skill is attained by the method (go ahead if you feel curious...).
- Gutiérrez J, San-Martín D, Brands S, Manzanas R, Herrera S (2012) Reassessing statistical downscaling techniques for their robust application under climate change conditions. J. Clim. 26, 171–188.
downscaleR - Santander MetGroup (Univ. Cantabria - CSIC)