Skip to content

Calibration and cross validation

miturbide edited this page Apr 27, 2018 · 15 revisions

This is a worked example of the application of several bias correction methods. In order to show the way in which each method operates, in the next examples the correction is applied considering the same period for the train and test data (1991-2000), however, usually RCM or GCM are used for test.

The example will be done on the NCEP/NCAR Reanalysis 1 dataset, for the Igueldo station (San Sebastian - Spain) of the VALUE station dataset.

Instead of a single station, the observations might be multiple stations or gridded data.

# If we have not installed the "devtools" library we should do it:
# install.packages("devtools")
# library(devtools)
# Installing the downscaleR R-package
# devtools::install_github(c("SantanderMetGroup/downscaleR.java@stable","SantanderMetGroup/downscaleR@stable"))
library(downscaleR)

###Data

With the purpose of running simple examples and illustrating the functionalities of downscaleR, several R-data objects are available in the package. This data is originally loaded with the loadeR package (an R package for climate data access and manipulation) that can be installed also from its GitHub repository (for more information go to the wiki of loadeR).

####Observations

In this example we will bias correct winter precipitation and mean temperature for the Igueldo station, using as observations data from the VALUE_ECA&D dataset, considering the period 1991-2000 (type help(VALUE_Igueldo_tp) and help(VALUE_Igueldo_tas)).

# precipitation
data(VALUE_Igueldo_tp)
y <- VALUE_Igueldo_tp

# temperature
data(VALUE_Igueldo_tas)
y.t <- VALUE_Igueldo_tas

####Predictors

NCEP reanalysis model data will be used as predictors and test data (type help(NCEP_Iberia_tp) and help(NCEP_Iberia_tas)).

# precipitation
data(NCEP_Iberia_tp)
x <- NCEP_Iberia_tp

# temperature
data(NCEP_Iberia_tas)
x.t <- NCEP_Iberia_tas

Next the NCEP grid (x) and the Igueldo station (y) are plotted with plotMeanGrid.

plotMeanGrid(x)
points(y$xyCoords[,1], y$xyCoords[,2],  cex = 2, pch = 17)

The available properties to define the methods are:

-method: "delta", "scaling", "eqm", "gqm" and "gpqm. See the examples below for further details in each method.

-pr.threshold: The minimum value that is considered as a non-zero precipitation. Ignored for varcode values different from "pr".

-multi.member: Should members be adjusted sepparately (TRUE, default), or jointly (FALSE)?. Ignored if the dataset has no members.

-window: Numeric value specifying the time window width used to calibrate. The window is centered on the target day. Default to NULL, which considers the whole period available.

-scaling.type: Character indicating the type of the scaling method. Options are "additive" or "multiplicative". This argument is ignored if "scaling" is not selected as the bias correction method.

-extrapolation: Character indicating the extrapolation method to be applied to correct test values that are out of the range of train values. Extrapolation is applied only to the "eqm" method. Options are "no" (Default: do not extrapolate) and "constant".

-theta (only for method 'gpqm'): upper threshold (and lower for the left tail of the distributions, if needed) above which values are fitted to a Generalized Pareto Distribution (GPD). Values below this threshold are fitted to a gamma (normal) distribution. By default, 'theta' is the 95th percentile (5th percentile for the left tail).

Delta

This method consists on adding to the observations the mean change signal (delta method). It is applicable to any kind of variable but it is preferable to avoid it for bounded variables (e.g. precipitation, wind speed, etc.) because values out of the variable range could be obtained (e.g. negative wind speeds...).

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, method = "delta")
cal.t <- biasCorrection (y = y.t, x = x.t, newdata = x.t, method = "delta")

Once we have calibrated the simulation for the test period, we can validate the result against the observational reference. This can be easily done with function quickDiagnostics that returns two graphs. The first shows time-series of the original simulation (red), the calibrated simulation (blue) and the observation (black). The second graph is the quantile-quantile plot, that better illustrates the effect of the applied method.

# precipitation
quickDiagnostics(y, x, cal, type = "daily", loc = y$xyCoords[1,])

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", loc = y$xyCoords[1,])

Scaling

This method is very similar to the delta method but, in this case, the correction consist on scaling the simulation with the difference (scaling.type = "additive") or quotient (scaling.type = "multiplicative") between the mean of the observations and the simulation in the train period. The additive version is preferably applicable to unbounded variables (e.g. temperature) and the multiplicative to variables with a lower bound (e.g. precipitation, because it also preserves the frequency).

See Wetterhall et al.2012 for a comparison with more sophisticated methods considering the scaling method as a benchmark.

#For precipitation
cal <- biasCorrection (y = y, x = x, newdata = x, precipitation = TRUE, method = "scaling", 
                       scaling.type = "multiplicative")
#For temperature
cal.t <- biasCorrection (y = y.t, x = x.t, newdata = x.t, method = "scaling",
                         scaling.type = "additive")
# precipitation
quickDiagnostics(y, x, cal, type = "daily", loc = y$xyCoords[1,])

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", loc = y$xyCoords[1,])

eQM

The empirical quantile mapping is a very extended bias correction method which consists on calibrating the simulated Cumulative Distribution Function (CDF) by adding to the observed quantiles both the mean delta change and the individual delta changes in the corresponding quantile.

cal <- biasCorrection (y = y, x = x, newdata = x, precipitation = TRUE, method = "eqm")
cal.t <- biasCorrection (y = y.t, x = x.t, newdata = x.t, method = "eqm")
# precipitation
quickDiagnostics(y, x, cal, type = "daily", loc = y$xyCoords[1,])

# temperature
quickDiagnostics(y.t, x.t, cal.t, type = "daily", loc = y$xyCoords[1,])

gQM

This parametric quantile mapping is based on the initial assumption that both observed and simulated intensity distributions are well approximated by the gamma distribution, therefore it uses the theorical instead of the empirical distribution. It is described in Piani et al. 2010. It is only applicable to precipitation.

cal <- biasCorrection(y = y, x = x, newdata = x, precipitation = TRUE, method = "gqm")
# precipitation
quickDiagnostics(y, x, cal, type = "daily", loc = y$xyCoords[1,])

gpQM

This method was proposed by Gutjahr and Heinemann 2013. It is also a parametric quantile mapping based on a gamma and Generalized Pareto Distribution (GPD). By default, the threshold above which the GPD is fitted is the 95th percentile of the observed and the predicted wet-day distribution, respectively. The user can specify a different threshold by modifying the parameter theta, by giving the observational (predicted) threshold in the first (second) row and Nnodes columns.

cal <- biasCorrection (y = y, x = x, newdata = x, precipitation = TRUE, method = "gpqm", theta = .70)
# precipitation
quickDiagnostics(y, x, cal, type = "daily", loc = y$xyCoords[1,])


<-- Home page of the Wiki

print(sessionInfo())
Clone this wiki locally