-
Notifications
You must be signed in to change notification settings - Fork 59
Accessing Gridded Datasets
Before opening and loading the data, it is advisable to get an overview of the variables stored, time and spatial extent, units etc. This is done by the dataInventory
function, which receives as argument the path to the file storing the data (this can be either a netcdf file or a NcML). In the following examples the built-in dataset in downscaleR
will be illustrated. This dataset comes from the NCEP/NCAR reanalysis 1 encompassing the period 1961-2010 for the Iberian Peninsula domain.
# First, the path to the built-in dataset is retrieved:
ncep <- file.path(find.package("downscaleR"), "datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml")
di <- dataInventory(ncep)
## [2014-06-03 11:21:22] Doing inventory ...
## [2014-06-03 11:21:24] Retrieving info for '2T' (5 vars remaining)
## [2014-06-03 11:21:24] Retrieving info for 'SLP' (4 vars remaining)
## [2014-06-03 11:21:24] Retrieving info for 'TP' (3 vars remaining)
## [2014-06-03 11:21:24] Retrieving info for 'Q' (2 vars remaining)
## [2014-06-03 11:21:24] Retrieving info for 'T' (1 vars remaining)
## [2014-06-03 11:21:24] Retrieving info for 'Z' (0 vars remaining)
## [2014-06-03 11:21:25] Done.
The inventory returns a list named of length n, being n the number of variables stored in the dataset named as the variables:
names(di)
## [1] "2T" "SLP" "TP" "Q" "T" "Z"
For each variable in the dataset (e.g., T), the following information is given:
str(di$T)
## List of 4
## $ Description: chr "Temperature"
## $ DataType : chr "float"
## $ Units : chr "K"
## $ Dimensions :List of 4
## ..$ time:List of 4
## .. ..$ Type : chr "Time"
## .. ..$ TimeStep : chr "1.0 days"
## .. ..$ Units : chr "days since 1950-01-01 00:00:00"
## .. ..$ Date_range: chr "1961-01-01T00:00:00Z - 2010-12-31T00:00:00Z"
## ..$ lev :List of 3
## .. ..$ Type : chr "Pressure"
## .. ..$ Units : chr "Pa"
## .. ..$ Values: num [1:4] 50000 70000 85000 100000
## ..$ rlat:List of 3
## .. ..$ Type : chr "Lat"
## .. ..$ Units : chr "degrees north"
## .. ..$ Values: num [1:6] 35 37.5 40 42.5 45 47.5
## ..$ rlon:List of 3
## .. ..$ Type : chr "Lon"
## .. ..$ Units : chr "degrees east"
## .. ..$ Values: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5
So T is air temperature, and has three vertical levels (50000, 80000 and 100000 Pascals).
In the following example we will load temperature at 1000 mb level (100000 Pa) for a selected rectangular domain centred on the Iberian Peninsula, considering summer (JJA) for the period 1981-2000. A single variable within a defined spatio-temporal domain is what we call a field. The particular variables of each dataset are translated -and transformed if necessary- into the common vocabulary by means of a dictionary, therefore the standard names included in the vocabulary should be used at this point (e.g. "ta" for air temperature in the following example). Find more details about the vocabulary here or by typing help(vocabulary)
in the R command line. Note that the "@" symbol is used in order
to specify a particular vertical level, when levels exist. A field can only contain one vertical level. Several levels would be handled using a multifield, as we shall explain in the next section.
example1.grid <- loadGridData(dataset = ncep, var = "ta@100000", lonLim = c(-12, 5), latLim=c(35,45), season=c(6:8), years = 1981:2000)
## [2014-09-01 19:37:10] Defining homogeneization parameters for variable "ta"
## [2014-09-01 19:37:10] Defining geo-location parameters
## [2014-09-01 19:37:10] Defining time selection parameters
## [2014-09-01 19:37:10] Retrieving data subset ...
## [2014-09-01 19:37:11] Done
The function returns the requested data subset and all the necessary information for the identification of the variable and its temporal and spatial definition:
str(example1.grid)
## List of 4
## $ Variable:List of 3
## ..$ varName : chr "ta"
## ..$ isStandard: logi TRUE
## ..$ level : num 1e+05
## $ Data : num [1:1840, 1:5, 1:8] 15.9 15.2 15.6 15.9 15.7 ...
## ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
## $ xyCoords:List of 2
## ..$ x: num [1:8] -12.5 -10 -7.5 -5 -2.5 0 2.5 5
## ..$ y: num [1:5] 35 37.5 40 42.5 45
## ..- attr(*, "projection")= chr "LatLonProjection"
## $ Dates :List of 2
## ..$ start: chr [1:1840] "1981-06-01 00:00:00 GMT" "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" ...
## ..$ end : chr [1:1840] "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" "1981-06-05 00:00:00 GMT" ...
## - attr(*, "dataset")= chr "/home/user/R/i686-pc-linux-gnu-library/3.1/downscaleR/datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml"
The element Data
of the list is a N-dimensional array, where N is the number of dimensions depending on the type of request. In this case, the vertical dimension is dropped as it is of length one (100000 Pa), so the returned array is 3-dimensional, with the dimension names stored as an attribute named "dimensions":
str(example1.grid$Data)
## num [1:1840, 1:5, 1:8] 15.9 15.2 15.6 15.9 15.7 ...
## - attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
The dimensions are always arranged in their canonical order, this is: member > time > lat > lon
Aggregation functions can be applied directly along the selected dimension of the array using apply
. For instance, in this example we plot the mean T1000 field for the selected period, just applying the mean
function along the lon
and lat
dimensions. (Note that this example uses the image.plot
and world
functions from package fields
).
require(fields)
meanField2D <- apply(example1.grid$Data, MARGIN = c(3,2), FUN = mean)
image.plot(example1.grid$xyCoords$x, example1.grid$xyCoords$y, meanField2D, asp = 1)
world(add = TRUE)
However, this typical plotting operation has been included as a function called plotMeanField
in downscaleR
, so the above figure can be directly obtained using:
plotMeanField(example1.grid)
An object formed by two or more variables sharing a common grid and time span is termed a multifield. A multifield can be directly loaded using the loadMultiField
function. In the following example, we load typical predictors in downscaling applications (mean surface temperature, sea-level pressure, specific humidity at 850mb and geopotential heigth at 500mb). By default, the function will preserve the original grid of the first input field, and then will ensure that the others are coincident with it. In this example, we explicitly provide a new grid on which the multifield will be bilinearly interpolated):
example.mf <- loadMultiField(dataset = ncep, var = c("tas", "psl", "hus@85000", "z@50000"), lonLim = c(-12, 5), latLim=c(35,45), season=c(6:8), years = 1981:2000, new.grid = list(x = c(-12,5,.5), y = c(35,45,.5)), interp.method = "bilinear")
## [2014-09-01 20:02:14] Loading predictor 1 (tas) out of 4
## [2014-09-01 20:02:15] Loading predictor 2 (psl) out of 4
## [2014-09-01 20:02:15] Loading predictor 3 (hus@85000) out of 4
## [2014-09-01 20:02:16] Loading predictor 4 (z@50000) out of 4
## [2014-09-01 20:02:17] Regridding variable 1 (tas) out of 4 ...
## [2014-09-01 20:02:37] Regridding variable 2 (psl) out of 4 ...
## [2014-09-01 20:02:56] Regridding variable 3 (hus@85000) out of 4 ...
## [2014-09-01 20:03:16] Regridding variable 4 (z@50000) out of 4 ...
## [2014-09-01 20:03:36] Done.
Note the behaviour of plotMeanField
when a multifield is given as input:
Also note the way the new grid is specified, with components x and y within a list, consisting of a vector of the form c(start, end, res), where res is the spatial resolution of the grid cell in that dimension.
The data structure of a multifield incorporates a new dimension in the Data
array, labeled var
, along which the different variables composing the multifield are stacked:
str(example.mf$Data)
## num [1:4, 1:1840, 1:21, 1:35] 1.84e+01 1.02e+05 3.39e-03 5.76e+03 1.82e+01 ...
## - attr(*, "dimensions")= chr [1:4] "var" "time" "lat" "lon"
Alternatively, the constructor makeMultiField
allows creating a multifield from a collection of fields. In this case, the spatial and temporal consistency of the different fields must be checked by the user, which can resort to the interpGridData
and getGrid
methods to this aim. Otherwise, the function will stop raising an error.
# Definition of a common domain
lon = c(-12, 5)
lat = c(35,45)
# Definition of common time domain
seas = c(12,1,2)
yrs = 1990:1999
# Surface temperature
tas <- loadGridData(dataset = ncep, var = "tas", lonLim = lon, latLim = lat, season = seas, years = yrs)
## [2014-09-01 20:24:11] Defining homogeneization parameters for variable "tas"
## [2014-09-01 20:24:12] Defining geo-location parameters
## [2014-09-01 20:24:12] Defining time selection parameters
## [2014-09-01 20:24:12] Retrieving data subset ...
## [2014-09-01 20:24:12] Done
# Sea-level pressure
psl <- loadGridData(dataset = ncep, var = "psl", lonLim = lon, latLim = lat, season = seas, years = yrs)
## [2014-09-01 20:24:12] Defining homogeneization parameters for variable "psl"
## [2014-09-01 20:24:12] Defining geo-location parameters
## [2014-09-01 20:24:12] Defining time selection parameters
## [2014-09-01 20:24:12] Retrieving data subset ...
## [2014-09-01 20:24:13] Done
# Geopotential heigth at 850mb
# We introduce a different spatial domain here!
z850 <- loadGridData(dataset = ncep, var = "z@85000", lonLim = c(-15,7), latLim = lat, season = seas, years = yrs)
## [2014-09-01 20:24:13] Defining homogeneization parameters for variable "z"
## [2014-09-01 20:24:13] Defining geo-location parameters
## [2014-09-01 20:24:13] Defining time selection parameters
## [2014-09-01 20:24:13] Retrieving data subset ...
## [2014-09-01 20:24:13] Done
# ERROR: The function complains
example.mf2 <- makeMultiField(tas, psl, z850)
## Error in makeMultiField(tas, psl, z850) :
## Input data is not spatially consistent
# We need to regrid z850 to match the grid of the other fields
example.mf2 <- makeMultiField(tas, psl, interpGridData(z850, new.grid = getGrid(tas)))
## [2014-09-01 20:24:13] Performing nearest interpolation... may take a while
## [2014-09-01 20:24:14] Done
downscaleR - Santander MetGroup (Univ. Cantabria - CSIC)