Skip to content

Accessing Gridded Datasets

Joaquin Bedia edited this page Sep 1, 2014 · 19 revisions

Obtaining an overview of the dataset

Before opening and loading the data, it is advisable to get an overview of the variables stored, time and spatial extent, units etc. This is done by the dataInventory function, which receives as argument the path to the file storing the data (this can be either a netcdf file or a NcML). In the following examples the built-in dataset in downscaleR will be illustrated. This dataset comes from the NCEP-NCAR reanalysis encompassing the period 1961-2010 for the Iberian Peninsula domain.

> # First, the path to the built-in dataset is retrieved:
> ncep <- file.path(find.package("downscaleR"), "datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml")
> di <- dataInventory(ncep)
[2014-06-03 11:21:22] Doing inventory ...
[2014-06-03 11:21:24] Retrieving info for '2T' (5 vars remaining)
[2014-06-03 11:21:24] Retrieving info for 'SLP' (4 vars remaining)
[2014-06-03 11:21:24] Retrieving info for 'TP' (3 vars remaining)
[2014-06-03 11:21:24] Retrieving info for 'Q' (2 vars remaining)
[2014-06-03 11:21:24] Retrieving info for 'T' (1 vars remaining)
[2014-06-03 11:21:24] Retrieving info for 'Z' (0 vars remaining)
[2014-06-03 11:21:25] Done.

The inventory returns a list named of length n, being n the number of variables stored in the dataset named as the variables:

> names(di)
[1] "2T"  "SLP" "TP"  "Q"   "T"   "Z"

For each variable in the dataset (e.g., T), the following information is given:

> str(di$T)
List of 4
 $ Description: chr "Temperature"
 $ DataType   : chr "float"
 $ Units      : chr "K"
 $ Dimensions :List of 4
  ..$ time:List of 4
  .. ..$ Type      : chr "Time"
  .. ..$ TimeStep  : chr "1.0 days"
  .. ..$ Units     : chr "days since 1950-01-01 00:00:00"
  .. ..$ Date_range: chr "1961-01-01T00:00:00Z - 2010-12-31T00:00:00Z"
  ..$ lev :List of 3
  .. ..$ Type  : chr "Pressure"
  .. ..$ Units : chr "Pa"
  .. ..$ Values: num [1:4] 50000 70000 85000 100000
  ..$ rlat:List of 3
  .. ..$ Type  : chr "Lat"
  .. ..$ Units : chr "degrees north"
  .. ..$ Values: num [1:6] 35 37.5 40 42.5 45 47.5
  ..$ rlon:List of 3
  .. ..$ Type  : chr "Lon"
  .. ..$ Units : chr "degrees east"
  .. ..$ Values: num [1:9] -15 -12.5 -10 -7.5 -5 -2.5 0 2.5 5

So T is air temperature, and has three vertical levels (50000, 80000 and 100000 Pascals).

Loading Gridded data

In the following example we will load temperature at 1000 mb level (100000 Pa) for a selected rectangular domain centred on the Iberian Peninsula, considering summer (JJA) for the period 1981-2000. A single variable within a defined spatio-temporal domain is what we call a field. Note that the "@" symbol is used in order to specify a particular vertical level, when levels exist. A field can only contain one vertical level. Several levels would be handled using a multifield, as we shall explain in the next section.

> example1.grid <- loadGridData(dataset = ncep, var = "ta@100000", lonLim = c(-12, 5), latLim=c(35,45), season=c(6:8), years = 1981:2000)
[2014-09-01 19:37:10] Defining homogeneization parameters for variable "ta"
[2014-09-01 19:37:10] Defining geo-location parameters
[2014-09-01 19:37:10] Defining time selection parameters
[2014-09-01 19:37:10] Retrieving data subset ...
[2014-09-01 19:37:11] Done

The function returns the requested data subset and all the necessary information for the identification of the variable and its temporal and spatial definition:

> str(example1.grid)
List of 4
 $ Variable:List of 3
  ..$ varName   : chr "ta"
  ..$ isStandard: logi TRUE
  ..$ level     : num 1e+05
 $ Data    : num [1:1840, 1:5, 1:8] 15.9 15.2 15.6 15.9 15.7 ...
  ..- attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"
 $ xyCoords:List of 2
  ..$ x: num [1:8] -12.5 -10 -7.5 -5 -2.5 0 2.5 5
  ..$ y: num [1:5] 35 37.5 40 42.5 45
  ..- attr(*, "projection")= chr "LatLonProjection"
 $ Dates   :List of 2
  ..$ start: chr [1:1840] "1981-06-01 00:00:00 GMT" "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" ...
  ..$ end  : chr [1:1840] "1981-06-02 00:00:00 GMT" "1981-06-03 00:00:00 GMT" "1981-06-04 00:00:00 GMT" "1981-06-05 00:00:00 GMT" ...
 - attr(*, "dataset")= chr "/home/user/R/i686-pc-linux-gnu-library/3.1/downscaleR/datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml"

The element Data of the list is a N-dimensional array, where N is the number of dimensions depending on the type of request. In this case, the vertical dimension is dropped as it is of length one (100000 Pa), so the returned array is 3-dimensional, with the dimension names stored as an attribute named "dimensions":

> str(example1.grid$Data)
 num [1:1840, 1:5, 1:8] 15.9 15.2 15.6 15.9 15.7 ...
 - attr(*, "dimensions")= chr [1:3] "time" "lat" "lon"

The dimensions are always arranged in their canonical order, this is: member > time > lat > lon

Aggregation functions can be applied directly along the selected dimension of the array using apply. For instance, in this example we plot the mean T1000 field for the selected period, just applying the mean function along the lon and lat dimensions. (Note that this example uses the image.plot and world functions from package fields).

> require(fields)
> meanField2D <- apply(example1.grid$Data, MARGIN = c(3,2), FUN = mean)
> image.plot(example1.grid$xyCoords$x, example1.grid$xyCoords$y, meanField2D, asp = 1)
> world(add = TRUE)

However, this typical plotting operation has been included as a function called plotMeanField in downscaleR, so the above figure can be directly obtained using:

> plotMeanField(example1.grid)

Dealing with several grids

Loading multifields

An object formed by two or more variables sharing a common grid and time span is termed a multifield. A multifield can be directly loaded using the loadMultiField function. In the following example, we load typical predictors in downscaling applications (mean surface temperature, sea-level pressure, specific humidity at 850mb and geopotential heigth at 500mb). By default, the function will preserve the original grid of the first input field, and then will ensure that the others are coincident with it. In this example, we explicitly provide a new grid on which the multifield will be bilinearly interpolated):

> example.mf <- loadMultiField(dataset = ncep, var = c("tas", "psl", "hus@85000", "z@50000"), lonLim = c(-12, 5), latLim=c(35,45), season=c(6:8), years = 1981:2000, new.grid = list(x = c(-12,5,.5), y = c(35,45,.5)), interp.method = "bilinear")
[2014-09-01 20:02:14] Loading predictor 1 (tas) out of 4
[2014-09-01 20:02:15] Loading predictor 2 (psl) out of 4
[2014-09-01 20:02:15] Loading predictor 3 (hus@85000) out of 4
[2014-09-01 20:02:16] Loading predictor 4 (z@50000) out of 4
[2014-09-01 20:02:17] Regridding variable 1 (tas) out of 4 ...
[2014-09-01 20:02:37] Regridding variable 2 (psl) out of 4 ...
[2014-09-01 20:02:56] Regridding variable 3 (hus@85000) out of 4 ...
[2014-09-01 20:03:16] Regridding variable 4 (z@50000) out of 4 ...
[2014-09-01 20:03:36] Done.

Note the behaviour of plotMeanField when a multifield is given as input:

Also note the way the new grid is specified, with components x and y within a list, consisting of a vector of the form c(start, end, res), where res is the spatial resolution of the grid cell in that dimension.

Creating a multifield from existing fields

Alternatively, the constructor makeMultiField allows creating a multifield from a collection of fields. In this case, the spatial and temporal consistency of the different fields must be checked by the user, which can resort to the interpGridData and getGrid methods to this aim. Otherwise, the function will stop raising an error.

> # Definition of a common domain
> lon = c(-12, 5)
> lat = c(35,45)
> # Definition of common time domain
> seas = c(12,1,2)
> yrs = 1990:1999
> # Surface temperature
> tas <- loadGridData(dataset = ncep, var = "tas", lonLim = lon, latLim = lat, season = seas, years = yrs)
[2014-09-01 20:24:11] Defining homogeneization parameters for variable "tas"
[2014-09-01 20:24:12] Defining geo-location parameters
[2014-09-01 20:24:12] Defining time selection parameters
[2014-09-01 20:24:12] Retrieving data subset ...
[2014-09-01 20:24:12] Done
> # Sea-level pressure
> psl <- loadGridData(dataset = ncep, var = "psl", lonLim = lon, latLim = lat, season = seas, years = yrs)
[2014-09-01 20:24:12] Defining homogeneization parameters for variable "psl"
[2014-09-01 20:24:12] Defining geo-location parameters
[2014-09-01 20:24:12] Defining time selection parameters
[2014-09-01 20:24:12] Retrieving data subset ...
[2014-09-01 20:24:13] Done
> # Geopotential heigth at 850mb
> # We introduce a different spatial domain here!
> z850 <- loadGridData(dataset = ncep, var = "z@85000", lonLim = c(-15,7), latLim = lat, season = seas, years = yrs)
[2014-09-01 20:24:13] Defining homogeneization parameters for variable "z"
[2014-09-01 20:24:13] Defining geo-location parameters
[2014-09-01 20:24:13] Defining time selection parameters
[2014-09-01 20:24:13] Retrieving data subset ...
[2014-09-01 20:24:13] Done
> # ERROR: The function complains
> example.mf2 <- makeMultiField(tas, psl, z850)
Error in makeMultiField(tas, psl, z850) : 
  Input data is not spatially consistent
> # We need to regrid z850 to match the grid of the other fields
> example.mf2 <- makeMultiField(tas, psl, interpGridData(z850, new.grid = getGrid(tas)))
[2014-09-01 20:24:13] Performing nearest interpolation... may take a while
[2014-09-01 20:24:14] Done
> str(example.mf2)
List of 4
 $ Variable:List of 3
  ..$ varName   : chr [1:3] "tas" "psl" "z"
  ..$ isStandard: logi [1:3] TRUE TRUE TRUE
  ..$ level     : num [1:3] 2 NA 85000
 $ Data    : num [1:3, 1:902, 1:5, 1:8] 17.3 102060 1529 19 101640 ...
  ..- attr(*, "dimensions")= chr [1:4] "var" "time" "lat" "lon"
 $ xyCoords:List of 2
  ..$ x: num [1:8] -12.5 -10 -7.5 -5 -2.5 0 2.5 5
  ..$ y: num [1:5] 35 37.5 40 42.5 45
  ..- attr(*, "projection")= chr "LatLonProjection"
 $ Dates   :List of 3
  ..$ :List of 2
  .. ..$ start: chr [1:902] "1989-12-01 00:00:00 GMT" "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" ...
  .. ..$ end  : chr [1:902] "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" "1989-12-05 00:00:00 GMT" ...
  ..$ :List of 2
  .. ..$ start: chr [1:902] "1989-12-01 00:00:00 GMT" "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" ...
  .. ..$ end  : chr [1:902] "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" "1989-12-05 00:00:00 GMT" ...
  ..$ :List of 2
  .. ..$ start: chr [1:902] "1989-12-01 00:00:00 GMT" "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" ...
  .. ..$ end  : chr [1:902] "1989-12-02 00:00:00 GMT" "1989-12-03 00:00:00 GMT" "1989-12-04 00:00:00 GMT" "1989-12-05 00:00:00 GMT" ...
 - attr(*, "dataset")= chr "/home/juaco/R/i686-pc-linux-gnu-library/3.1/downscaleR/datasets/reanalysis/Iberia_NCEP/Iberia_NCEP.ncml"
Clone this wiki locally