
Accessing Station Data

miturbide edited this page Aug 12, 2015 · 20 revisions

Obtaining a quick overview of the dataset

The function dataInventory gives a quick overview of the data contained in a dataset. For station data, the main argument is the path to the directory where the dataset (stations.txt, variables.txt and the associated data files) is stored (see this link for details on the station data format).

For instance, this is a quick overview of the built-in dataset in the downscaleR package using dataInventory:

# First the path to the directory containing the data is retrieved:
gsn <- file.path(find.package("downscaleR"), "datasets/observations/GSN_Iberia")
di <- dataInventory(gsn)
## [2014-06-03 09:36:27] Doing inventory ...
## [2014-06-03 09:36:27] Done.

The returned object contains all the information needed to call the loading function loadStationData, including station codes, geolocation and details on the variables (names, units, missing-value code):

str(di)
## List of 3
##  $ Stations     :List of 4
##   ..$ station_id    : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##   ..$ xyCoords  : num [1:6, 1:2] -2.04 2.07 -5.5 -4.01 -1.86 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##   .. .. ..$ : chr [1:2] "lon" "lat"
##   ..$ times         :List of 3
##   .. ..$ startDate: POSIXlt[1:1], format: "1979-01-01"
##   .. ..$ endDate  : POSIXlt[1:1], format: "2012-12-31"
##   .. ..$ timeStep :Class 'difftime'  atomic [1:1] 24
##   .. .. .. ..- attr(*, "units")= chr "hours"
##   ..$ other.metadata:List of 4
##   .. ..$ altitude    : int [1:6] 251 4 790 1894 704 90
##   .. ..$ location    : chr [1:6] "SAN SEBASTIAN - IGUELDO" "BARCELONA/AEROPUERTO" "SALAMANCA AEROPUERTO" "NAVACERRADA" ...
##   .. ..$ WMO_Id      : int [1:6] 8027 8181 8202 8215 8280 8410
##   .. ..$ Koppen.class: chr [1:6] "Cfb" "Csa" "BSk" "Csb" ...
##  $ Variables    :'data.frame':	3 obs. of  4 variables:
##   ..$ variable    : Factor w/ 3 levels "precip","tmax",..: 1 3 2
##   ..$ longname    : Factor w/ 3 levels "maximum daily temperature",..: 3 2 1
##   ..$ unit        : Factor w/ 2 levels "0.1 degC","0.1 mm": 2 1 1
##   ..$ missing.code: Factor w/ 1 level "NaN": 1 1 1
##  $ Summary.stats: NULL

Note that the last element of the inventory, named Summary.stats, is NULL. By default, the inventory returns only the basic information, but setting the argument return.stats to TRUE also returns a table summarizing the characteristics of the data (percentage of missing data, mean, min and max values):

di2 <- dataInventory(gsn, return.stats = TRUE)
## [2014-06-03 09:51:18] Doing inventory ...
## [2014-06-03 09:51:19] Done.
di2$Summary.stats
## $missing.percent
##             precip tmin tmax
## SP000008027    0.6  2.3  4.2
## SP000008181    0.7  1.7  1.1
## SP000008202    0.5  4.5  0.8
## SP000008215    0.6  2.7  2.3
## SP000008280    0.5 17.8  4.1
## SP000008410    0.9 10.1  6.0
## 
## $min
##             precip  tmin  tmax
## SP000008027   -0.3 -10.0  -3.5
## SP000008181    0.0  -7.2   0.0
## SP000008202    0.0 -12.0  -1.4
## SP000008215    0.0 -17.5 -11.0
## SP000008280    0.0 -13.4  -1.8
## SP000008410    0.0  -8.2   0.0
## 
## $max
##             precip tmin tmax
## SP000008027   93.0 25.2 38.6
## SP000008181  175.1 26.8 37.4
## SP000008202   50.3 22.0 41.0
## SP000008215  111.8 20.6 31.8
## SP000008280  146.6 23.4 42.0
## SP000008410  154.3 27.0 46.6
## 
## $mean
##                precip      tmin     tmax
## SP000008027 1.2266499 10.621573 16.58673
## SP000008181 1.5895889 11.803554 20.44687
## SP000008202 1.0108198  5.791200 18.73546
## SP000008215 1.5650255  3.303088 10.94361
## SP000008280 0.9700607  7.756689 20.23754
## SP000008410 1.5544916 11.311057 24.63882
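
As a rough illustration of what such a summary table contains, the sketch below computes the same four statistics in base R. The helper summaryStats is hypothetical (it is not part of downscaleR, and it is not claimed to be how dataInventory computes its table), and the input is only the first ten JJA tmax values that the str() listings on this page print for two stations, not the full series.

```r
# First ten JJA tmax values of two stations, copied from the str() output
# printed on this page (a small excerpt, not the full 1979-2012 record).
obs <- data.frame(
  SP000008280 = c(27.4, 26.6, 23.2, 26.4, 30.2, 33.6, 34.6, 35.6, 35.0, 32.4),
  SP000008410 = c(26.8, 26.8, 26.4, 31.0, 33.6, 35.6, 37.4, 37.0, 36.6, 39.6)
)

# Hypothetical helper (an assumption, not a downscaleR function):
# per-station percentage of missing values, minimum, maximum and mean.
summaryStats <- function(x) {
  list(
    missing.percent = sapply(x, function(v) round(100 * mean(is.na(v)), 1)),
    min  = sapply(x, min,  na.rm = TRUE),
    max  = sapply(x, max,  na.rm = TRUE),
    mean = sapply(x, mean, na.rm = TRUE)
  )
}

stats <- summaryStats(obs)
print(stats)
```

Each element of the result is a named vector with one entry per station, so stats$mean[["SP000008280"]] retrieves a single value.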

A more concise summary of the available stations can be obtained using the stationInfo command. By default, it also returns a map with the locations of the available stations, labelled by their identification codes.

print(stationInfo(gsn))
## [2014-08-30 14:26:48] Doing inventory ...
## [2014-08-30 14:26:48] Done.
##     stationID longitude latitude altitude                location WMO_Id Koppen.class
## 1 SP000008027   -2.0392  43.3075      251 SAN SEBASTIAN - IGUELDO   8027          Cfb
## 2 SP000008181    2.0697  41.2928        4    BARCELONA/AEROPUERTO   8181          Csa
## 3 SP000008202   -5.4981  40.9592      790    SALAMANCA AEROPUERTO   8202          BSk
## 4 SP000008215   -4.0103  40.7806     1894             NAVACERRADA   8215          Csb
## 5 SP000008280   -1.8631  38.9519      704     ALBACETE LOS LLANOS   8280          BSk
## 6 SP000008410   -4.8458  37.8442       90      CORDOBA AEROPUERTO   8410          Csa

[Figure: station_map, a map of the available station locations labelled by their identification codes]
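
The station table can also be used to pick station codes by their metadata before loading any data. The sketch below rebuilds part of the printed table by hand as a data frame (it is assumed here, not confirmed, that stationInfo returns a data frame with these column names) and selects codes by altitude and by Köppen class.

```r
# Station metadata copied from the stationInfo output above.
stations <- data.frame(
  stationID    = c("SP000008027", "SP000008181", "SP000008202",
                   "SP000008215", "SP000008280", "SP000008410"),
  altitude     = c(251L, 4L, 790L, 1894L, 704L, 90L),
  Koppen.class = c("Cfb", "Csa", "BSk", "Csb", "BSk", "Csa"),
  stringsAsFactors = FALSE
)

# Codes of the stations located above 500 m.
highStations <- stations$stationID[stations$altitude > 500]

# Codes of the stations in the BSk (cold semi-arid) Köppen class.
bskStations <- stations$stationID[stations$Koppen.class == "BSk"]

print(highStations)
print(bskStations)
```

A character vector such as bskStations has the same form as the stationID argument used in the loading examples further down.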

Loading station data

The function loadStationData is the interface to access observational datasets. Station data can be queried in several ways; the most common cases are presented below.

Loading station data from station codes

Given the station codes provided by the inventory, the time series of one or several stations can be retrieved directly by their identification codes. The following call loads summer (JJA) maximum temperature data for the period 1981-2000 for two stations, Albacete - Los Llanos and Cordoba - Aeropuerto:

example1 <- loadStationData(dataset = gsn, var = "tmax", stationID = c("SP000008280", "SP000008410"), season = 6:8, years = 1981:2000)
## [2014-06-03 10:13:30] Loading data ...
## [2014-06-03 10:13:30] Retrieving metadata ...
## [2014-06-03 10:13:30] Done.
str(example1)
## List of 6
##  $ variable    : chr "tmax"
##  $ station_id  : chr [1:2] "SP000008280" "SP000008410"
##  $ xyCoords: num [1:2, 1:2] -1.86 -4.85 38.95 37.84
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "SP000008280" "SP000008410"
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ time        :List of 2
##   ..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
##   ..$ End  : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
##  $ metadata    :List of 4
##   ..$ altitude    : int [1:2] 704 90
##   ..$ location    : chr [1:2] "ALBACETE LOS LLANOS" "CORDOBA AEROPUERTO"
##   ..$ WMO_Id      : int [1:2] 8280 8410
##   ..$ Koppen.class: chr [1:2] "BSk" "Csa"
##  $ Data        :'data.frame':	1840 obs. of  2 variables:
##   ..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
##   ..$ SP000008410: num [1:1840] 26.8 26.8 26.4 31 33.6 35.6 37.4 37 36.6 39.6 ...
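
The length of the time dimension (1840) follows from the query itself: June, July and August have 92 days, and 20 years were requested. The base R sketch below reproduces that count and the daily Start/End bounds. It uses Date objects for brevity, whereas the loaded object stores POSIXlt times, and the variable names are illustrative.

```r
# All calendar days in the requested period.
allDays <- seq(as.Date("1981-01-01"), as.Date("2000-12-31"), by = "day")

# Keep only the days falling in June, July or August (season = 6:8).
isJJA <- as.integer(format(allDays, "%m")) %in% 6:8
startDates <- allDays[isJJA]

# Each daily record spans from its start date to the following day.
endDates <- startDates + 1

print(length(startDates))
print(format(startDates[1:4]))
print(format(endDates[1:4]))
```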

Loading station data from geographical coordinates

Alternatively, we can choose a location by its coordinates. From the stationInfo output, we know the geographical coordinates of the Albacete - Los Llanos station (-1.8631E, 38.9519N). These coordinates are passed through the lonLim and latLim arguments. It is not necessary to specify all the decimals, because the function selects the station closest to the given coordinate:

example2 <- loadStationData(dataset = gsn, var = "tmax", lonLim = -1.9, latLim = 39, season = 6:8, years = 1981:2000)
## [2014-06-03 10:36:26] Closest station located at 0.0606 spatial units from the specified [lonLim,latLim] coordinate
## [2014-06-03 10:36:26] Loading data ...
## [2014-06-03 10:36:26] Retrieving metadata ...
## [2014-06-03 10:36:26] Done.
str(example2)
## List of 6
##  $ variable    : chr "tmax"
##  $ station_id  : chr "SP000008280"
##  $ xyCoords: num [1, 1:2] -1.86 38.95
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr "SP000008280"
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ time        :List of 2
##   ..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
##   ..$ End  : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
##  $ metadata    :List of 4
##   ..$ altitude    : int 704
##   ..$ location    : chr "ALBACETE LOS LLANOS"
##   ..$ WMO_Id      : int 8280
##   ..$ Koppen.class: chr "BSk"
##  $ Data        :'data.frame':	1840 obs. of  1 variable:
##   ..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
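
The distance reported in the log message can be reproduced by hand. The sketch below assumes a plain Euclidean distance in degrees between the requested point and each station; how loadStationData measures distance internally is an assumption here, although the result agrees with the logged value. The coordinates are copied from the stationInfo table.

```r
# Station coordinates copied from the stationInfo output.
coords <- cbind(
  longitude = c(-2.0392, 2.0697, -5.4981, -4.0103, -1.8631, -4.8458),
  latitude  = c(43.3075, 41.2928, 40.9592, 40.7806, 38.9519, 37.8442)
)
rownames(coords) <- c("SP000008027", "SP000008181", "SP000008202",
                      "SP000008215", "SP000008280", "SP000008410")

# Requested point, as passed to lonLim and latLim in the call above.
target <- c(longitude = -1.9, latitude = 39)

# Euclidean distance in degrees (an assumed metric, see the text above).
d <- sqrt((coords[, "longitude"] - target[["longitude"]])^2 +
          (coords[, "latitude"]  - target[["latitude"]])^2)

nearest <- names(which.min(d))
print(nearest)
print(round(min(d), 4))
```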

Selection of station data within a given geographical bounding box

A particular case of selection by coordinates is when all stations within a given bounding box are desired. In this case, lonLim and latLim each take a vector of length two, giving the longitude and latitude ranges that define the box. For instance:

example3 <- loadStationData(dataset = gsn, var = "tmax", lonLim = c(-5, 5), latLim = c(37, 40), season = 6:8, years = 1981:2000)
## [2014-06-03 10:43:59] Loading data ...
## [2014-06-03 10:44:00] Retrieving metadata ...
## [2014-06-03 10:44:00] Done.
str(example3)
## List of 6
##  $ variable    : chr "tmax"
##  $ station_id  : chr [1:2] "SP000008280" "SP000008410"
##  $ xyCoords: num [1:2, 1:2] -1.86 -4.85 38.95 37.84
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "SP000008280" "SP000008410"
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ time        :List of 2
##   ..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
##   ..$ End  : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
##  $ metadata    :List of 4
##   ..$ altitude    : int [1:2] 704 90
##   ..$ location    : chr [1:2] "ALBACETE LOS LLANOS" "CORDOBA AEROPUERTO"
##   ..$ WMO_Id      : int [1:2] 8280 8410
##   ..$ Koppen.class: chr [1:2] "BSk" "Csa"
##  $ Data        :'data.frame':	1840 obs. of  2 variables:
##   ..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
##   ..$ SP000008410: num [1:1840] 26.8 26.8 26.4 31 33.6 35.6 37.4 37 36.6 39.6 ...
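
To see why only two stations are returned, the bounding box test can be applied by hand to the coordinates listed by stationInfo. This is a minimal base R sketch of the selection logic, assuming inclusive box limits; it does not call downscaleR.

```r
# Station codes and coordinates copied from the stationInfo output.
ids <- c("SP000008027", "SP000008181", "SP000008202",
         "SP000008215", "SP000008280", "SP000008410")
lon <- c(-2.0392, 2.0697, -5.4981, -4.0103, -1.8631, -4.8458)
lat <- c(43.3075, 41.2928, 40.9592, 40.7806, 38.9519, 37.8442)

# Bounding box used in the call above.
lonLim <- c(-5, 5)
latLim <- c(37, 40)

# A station is kept when both of its coordinates fall inside the box
# (inclusive limits are an assumption of this sketch).
inBox <- lon >= lonLim[1] & lon <= lonLim[2] &
         lat >= latLim[1] & lat <= latLim[2]

print(ids[inBox])
```

Salamanca (longitude -5.4981) lies just west of the box, and the three remaining excluded stations are north of 40N.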

Loading all stations

By default, the arguments defining the spatial domain of the query (lonLim and latLim, or stationID) are NULL. If none of them is specified, the function loads all available stations for the selected time domain:

example4 <- loadStationData(dataset = gsn, var = "tmax", season = 6:8, years = 1981:2000)
## [2014-06-03 10:47:09] Loading data ...
## [2014-06-03 10:47:09] Retrieving metadata ...
## [2014-06-03 10:47:09] Done.
str(example4)
## List of 6
##  $ variable    : chr "tmax"
##  $ station_id  : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##  $ xyCoords: num [1:6, 1:2] -2.04 2.07 -5.5 -4.01 -1.86 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
##   .. ..$ : chr [1:2] "longitude" "latitude"
##  $ time        :List of 2
##   ..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
##   ..$ End  : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
##  $ metadata    :List of 4
##   ..$ altitude    : int [1:6] 251 4 790 1894 704 90
##   ..$ location    : chr [1:6] "SAN SEBASTIAN - IGUELDO" "BARCELONA/AEROPUERTO" "SALAMANCA AEROPUERTO" "NAVACERRADA" ...
##   ..$ WMO_Id      : int [1:6] 8027 8181 8202 8215 8280 8410
##   ..$ Koppen.class: chr [1:6] "Cfb" "Csa" "BSk" "Csb" ...
##  $ Data        :'data.frame':	1840 obs. of  6 variables:
##   ..$ SP000008027: num [1:1840] 29 22.4 15.2 18.2 23 20 27.4 28.8 17.6 16.8 ...
##   ..$ SP000008181: num [1:1840] 23.6 23.4 26 22.2 23.4 24.4 24.8 26.8 28 27.4 ...
##   ..$ SP000008202: num [1:1840] 22.6 19 18 22.4 25.7 28.4 29 29 26.7 29.8 ...
##   ..$ SP000008215: num [1:1840] 12.6 11.8 7.4 14.6 18.2 19.4 21.6 21.4 19.8 23.2 ...
##   ..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
##   ..$ SP000008410: num [1:1840] 26.8 26.8 26.4 31 33.6 35.6 37.4 37 36.6 39.6 ...

The time definition of the query behaves in the same way: when season and/or years are left at their default value (NULL), all months and/or all years available in the dataset are returned.

Plotting example

The next example plots the time series retrieved in example 1. Note that time is defined by lower and upper time bounds (Start and End) rather than by a single verification date; the lower bound is used here for the x axis:

time <- as.POSIXlt(example1$time$Start)
plot(time, example1$Data$SP000008410, type = "l", col = "blue", xlab = "time", ylab = "T (°C)")
lines(time, example1$Data$SP000008280, type = "l", col = "red")
legend("bottomright", c("Albacete", "Cordoba"), col = c("red", "blue"), lty = 1)
title("Tmax - JJA (1981-2000)")
