Accessing Station Data
The function dataInventory provides a quick overview of the data contained in a dataset. For station data, the main argument is the path to the directory where the dataset (stations.txt, variables.txt and the associated data files) is stored (see this link for details on the station data format).
For instance, this is a quick overview of the built-in dataset in the downscaleR package using dataInventory:
> di <- dataInventory("inst//datasets//observations//GSN_Iberia")
[2014-06-03 09:36:27] Doing inventory ...
[2014-06-03 09:36:27] Done.
The returned object contains all the information needed to call the loading function loadObservations, including station codes, geolocation and details on the variable names and units:
> str(di)
List of 3
$ Stations :List of 4
..$ station_id : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
..$ LonLatCoords : num [1:6, 1:2] -2.04 2.07 -5.5 -4.01 -1.86 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:6] "SP000008027" "SP000008181" "SP000008202" "SP000008215" ...
.. .. ..$ : chr [1:2] "lon" "lat"
..$ times :List of 3
.. ..$ startDate: POSIXlt[1:1], format: "1979-01-01"
.. ..$ endDate : POSIXlt[1:1], format: "2012-12-31"
.. ..$ timeStep :Class 'difftime' atomic [1:1] 24
.. .. .. ..- attr(*, "units")= chr "hours"
..$ other.metadata:List of 4
.. ..$ altitude : int [1:6] 251 4 790 1894 704 90
.. ..$ location : chr [1:6] "SAN SEBASTIAN - IGUELDO" "BARCELONA/AEROPUERTO" "SALAMANCA AEROPUERTO" "NAVACERRADA" ...
.. ..$ WMO_Id : int [1:6] 8027 8181 8202 8215 8280 8410
.. ..$ Koppen.class: chr [1:6] "Cfb" "Csa" "BSk" "Csb" ...
$ Variables :'data.frame': 3 obs. of 4 variables:
..$ variable : Factor w/ 3 levels "precip","tmax",..: 1 3 2
..$ longname : Factor w/ 3 levels "maximum daily temperature",..: 3 2 1
..$ unit : Factor w/ 2 levels "0.1 degC","0.1 mm": 2 1 1
..$ missing.code: Factor w/ 1 level "NaN": 1 1 1
$ Summary.stats: NULL
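The inventory fields can be used to pick stations before loading any data. The sketch below is a minimal illustration in base R: the list `di` is a hand-built stand-in that mirrors only two fields of the structure printed above (it is not produced by dataInventory here), and the 700 m threshold is an arbitrary assumption.

```r
# Hypothetical stand-in mirroring part of the inventory structure shown above.
# In a real session, di would come from dataInventory().
di <- list(
  Stations = list(
    station_id = c("SP000008027", "SP000008181", "SP000008202",
                   "SP000008215", "SP000008280", "SP000008410"),
    other.metadata = list(
      altitude = c(251L, 4L, 790L, 1894L, 704L, 90L)
    )
  )
)

# Select the codes of stations located above 700 m (arbitrary threshold)
is.high <- di$Stations$other.metadata$altitude > 700
high.ids <- di$Stations$station_id[is.high]
print(high.ids)
```

A vector of codes selected this way has the same form as the station identifiers used when querying data by station code.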
Note that the last element of the inventory, Summary.stats, is NULL. By default, the inventory returns only the basic information, but setting the argument return.stats to TRUE also returns a table summarizing the characteristics of the data (percentage of missing data, and mean, minimum and maximum values):
> di2 <- dataInventory("inst//datasets//observations//GSN_Iberia", return.stats= TRUE)
[2014-06-03 09:51:18] Doing inventory ...
[2014-06-03 09:51:19] Done.
> di2$Summary.stats
$missing.percent
precip tmin tmax
SP000008027 0.6 2.3 4.2
SP000008181 0.7 1.7 1.1
SP000008202 0.5 4.5 0.8
SP000008215 0.6 2.7 2.3
SP000008280 0.5 17.8 4.1
SP000008410 0.9 10.1 6.0
$min
precip tmin tmax
SP000008027 -0.3 -10.0 -3.5
SP000008181 0.0 -7.2 0.0
SP000008202 0.0 -12.0 -1.4
SP000008215 0.0 -17.5 -11.0
SP000008280 0.0 -13.4 -1.8
SP000008410 0.0 -8.2 0.0
$max
precip tmin tmax
SP000008027 93.0 25.2 38.6
SP000008181 175.1 26.8 37.4
SP000008202 50.3 22.0 41.0
SP000008215 111.8 20.6 31.8
SP000008280 146.6 23.4 42.0
SP000008410 154.3 27.0 46.6
$mean
precip tmin tmax
SP000008027 1.2266499 10.621573 16.58673
SP000008181 1.5895889 11.803554 20.44687
SP000008202 1.0108198 5.791200 18.73546
SP000008215 1.5650255 3.303088 10.94361
SP000008280 0.9700607 7.756689 20.23754
SP000008410 1.5544916 11.311057 24.63882
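To make the meaning of these four tables concrete, the following base R sketch computes the same kind of statistics for a small toy series. The values in `obs` are invented for illustration, and this is not necessarily how dataInventory computes its summary internally.

```r
# Toy daily series for a single station (hypothetical values, NA = missing)
obs <- data.frame(
  precip = c(0, 1.2, NA, 5.4, 0),
  tmin   = c(10.1, NA, NA, 12.3, 9.8),
  tmax   = c(18.0, 19.5, 21.1, NA, 17.2)
)

# Percentage of missing values per variable, rounded to one decimal
missing.percent <- round(100 * colMeans(is.na(obs)), 1)

# Minimum, maximum and mean per variable, ignoring missing values
summ <- list(
  missing.percent = missing.percent,
  min  = sapply(obs, min,  na.rm = TRUE),
  max  = sapply(obs, max,  na.rm = TRUE),
  mean = sapply(obs, mean, na.rm = TRUE)
)
print(summ)
```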
The function loadObservations is the interface for accessing observational datasets. Observation data can be queried in several ways; the most common cases are presented below.
Given the station codes provided by the inventory, a time series can be retrieved for one or several stations directly by their identification codes. The following call loads summer (JJA) maximum temperature data for the period 1981-2000 for two stations, Albacete - Los Llanos and Cordoba - Aeropuerto:
> example1 <- loadObservations(source.dir="inst//datasets//observations//GSN_Iberia", var="tmax", stationID = c("SP000008280", "SP000008410"), season = 6:8, years = 1981:2000)
[2014-06-03 10:13:30] Loading data ...
[2014-06-03 10:13:30] Retrieving metadata ...
[2014-06-03 10:13:30] Done.
> str(example1)
List of 6
$ variable : chr "tmax"
$ station_id : chr [1:2] "SP000008280" "SP000008410"
$ LonLatCoords: num [1:2, 1:2] -1.86 -4.85 38.95 37.84
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "SP000008280" "SP000008410"
.. ..$ : chr [1:2] "longitude" "latitude"
$ time :List of 2
..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
..$ End : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
$ metadata :List of 4
..$ altitude : int [1:2] 704 90
..$ location : chr [1:2] "ALBACETE LOS LLANOS" "CORDOBA AEROPUERTO"
..$ WMO_Id : int [1:2] 8280 8410
..$ Koppen.class: chr [1:2] "BSk" "Csa"
$ Data :'data.frame': 1840 obs. of 2 variables:
..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
..$ SP000008410: num [1:1840] 26.8 26.8 26.4 31 33.6 35.6 37.4 37 36.6 39.6 ...
This is an example plot of the time series returned. Note that time is defined by lower and upper time bounds, rather than one single verification date:
> plot(example1$time$Start, example1$Data$SP000008410, type = "l", col = "blue")
> lines(example1$time$Start, example1$Data$SP000008280, type = "l", col = "red")
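The 1840 time steps reported by `str(example1)` follow from the `season` and `years` arguments: 92 days (June to August) for each of 20 years. The base R sketch below reproduces that selection on a plain daily calendar. It only illustrates the filtering logic and makes no claim about how loadObservations implements it.

```r
# All calendar days spanning the requested years
dates <- seq(as.Date("1981-01-01"), as.Date("2000-12-31"), by = "day")

# Keep only the days whose month belongs to the season (6:8 = June to August)
season <- 6:8
mon <- as.integer(format(dates, "%m"))
jja <- dates[mon %in% season]

# Lower and upper time bounds of each daily step, as in the time element
start <- jja
end <- jja + 1

length(jja)  # 1840
```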
Alternatively, we can choose a location by its coordinates. From the dataset inventory, we know the geographical coordinates of the Albacete - Los Llanos station (-1.8631E, 38.9519N):
> di$Stations$LonLatCoords
lon lat
SP000008027 -2.0392 43.3075
SP000008181 2.0697 41.2928
SP000008202 -5.4981 40.9592
SP000008215 -4.0103 40.7806
SP000008280 -1.8631 38.9519
SP000008410 -4.8458 37.8442
We can pass these coordinates to the lonLim and latLim arguments. It is not necessary to specify all the decimals, since the function finds the closest station to the given coordinates:
> example2 <- loadObservations(source.dir="inst//datasets//observations//GSN_Iberia", var="tmax", lonLim = -1.9, latLim = 39, season = 6:8, years = 1981:2000)
[2014-06-03 10:36:26] Closest station located at 0.0606 spatial units from the specified [lonLim,latLim] coordinate
[2014-06-03 10:36:26] Loading data ...
[2014-06-03 10:36:26] Retrieving metadata ...
[2014-06-03 10:36:26] Done.
> str(example2)
List of 6
$ variable : chr "tmax"
$ station_id : chr "SP000008280"
$ LonLatCoords: num [1, 1:2] -1.86 38.95
..- attr(*, "dimnames")=List of 2
.. ..$ : chr "SP000008280"
.. ..$ : chr [1:2] "longitude" "latitude"
$ time :List of 2
..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
..$ End : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
$ metadata :List of 4
..$ altitude : int 704
..$ location : chr "ALBACETE LOS LLANOS"
..$ WMO_Id : int 8280
..$ Koppen.class: chr "BSk"
$ Data :'data.frame': 1840 obs. of 1 variable:
..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
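The closest-station message above can be checked by hand. The sketch below assumes a plain Euclidean distance in longitude/latitude units, which reproduces the 0.0606 value in the log; whether the package uses exactly this metric is an assumption. The coordinates are copied from the inventory table shown earlier.

```r
# Station coordinates as listed in the inventory
coords <- matrix(
  c(-2.0392, 43.3075,
     2.0697, 41.2928,
    -5.4981, 40.9592,
    -4.0103, 40.7806,
    -1.8631, 38.9519,
    -4.8458, 37.8442),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("SP000008027", "SP000008181", "SP000008202",
      "SP000008215", "SP000008280", "SP000008410"),
    c("lon", "lat")
  )
)

# Target point, as passed to lonLim and latLim in the example
target.lon <- -1.9
target.lat <- 39

# Euclidean distance (in degrees) from every station to the target point
d <- sqrt((coords[, "lon"] - target.lon)^2 + (coords[, "lat"] - target.lat)^2)

closest <- names(which.min(d))
closest          # "SP000008280"
round(min(d), 4) # 0.0606
```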
A particular case of selection by coordinates is when all data within a given bounding box are desired. In this case, the lonLim and latLim arguments each take a vector of length two, defining the corners of the bounding box. For instance:
> example3 <- loadObservations(source.dir="inst//datasets//observations//GSN_Iberia", var="tmax", lonLim = c(-5,5), latLim = c(37,40), season = 6:8, years = 1981:2000)
[2014-06-03 10:43:59] Loading data ...
[2014-06-03 10:44:00] Retrieving metadata ...
[2014-06-03 10:44:00] Done.
> str(example3)
List of 6
$ variable : chr "tmax"
$ station_id : chr [1:2] "SP000008280" "SP000008410"
$ LonLatCoords: num [1:2, 1:2] -1.86 -4.85 38.95 37.84
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "SP000008280" "SP000008410"
.. ..$ : chr [1:2] "longitude" "latitude"
$ time :List of 2
..$ Start: POSIXlt[1:1840], format: "1981-06-01" "1981-06-02" "1981-06-03" "1981-06-04" ...
..$ End : POSIXlt[1:1840], format: "1981-06-02" "1981-06-03" "1981-06-04" "1981-06-05" ...
$ metadata :List of 4
..$ altitude : int [1:2] 704 90
..$ location : chr [1:2] "ALBACETE LOS LLANOS" "CORDOBA AEROPUERTO"
..$ WMO_Id : int [1:2] 8280 8410
..$ Koppen.class: chr [1:2] "BSk" "Csa"
$ Data :'data.frame': 1840 obs. of 2 variables:
..$ SP000008280: num [1:1840] 27.4 26.6 23.2 26.4 30.2 33.6 34.6 35.6 35 32.4 ...
..$ SP000008410: num [1:1840] 26.8 26.8 26.4 31 33.6 35.6 37.4 37 36.6 39.6 ...
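The two stations returned are those whose coordinates fall inside the requested box. The following base R sketch applies the same bounding box to the inventory coordinates; treating the limits as inclusive is an assumption of this sketch, not a documented behaviour of loadObservations.

```r
# Station coordinates as listed in the inventory
coords <- matrix(
  c(-2.0392, 43.3075,
     2.0697, 41.2928,
    -5.4981, 40.9592,
    -4.0103, 40.7806,
    -1.8631, 38.9519,
    -4.8458, 37.8442),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("SP000008027", "SP000008181", "SP000008202",
      "SP000008215", "SP000008280", "SP000008410"),
    c("lon", "lat")
  )
)

# Bounding box used in the example
lonLim <- c(-5, 5)
latLim <- c(37, 40)

# A station is selected when both coordinates lie within the limits
inside <- coords[, "lon"] >= lonLim[1] & coords[, "lon"] <= lonLim[2] &
          coords[, "lat"] >= latLim[1] & coords[, "lat"] <= latLim[2]

selected <- rownames(coords)[inside]
selected  # "SP000008280" "SP000008410"
```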
downscaleR - Santander MetGroup (Univ. Cantabria - CSIC)