Training and cross validation
As mentioned in the data transformation section, `transformeR` is an auxiliary package implementing a number of transformation and post-processing functionalities. It also provides several illustrative datasets (observations, reanalysis, climate change projections and seasonal forecasts, all restricted to a domain covering the Iberian Peninsula for DJF), which are used in this wiki to illustrate the functionalities of `downscaleR`.
library(transformeR)
data(package = "transformeR")
In order to illustrate the training and cross-validation of (perfect prog) downscaling methods we use the `NCEP_Iberia` reanalysis data (including three common large-scale predictors: `psl`, `ta850` and `hus850`) as predictors, and the VALUE observations data for precipitation (`pr`) and mean temperature (`tas`) as predictands. Note that this dataset contains information for 11 stations in the Iberian Peninsula; in some cases we will consider a single station (Igueldo - San Sebastian, Spain; ID = "000234") to graphically illustrate the results. Both datasets cover winter only (DJF, 1983-2002).
These datasets were loaded with the `loadeR` package and preserve the data structure used in the `climate4R` framework, containing information about the variable (`Variable`), dates (`Dates`), coordinates (`xyCoords`) and the data (`Data`).
library(downscaleR)
# Selecting predictand (y) and predictor (x)
data("VALUE_Iberia_pr","VALUE_Iberia_tas")
y <- VALUE_Iberia_tas
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")
In the case of station data, there is an additional element with station metadata (`Metadata`, including the station names and IDs: `y$Metadata`).
We can use the functionalities of `transformeR` to build multigrids (combining different predictors) and apply several data transformations (e.g. spatial or temporal subsetting and scaling) in order to prepare the predictor dataset for downscaling:
x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
x <- scaleGrid(x, type = "standardize", spatial.frame = "field") # standardizing the predictor fields
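To make the effect of the standardization concrete, the following base-R sketch mimics it on a toy matrix (days x grid boxes). It assumes that `spatial.frame = "field"` means a single mean and standard deviation computed over the whole field rather than one pair per grid box. The matrix `toy` and the helpers `standardize_field` and `standardize_gridbox` are hypothetical and not part of `transformeR`.

```r
# Toy predictor field: 6 days (rows) x 4 grid boxes (columns)
set.seed(1)
toy <- matrix(rnorm(24, mean = 1015, sd = 8), nrow = 6, ncol = 4)

# Hypothetical helper: one mean and one sd for the whole field (assumed "field" behaviour)
standardize_field <- function(m) (m - mean(m)) / sd(as.vector(m))

# Hypothetical helper for comparison: one mean and one sd per grid box (column)
standardize_gridbox <- function(m) scale(m, center = TRUE, scale = TRUE)

z_field <- standardize_field(toy)
z_box <- standardize_gridbox(toy)

mean(z_field)            # the field as a whole is centred on zero
sd(as.vector(z_field))   # and has unit standard deviation
colMeans(z_box)          # here each grid box is centred on zero instead
```

With the field-wide version the spatial gradients between grid boxes are preserved, whereas the grid-box version removes them.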
The climatology of the predictand and predictors can be easily visualized with the `spatialPlot` function (from the `visualizeR` package).
library(visualizeR)
spatialPlot(climatology(y), backdrop.theme = "countries", colorkey = TRUE)
spatialPlot(climatology(x), backdrop.theme = "countries")
We could also use the functionalities of `transformeR` to select, e.g., a single station (Igueldo) for a particular period (e.g. the year 2000) and plot the results.
igueldo.2000 <- subsetGrid(y,station.id = "000234",years = 2000)
temporalPlot(igueldo.2000)
`downscaleR` prepares the specific predictors used to train the statistical downscaling model with the function `prepareData` (see the corresponding Wiki section), which allows defining local (from the nearest grid boxes for each station) and/or spatial (PCs from single or combined variables) predictors. Different options are illustrated below:
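# Default configuration
data <- prepareData(x = x, y = y)
# Local predictors: the 4 nearest grid boxes of every predictor variable
data <- prepareData(x = x, y = y,
                    local.predictors = list(n = 4, vars = getVarNames(x)))
# Spatial predictors: PCs explaining 95% of the variance
data <- prepareData(x = x, y = y,
                    spatial.predictors = list(v.exp = 0.95))
# Spatial and local predictors combined
data <- prepareData(x = x, y = y,
                    spatial.predictors = list(v.exp = 0.95),
                    local.predictors = list(n = 4, vars = getVarNames(x)))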
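As a rough illustration of what a variance threshold such as `v.exp = 0.95` implies, the sketch below uses `prcomp` from base R on a synthetic matrix and keeps the smallest number of PCs whose cumulative explained variance reaches 95%. This is only an assumption about the general idea, not the internal implementation of `prepareData`; the objects `toy`, `explained`, `n_pcs` and `pcs` are hypothetical.

```r
# Toy predictor matrix: 200 days x 10 correlated grid boxes
set.seed(42)
signal <- matrix(rnorm(200 * 2), ncol = 2)      # two underlying modes of variability
loadings <- matrix(rnorm(2 * 10), nrow = 2)
toy <- signal %*% loadings + matrix(rnorm(200 * 10, sd = 0.3), ncol = 10)

# Principal component analysis of the standardized field
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Cumulative fraction of variance explained by the leading PCs
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of PCs reaching the threshold (analogous to v.exp = 0.95)
v_exp <- 0.95
n_pcs <- which(explained >= v_exp)[1]

# PC time series that would play the role of spatial predictors
pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]
dim(pcs)
```

The retained PCs replace the original grid boxes as predictors, which reduces dimensionality while keeping most of the large-scale variability.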
The function `downscale.train` trains a specific downscaling method using the prepared data. It returns both the predicted values (`pred`) and the model (`model`), which can be used to obtain downscaled values for new data with the `downscale.predict` function. In the following example we train a model using the analog method (see below for additional methods):
analog <- downscale.train(data, method = "analogs", n.analogs = 1) # The analog method
Note that `analog$pred` preserves the data structure of the predictand and can therefore be transformed and visualized with the same functions. For instance, we can visualize the observations and predictions for the Igueldo station for a single year (to keep the plot readable) as follows:
igueldo.2000 <- subsetGrid(y,station.id = "000234",years = 2000)
pred.2000 <- subsetGrid(analog$pred,station.id = "000234",years = 2000)
temporalPlot(igueldo.2000, pred.2000)
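The idea behind the analog method with a single analog (`n.analogs = 1`) can be sketched in base R as a nearest-neighbour search: for each new day, the observation of the most similar training day is returned. The sketch assumes Euclidean distance in predictor space and uses synthetic data; `analog_predict` is a hypothetical helper, not a `downscaleR` function.

```r
# Toy data: 100 training days with 3 predictors and one observed predictand
set.seed(123)
x_train <- matrix(rnorm(300), ncol = 3)
y_train <- 10 + 2 * x_train[, 1] + rnorm(100, sd = 0.5)
x_new <- matrix(rnorm(15), ncol = 3)   # 5 days to be downscaled

# Hypothetical helper: return the observation of the closest training day
analog_predict <- function(x_train, y_train, x_new) {
  apply(x_new, 1, function(day) {
    # Euclidean distance from this day to every training day
    d <- sqrt(colSums((t(x_train) - day)^2))
    y_train[which.min(d)]
  })
}

pred_toy <- analog_predict(x_train, y_train, x_new)
pred_toy
```

Because every prediction is an observed value, the method cannot produce values outside the range of the training observations.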
Similarly, `downscaleR` allows training models with alternative downscaling methods:
- Linear (and generalized linear) models (`method = "GLM"`). The argument `family` specifies the family (e.g. `family = gaussian`) or both the family and the link (e.g. `family = gaussian(link = "identity")`), following the same format and options as `glm` from the `stats` package (used to train these methods):
model <- downscale.train(data, method = "GLM",family = gaussian) # Linear regression
- Neural network models (`method = "NN"`). The argument `hidden` specifies the number of hidden layers and neurons (e.g. `hidden = c(10,5)` for two hidden layers with 10 and 5 neurons, respectively), with the activation function given by `activationfun` (e.g. `activationfun = "sigm"` for sigmoidal); the output activation function is set with the argument `output` (e.g. `output = "linear"` for regression or `output = "sigm"` for classification). A number of optional parameters control the learning process (`learningrate`, `momentum`, `numepochs`, etc.), as given by the `deepnet` package (used to train these methods):
model <- downscale.train(data, method = "NN", hidden = c(10,5), output = "linear")
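Since the `family` argument follows the conventions of `stats::glm`, its two forms can be checked directly in base R. The sketch below, on synthetic data, shows that `family = gaussian` and `family = gaussian(link = "identity")` describe the same model, which is ordinary linear regression. The data frame `toy` is hypothetical.

```r
# Toy data: one predictor and a continuous predictand
set.seed(7)
toy <- data.frame(x = rnorm(50))
toy$y <- 1.5 + 0.8 * toy$x + rnorm(50, sd = 0.2)

# Family only: the default link of the gaussian family is "identity"
fit_family <- glm(y ~ x, data = toy, family = gaussian)

# Family and link given explicitly
fit_link <- glm(y ~ x, data = toy, family = gaussian(link = "identity"))

# Ordinary least squares for reference
fit_lm <- lm(y ~ x, data = toy)

all.equal(coef(fit_family), coef(fit_link))   # same coefficients
all.equal(coef(fit_family), coef(fit_lm))     # and the same as lm()
```

Other families accepted by `glm` (for instance `binomial` for occurrence or `Gamma` for positive amounts) are specified in the same way.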
Once the model is trained, it can be used to predict the predictands for new (predictor) data. To this end, the new data must include the same variables and geographical area as the predictors used to train the downscaling method. The function `prepareNewData` preprocesses the new dataset, taking the new data and an existing predictor dataset (e.g. `data` in the current example). We could, for instance, apply the downscaling method to the training data (`x`) as follows:
newdata <- prepareNewData(x,data)
pred <- downscale.predict(newdata, analog)
The resulting `pred` would be identical to the predicted values already computed when training the model (`analog$pred`).
As a more realistic application, we could split the data into two subsets (train and test) and perform a simple cross-validation as follows:
xT <- subsetGrid(x, years = 1983:1999) # training predictors
yT <- subsetGrid(y, years = 1983:1999) # training predictands
data <- prepareData(xT,yT) # preparing the data
analog <- downscale.train(data, method = "analogs", n.analogs = 1)
xt <- subsetGrid(x, years = 2000) # test predictors
newdata <- prepareNewData(xt,data) # preparing the new predictors
pred <- downscale.predict(newdata, analog) # predicting
# visualizing the results
yt <- subsetGrid(y,years = 2000)
temporalPlot(pred,yt) # plotting predictions and observations
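Besides the visual comparison, predictions and observations for the test period can be compared numerically. The base-R sketch below computes three common scores (bias, RMSE and Pearson correlation) on two short made-up series; the vectors `obs` and `pred_toy` are hypothetical stand-ins for the daily values of one station, not output of the code above.

```r
# Made-up daily series standing in for observed and predicted temperature
obs <- c(9.1, 10.4, 8.7, 11.2, 12.0, 9.8, 10.9)
pred_toy <- c(8.6, 10.9, 9.5, 10.4, 12.3, 9.1, 11.5)

bias <- mean(pred_toy - obs)             # mean error
rmse <- sqrt(mean((pred_toy - obs)^2))   # root mean square error
rho <- cor(pred_toy, obs)                # Pearson correlation

round(c(bias = bias, rmse = rmse, cor = rho), 2)
```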
`downscaleR` includes a specific function, `downscale.cv`, to cross-validate the downscaling methods, using a number of different alternatives (from a simple split to random or sequential k-fold). Its arguments are the same as those of `downscale.train`, plus the arguments of `prepareData` needed to preprocess the predictor data in order to prepare the training data.
analog.cv <- downscale.cv(x = x, y = y, method = "analogs", n.analogs = 1, folds = 5,
spatial.predictors = list(v.exp = 0.95),
local.predictors = list(n = 4, vars = getVarNames(x)))
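The difference between sequential and random k-fold splitting can be sketched in base R on the years covered by the example datasets. This is only an illustration of the sampling idea under the assumption that folds are built from whole years; it does not reproduce the internals of `downscale.cv`, and the objects `seq_folds` and `rand_folds` are hypothetical.

```r
# Years covered by the example datasets (DJF, 1983-2002)
years <- 1983:2002
k <- 5

# Fold labels: equal-sized blocks (20 years / 5 folds = 4 years per fold)
fold_id <- rep(seq_len(k), each = length(years) %/% k)

# Sequential folds: consecutive blocks of years
seq_folds <- split(years, fold_id)

# Random folds: the same block sizes after shuffling the years
set.seed(2000)
rand_folds <- split(sample(years), fold_id)

# Each fold is held out once while the remaining years form the training set
for (i in seq_len(k)) {
  test_years <- seq_folds[[i]]
  train_years <- setdiff(years, test_years)
  cat("Fold", i, "- test years:", range(test_years),
      "- training years:", length(train_years), "\n")
}
```

Every year is predicted exactly once, so concatenating the held-out predictions yields a complete series for validation.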
print(sessionInfo(package = c("transformeR", "downscaleR")))
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## character(0)
##
## other attached packages:
## [1] transformeR_1.3.3 downscaleR_3.0.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.15 compiler_3.4.3 highr_0.6
## [4] methods_3.4.3 bitops_1.0-6 iterators_1.0.8
## [7] utils_3.4.3 tools_3.4.3 grDevices_3.4.3
## [10] deepnet_0.2 digest_0.6.13 dotCall64_0.9-5.2
## [13] evd_2.3-2 gtable_0.2.0 evaluate_0.10.1
## [16] lattice_0.20-35 Matrix_1.2-7.1 foreach_1.4.3
## [19] yaml_2.1.16 parallel_3.4.3 spam_2.1-2
## [22] akima_0.6-2 gridExtra_2.2.1 stringr_1.2.0
## [25] knitr_1.18 raster_2.6-7 gridGraphics_0.2
## [28] graphics_3.4.3 datasets_3.4.3 stats_3.4.3
## [31] fields_9.6 maps_3.2.0 rprojroot_1.3-2
## [34] grid_3.4.3 glmnet_2.0-13 base_3.4.3
## [37] rmarkdown_1.8 sp_1.2-7 magrittr_1.5
## [40] backports_1.1.2 codetools_0.2-15 htmltools_0.3.6
## [43] MASS_7.3-44 abind_1.4-5 stringi_1.1.5
## [46] RCurl_1.95-4.10 RcppEigen_0.3.3.3.1
downscaleR - Santander MetGroup (Univ. Cantabria - CSIC)