
Training and cross validation

miturbide edited this page Jun 20, 2018 · 5 revisions

1. Introduction: Predictors and predictand

As mentioned in the data transformation section, transformeR is an auxiliary package implementing a number of transformation and post-processing functionalities. It also provides several illustrative datasets (observations, reanalysis, climate change projections and seasonal forecasts, all restricted to a domain covering the Iberian peninsula for DJF), which are used in this wiki to illustrate the functionalities of downscaleR.

library(transformeR)
data(package = "transformeR")

To illustrate the training and cross-validation of (perfect prog) downscaling methods we use the NCEP_Iberia reanalysis data (including three common large-scale predictors: psl, ta850 and hus850) as predictors, and the VALUE observations for precipitation (pr) and mean temperature (tas) as predictands. Note that this dataset contains information for 11 stations in the Iberian peninsula; in some cases we will consider a single station (Igueldo - San Sebastian, Spain; ID = "000234") to graphically illustrate the results. These datasets cover winter only (DJF, 1983-2002).

These datasets were loaded with the loadeR package and preserve the data structure used in the climate4R framework, containing information about the variable (Variable), dates (Dates), coordinates (xyCoords) and the data (Data).

library(downscaleR)
# Selecting predictand (y) and predictor (x)
data("VALUE_Iberia_pr","VALUE_Iberia_tas")
y <- VALUE_Iberia_tas 
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")

In the case of station data, there is an additional element with station metadata (Metadata, including the station names and IDs: y$Metadata).

We can use the functionalities of transformeR to build multigrids (combining different predictors) and apply several data transformations (e.g. spatial or temporal subsetting and scaling) in order to prepare the predictor dataset for downscaling:

x <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
x <- scaleGrid(x, type = "standardize", spatial.frame = "field") # standardizing the predictor fields
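As a rough illustration of what standardizing a predictor field involves, the following base R sketch works on a plain matrix rather than on a climate4R grid. It assumes that a "field" spatial frame means a single mean and standard deviation computed over the whole domain, as opposed to one pair per grid box; the toy matrix and all object names are hypothetical and this is not the code scaleGrid runs internally.

```r
# Hypothetical toy predictor: 10 time steps x 4 grid boxes (not a climate4R grid)
set.seed(1)
field <- matrix(rnorm(40, mean = 280, sd = 5), nrow = 10, ncol = 4)

# Field-wise standardization (assumed meaning of "field"):
# one mean and one standard deviation for the whole domain
field.std <- (field - mean(field)) / sd(field)

# Grid-box-wise standardization, for comparison:
# one mean and one standard deviation per column (grid box)
gridbox.std <- scale(field)

# After field-wise scaling the pooled values have mean 0 and sd 1
c(mean = mean(field.std), sd = sd(field.std))
```

With field-wise scaling the spatial gradients inside the domain are preserved, whereas grid-box-wise scaling removes them.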

The climatology of the predictand and predictors can be easily visualized with the spatialPlot function (from the visualizeR package).

library(visualizeR)
spatialPlot(climatology(y), backdrop.theme = "countries", colorkey = TRUE)
spatialPlot(climatology(x), backdrop.theme = "countries")

We could also use the functionalities of transformeR to select, e.g. a single station (Igueldo) for a particular period (e.g. year 2000) and plot the results.

igueldo.2000 <- subsetGrid(y,station.id = "000234",years = 2000)
temporalPlot(igueldo.2000)

2. Model Training

downscaleR prepares the specific predictors used to train the statistical downscaling model with the function prepareData (see the corresponding Wiki section), which can define local predictors (from the nearest gridboxes for each station) and/or spatial predictors (PCs from single or combined variables). Different options are illustrated below:

data <- prepareData(x = x, y = y)
data <- prepareData(x = x, y = y, 
               local.predictors = list(n = 4, vars = getVarNames(x))) 
data <- prepareData(x = x, y = y,
               spatial.predictors = list(v.exp = 0.95))
data <- prepareData(x = x, y = y, 
               spatial.predictors = list(v.exp = 0.95),
               local.predictors = list(n = 4, vars = getVarNames(x)))
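To make the idea behind an explained-variance threshold such as v.exp = 0.95 more concrete, here is a minimal base R sketch that keeps the smallest number of principal components whose cumulative explained variance reaches 95 %. It uses stats::prcomp on a synthetic matrix; the data and object names are hypothetical, and prepareData may compute its PCs differently.

```r
set.seed(42)
# Hypothetical predictor matrix: 200 days x 12 grid boxes with correlated columns
latent   <- matrix(rnorm(200 * 3), ncol = 3)
loadings <- matrix(rnorm(3 * 12), nrow = 3)
X <- latent %*% loadings + matrix(rnorm(200 * 12, sd = 0.3), ncol = 12)

# Principal component analysis on centred and scaled columns
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Cumulative fraction of variance explained by the leading PCs
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest number of PCs reaching the 95 % threshold
n.pcs <- which(explained >= 0.95)[1]

# Retained PCs (days x n.pcs), usable as spatial predictors
pcs <- pca$x[, seq_len(n.pcs), drop = FALSE]
dim(pcs)
```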

The function downscale.train trains a specific downscaling method using the prepared data. It returns both the predicted values (pred) and the model (model), which can be used to obtain downscaled values for new data using the downscale.predict function. In the following example we train a model using the method of analogs (see below for additional methods):

analog <- downscale.train(data, method = "analogs", n.analogs = 1) # The analog method
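For readers unfamiliar with the method of analogs, the following base R sketch shows the general idea for a single station: for each new day, find the closest training day(s) in predictor space and use the observed value(s) from those days as the prediction. The helper analog_predict and the toy data are hypothetical, the Euclidean distance is an assumption, and this is not the downscaleR implementation.

```r
# Illustrative helper (not a downscaleR function): analog prediction for one station
analog_predict <- function(x.train, y.train, x.new, n.analogs = 1) {
  apply(x.new, 1, function(day) {
    # Euclidean distance from this day to every training day
    d <- sqrt(rowSums(sweep(x.train, 2, day)^2))
    # Indices of the n.analogs closest training days
    nearest <- order(d)[seq_len(n.analogs)]
    # Prediction: mean of the observations on the analog days
    mean(y.train[nearest])
  })
}

set.seed(7)
x.train <- matrix(rnorm(50 * 3), ncol = 3)          # 50 days x 3 predictors
y.train <- 2 * x.train[, 1] + rnorm(50, sd = 0.1)   # toy predictand

# Predicting the training days themselves with a single analog
fitted <- analog_predict(x.train, y.train, x.train, n.analogs = 1)
```

In this toy version every training day is its own nearest analog, so fitted reproduces y.train exactly; a meaningful skill estimate therefore needs an independent test period.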

Note that analog$pred has the same data structure as the predictand and can therefore be transformed and visualized using the same functions. For instance, we can visualize the observations and predictions for the Igueldo station for a particular year (a single year keeps the plot readable) as follows:

igueldo.2000 <- subsetGrid(y,station.id = "000234",years = 2000)
pred.2000 <- subsetGrid(analog$pred,station.id = "000234",years = 2000)
temporalPlot(igueldo.2000, pred.2000)

Example of downscaling with downscaleR

As in the previous example, downscaleR can train models using alternative downscaling methods:

  • Linear (and generalized linear) models (method = "GLM"). The argument family specifies the family (e.g. family = gaussian) or both the family and the link (e.g. family = gaussian(link = "identity")), following the same format and options as glm from the stats package (used to train these methods):
model <- downscale.train(data, method = "GLM", family = gaussian) # Linear regression
  • Neural network models (method = "NN"). The argument hidden specifies the number of hidden layers and neurons (e.g. hidden = c(10,5) for two hidden layers with 10 and 5 neurons, respectively), with the activation function given by activationfun (e.g. activationfun = "sigm" for sigmoidal); the output activation function is set with the argument output (e.g. output = "linear" for regression or output = "sigm" for classification). A number of optional parameters control the learning process: learningrate, momentum, numepochs, etc., as given by the deepnet package (used to train these methods):
model <- downscale.train(data, method = "NN", hidden = c(10,5), output = "linear")
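The family argument mentioned for the GLM method follows the conventions of stats::glm. As a small self-contained check of why family = gaussian corresponds to linear regression, the sketch below fits the same toy model with glm and lm and compares the coefficients. The data frame and the variable names (tas, pc1, pc2) are hypothetical.

```r
set.seed(3)
# Hypothetical toy data: a temperature-like predictand and two predictors
pc1 <- rnorm(100)
pc2 <- rnorm(100)
tas <- 10 + 2 * pc1 - pc2 + rnorm(100, sd = 0.5)
toy <- data.frame(tas, pc1, pc2)

# Gaussian family with identity link, written out in full
fit.glm <- glm(tas ~ pc1 + pc2, data = toy, family = gaussian(link = "identity"))

# Ordinary least squares
fit.lm <- lm(tas ~ pc1 + pc2, data = toy)

# Both fits give the same coefficients (up to numerical tolerance)
all.equal(coef(fit.glm), coef(fit.lm), tolerance = 1e-6)
```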

3. Model prediction

Once the model is trained, it can be used to predict the predictand for new (predictor) data. The new data must include the same variables and geographical area as the predictors used to train the downscaling method. The function prepareNewData preprocesses the new dataset, taking the new data and an existing predictor dataset (e.g. data in the current example) as arguments. We could, for instance, apply the downscaling method to the training data (x) as follows:

newdata <- prepareNewData(x,data)
pred  <- downscale.predict(newdata, analog)

The result of pred would be identical to the predicted values already computed when training the model (analog$pred).

As a more realistic application, we could split the data into two subsets (train and test) and perform a simple hold-out validation as follows:

xT <- subsetGrid(x, years = 1983:1999)  # training predictors
yT <- subsetGrid(y, years = 1983:1999)   # training predictands
data <- prepareData(xT,yT)       # preparing the data 
analog <- downscale.train(data, method = "analogs", n.analogs = 1)
xt <- subsetGrid(x, years = 2000)       # test predictors
newdata <- prepareNewData(xt,data)     # preparing the new predictors
pred  <- downscale.predict(newdata, analog)  # predicting 
# visualizing the results
yt <- subsetGrid(y,years = 2000)        
temporalPlot(pred,yt)             # plotting predictions and observations
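Besides the visual comparison, the test-period predictions can be summarized with a few standard scores. The following base R sketch computes bias, RMSE and Pearson correlation for two plain numeric vectors; the helper validate and the example values are hypothetical, and the choice of scores is only one possibility.

```r
# Illustrative helper (not part of downscaleR): basic validation scores
validate <- function(obs, pred) {
  ok <- complete.cases(obs, pred)   # drop time steps with missing values
  obs <- obs[ok]
  pred <- pred[ok]
  c(bias = mean(pred - obs),
    rmse = sqrt(mean((pred - obs)^2)),
    cor  = cor(obs, pred))
}

# Hypothetical observed and predicted daily temperatures (one missing observation)
obs.toy  <- c(8.1, 9.4, NA, 7.2, 10.0, 11.3)
pred.toy <- c(7.9, 9.9, 8.8, 7.0, 10.4, 10.9)

scores <- validate(obs.toy, pred.toy)
scores
```

Assuming the Data elements of the prediction and of the test observations are numeric arrays with matching dimensions, a single station column from each could be passed to such a function.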

4. Model cross-validation

downscaleR includes a specific function to cross-validate the downscaling methods, offering several alternatives (from a simple split to random or sequential k-fold). Its arguments are the same as those of downscale.train, plus the arguments of prepareData needed to preprocess the predictor data and build the training data.

analog.cv <- downscale.cv(x = x, y = y, method = "analogs", n.analogs = 1, folds = 5,
                 spatial.predictors = list(v.exp = 0.95),
                 local.predictors = list(n = 4, vars = getVarNames(x)))
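To clarify the difference between sequential and random k-fold designs, the following base R sketch splits the 20 winters of the example period (1983-2002) into 5 folds in both ways. It only shows how the years could be partitioned; the object names are hypothetical and the actual partitioning done by downscale.cv may differ.

```r
years <- 1983:2002
k <- 5

# Sequential k-fold: consecutive blocks of years
block <- ceiling(seq_along(years) / (length(years) / k))
seq.folds <- split(years, block)

# Random k-fold: shuffle the years, then deal them into k groups
set.seed(10)
rnd.folds <- split(sample(years), rep(seq_len(k), length.out = length(years)))

# For each fold, the remaining years form the training set
train.sizes <- sapply(seq.folds, function(test.years) {
  length(setdiff(years, test.years))
})

seq.folds[[1]]   # test years of the first sequential fold
train.sizes      # number of training years per fold
```

In each iteration the model would be trained on the years outside the fold and used to predict the years inside it, so that every year is predicted exactly once by a model that did not see it.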


Session info

print(sessionInfo(package = c("transformeR", "downscaleR")))

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.5 LTS
## 
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## character(0)
## 
## other attached packages:
## [1] transformeR_1.3.3 downscaleR_3.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.15        compiler_3.4.3      highr_0.6          
##  [4] methods_3.4.3       bitops_1.0-6        iterators_1.0.8    
##  [7] utils_3.4.3         tools_3.4.3         grDevices_3.4.3    
## [10] deepnet_0.2         digest_0.6.13       dotCall64_0.9-5.2  
## [13] evd_2.3-2           gtable_0.2.0        evaluate_0.10.1    
## [16] lattice_0.20-35     Matrix_1.2-7.1      foreach_1.4.3      
## [19] yaml_2.1.16         parallel_3.4.3      spam_2.1-2         
## [22] akima_0.6-2         gridExtra_2.2.1     stringr_1.2.0      
## [25] knitr_1.18          raster_2.6-7        gridGraphics_0.2   
## [28] graphics_3.4.3      datasets_3.4.3      stats_3.4.3        
## [31] fields_9.6          maps_3.2.0          rprojroot_1.3-2    
## [34] grid_3.4.3          glmnet_2.0-13       base_3.4.3         
## [37] rmarkdown_1.8       sp_1.2-7            magrittr_1.5       
## [40] backports_1.1.2     codetools_0.2-15    htmltools_0.3.6    
## [43] MASS_7.3-44         abind_1.4-5         stringi_1.1.5      
## [46] RCurl_1.95-4.10     RcppEigen_0.3.3.3.1