-
Notifications
You must be signed in to change notification settings - Fork 19
/
Copy pathnetcdf_terra.Rmd
317 lines (234 loc) · 13.1 KB
/
netcdf_terra.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
---
title: "netCDF and terra"
output:
html_document:
fig_caption: no
number_sections: yes
toc: yes
toc_float: False
collapsed: no
---
```{r set-options, echo=FALSE}
options(width = 105)
knitr::opts_chunk$set(dev='png', dpi=300, cache=TRUE, out.width = "75%", out.height = "75%")
pdf.options(useDingbats = TRUE)
klippy::klippy(position = c('top', 'right'))
```
# Introduction
There are three semi-related R packages for visualizing and analyzing spatial and temporal data:
- `sf` (Simple Features for R [[https://r-spatial.github.io/sf/]](https://r-spatial.github.io/sf/)), which might be though of as main vehicle for managing the diverse kinds of geospatial data, automaticaly linking to the main packages for geospatial-type analyses (`GEOS`, `s2geometry`, `PROJ`, and `GDAL`);
- `stars` (Spatiotemporal Arrays [[https://r-spatial.github.io/stars/index.html]](https://r-spatial.github.io/stars/index.html)) with supports the analysis of spatiotemporal data; and
- `terra` ([[https://rspatial.org/index.html]](https://rspatial.org/index.html)) which focuses mainly on raster data (such as satellite remote-sensing data), but can also manage spatial vector data.
Each of these packages has functions for reading and writing data sets in a number of formats (like netCDF or HDF), and for storing and manipulating the data internally. Because all nicely support netCDF, and netCDF has its own functions for reading or writing arrays, it can be viewed as the glue that links the packages, both to one-another as will as to the "outside" world. Here's a summary of some of the packages and functions involved
```
read netCDF write netCDF
base R ncdf4, RNetCDF packages ncdf4, RNetCDF packages
sf st_read() st_write()
terra rast() writeCDF()
stars read_netcdf() write_stars()
```
# The terra package
The `terra` package is large library of functions and methods for dealing with raster data sets, including many geospatial formats, including netCDF. It is a rapidly deloping package, replacing the old `raster` package. The functions support many different kinds of geospatial procedures applied to raster data, like regridding and interpolation, hillslope shading calculations, logical and mathematical manipulations and so on, but it also can efficiently read and manipulate large data sets. In particular `terra` is designed to only read into memory those parts of a data set that are necessary for some analysis, raising the possibility of analyzing data sets that are much larger than the availble machine memory. In addition, some of the "flatting" and "reshaping" maneuvers that are required to get netCDF data sets into "tidy" formats are supported by individual functions. Below are some simple examples of reading and writing netCDF data sets.
To illustrate the `terra` approach to reading and displaying the CRU temperature data, first load the required packages:
```{r start, message=FALSE}
# load packages
library(sf)
library(ncdf4)
library(terra)
library(RColorBrewer)
```
Also read a world outline shape file for plotting:
```{r readWorldShape, cache = TRUE}
# read a shape file
shp_file <- "/Users/bartlein/Projects/RESS/data/shp_files/world2015/UIA_World_Countries_Boundaries.shp"
world_outline <- as(st_geometry(read_sf(shp_file)), Class = "Spatial")
plot(world_outline, col="lightgreen", lwd=1)
```
Convert the world_outline to a spatial vector (`SpatVector`), one of the classes of objects familiar to the `terra` package (the other being `SpatRaster`).
```{r convertoutline, cache = TRUE}
# convert world_outline to a spatial vector
world_otl <- vect(world_outline)
class(world_otl)
```
## Read a netCDF file
Next, set the path, filename, and variable name for the CRU long-term mean temperature data
```{r setpath}
# set path and filename
nc_path <- "/Users/bartlein/Projects/RESS/data/nc_files/"
nc_name <- "cru10min30_tmp"
nc_fname <- paste(nc_path, nc_name, ".nc", sep="")
dname <- "tmp" # note: tmp means temperature (not temporary)
```
Read the 3-D array from the netCDF file using the `rast()` function:
```{r readCRU, cache=TRUE}
# read the netCDF file
tmp_raster <- rast(nc_fname)
tmp_raster
class(tmp_raster)
```
Typing the name of the object `tmp_raster` produces a short summary of the contents of the file. Note that the class of the object just created is `SpatRaster` as opposed to `array`.
Next, plot the data, using the `terra` package default plotting function.
```{r plotCRU}
plot(tmp_raster)
```
Plot just the January data, using the `subset()` function to get the data:
```{r plot jan}
# select just the January data
tjan_raster <- subset(tmp_raster, s = 1)
plot(tjan_raster)
```
Here's a nicer plot:
```{r plotjan}
# plot January temperature
breaks <- c(-50,-40,-30,-20,-10,0,10,20,30,40,50)
color_fun <- colorRampPalette(rev(brewer.pal(10,"RdBu")))
plot(tjan_raster, col = color_fun(10), type = "continuous", breaks = breaks,
main = "January Temperature", ylab = "Latitude", xlab = "Longitude")
plot(world_otl, add = TRUE)
```
So it looks like the `raster` package can read a netCDF file with fewer lines of code than `ncdf`.
## "Flatting" a raster brick
The `values()` function in `terra` reshapes a raster object; if the argument of the function is a raster layer, the function returns a vector, while if the argument is a raster stack or raster brick (e.g. a 3-D array), the function returns a matrix, with each row representing an individual cell (or location in the grid), and the columns representing layers (which in the case of the CRU data are times (months)). The replicates the reshaping described earlier for netCDF files.
```{r values}
tmp_array <- values(tmp_raster)
class(tmp_array); dim(tmp_array)
```
The class and diminsions of `tmp_array` describe the result of the reshaping -- the nlon x nlat x 12 brick is now a rectangular data set.
```{r printarray}
head(na.omit(tmp_array))
```
# NetCDF and the terra package #
The `terra` package evidently has the capability of reading and writing netCDF files. There are several issues that could arise in such transformations (i.e. from the netCDF format to the `terra` `SpatRaster` format) related to such things as the indexing of grid-cell locations: netCDF coordinates refer to the center of grid cells, while `terra` coordinates refer to cell corners.
In practice, the `terra` package seems to "do the right thing" in reading and writing netCDF, as can be demonstrated using a "toy" data set. In the examples below, a simple netCDF data set will be generated and written out using the `ncdf4` package. Then that netCDF data set will be read in as a raster "layer" and plotted, and finally the raster layer will be written again as a netCDF file. As can be seen, the coordiate systems are appropriately adjusted going back and forth between netCDF and the `raster` "native" format.
## Generate and write a simple netCDF file ##
Generate a small `nlon` = 8, `nlat` = 4 2-d netCDF data set, filled with normally distributed random numbers
```{r ncrast01}
# create a small netCDF file
# generate lons and lats
nlon <- 8; nlat <- 4
dlon <- 360.0/nlon; dlat <- 180.0/nlat
lon <- seq(-180.0+(dlon/2),+180.0-(dlon/2),by=dlon)
lon
lat <- seq(-90.0+(dlat/2),+90.0-(dlat/2),by=dlat)
lat
# define dimensions
londim <- ncdim_def("lon", "degrees_east", as.double(lon))
latdim <- ncdim_def("lat", "degrees_north", as.double(lat))
# generate some data
set.seed(5) # to generate the same stream of random numbers each time the code is run
tmp <- rnorm(nlon*nlat)
tmat <- array(tmp, dim = c(nlon, nlat))
dim(tmat)
class(tmat)
```
The date generated by the `rnorm()` function are "Z-scores" -- normally distributed random numbers with a mean of 0.0 and a standard deviation of 1.0. Note the class of the generated data (`tmat`). Continue building the netCDF data set.
```{r}
# define variables
varname="tmp"
units="z-scores"
dlname <- "test variable -- original"
fillvalue <- 1e20
tmp.def <- ncvar_def(varname, units, list(londim, latdim), fillvalue,
dlname, prec = "single")
```
As can be seen, the longitudes run from -157.5 to +157.5, while the latitudes run from -67.5 to +67.5, and they define the centers of the netCDF grid cells.
Here's what the data looks like:
```{r printdata}
# print the data
print(t(tmat)[(nlat:1),])
```
The `t()` function transposes the matrix by flipping it along the principal diagonal, while the selection `[(nlat:1),]` flips the data top to bottom, so it will be consistent with the maps below. Notice the two extreme values, `-2.18` (second row, second column) and `1.71` (fourth row, fifth column); these will help in the comparison of the maps later.
Convert `tmat` to a `terra` `SpatRast` for plotting. Also add the spatial extent of the raster, which extends from the cell edges, as opposed to the cell centers
```{r rasttmat}
tmat_sr <- rast(tmat)
tmat_sr <- t(tmat_sr) # transpose
tmat_sr <- flip(tmat_sr, direction = "vertical")
ext(tmat_sr) <- c(-180.0, 180.0, -90.0, 90.0)
```
Here's what the generated data look like:
```{r plotmat}
# plot tmpin, the test raster read back in
breaks <- c(-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5)
color_fun <- colorRampPalette(rev(brewer.pal(10,"RdBu")))
plot(tmat_sr, col = color_fun(10), type = "continuous", breaks = breaks,
main = "Test Variable", ylab = "Latitude", xlab = "Longitude")
plot(world_otl, add = TRUE)
```
Write the generated data to a netCDF file using `ncdf4` functions.
```{r ncrast02}
# create a netCDF file
ncfname <- "test-netCDF-file.nc"
ncout <- nc_create(ncfname, list(tmp.def), force_v4 = TRUE)
# put the array
ncvar_put(ncout, tmp.def, tmat)
# put additional attributes into dimension and data variables
ncatt_put(ncout, "lon", "axis", "X")
ncatt_put(ncout, "lat", "axis", "Y")
# add global attributes
title <- "small example netCDF file -- written using ncdf4"
ncatt_put(ncout, 0, "title", title)
# close the file, writing data to disk
nc_close(ncout)
```
Here's what the netCDF file looks like, as plotted in Panoply:
<img src="images/tmp_in_test_netCDF_file.png" alt="Drawing" style="width: 500px;"/>
The `ncdump` command-line utility can be used to verify the contents of the file. At the Console in RStudio, the following code would do that:
```{r ncdump1b, echo=TRUE, eval=FALSE}
cat(system("ncdump -c test-netCDF-file.nc"))
```
## Read the simple netCDF file back in as a raster layer ##
Now read the newly created netCDF data set back in as a `SpatRast` object. (Load the packages if not already loaded.)
```{r ncrast03, message=FALSE, cache=FALSE}
# load packages
library(sf)
library(ncdf4)
library(terra)
library(RColorBrewer)
```
```{r ncrast04}
# read the netCDF file as a SpatRaster
tmpin <- rast(ncfname)
tmpin
class(tmpin)
```
Listing the object as above, provides its internal `terra` attributes, while the `print()` function provides the characteristics of the source netCDF file.
```{r ncrast05}
# source file characteristics
print(tmpin)
```
Here's what the generated data look like after being read back in:
```{r plotmat2}
# plot tmpin, the test raster read back in
breaks <- c(-2.5, -2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5)
color_fun <- colorRampPalette(rev(brewer.pal(10,"RdBu")))
plot(tmpin, col = color_fun(10), type = "continuous", breaks = breaks,
main = "Test Variable -- as Raster Layer", ylab = "Latitude", xlab = "Longitude")
plot(world_otl, add = TRUE)
```
## Check the coordinates of the raster
The spatial extent of the `tmpin` raster runs from -180.0 to + 180.0 and -90.0 to + 90.0, but as originally generated, the the grid cells in the netCDF file ranged from -157.5 to + 157.5 and -67.5 to + 67.5. Was information on the layout of the grid lost while writing out or reading in the netCDF file? The actual coordinate values of the cell centers can be gotten as follows:
```{r get raster coords}
# get raster coordinates
xFromCol(tmpin)
yFromRow(tmpin)
```
These can be compared with the original `lon' and 'lat' values defined above:
```{r list lonlat}
# original lon and lat
lon
-1.0 * lat
```
(Note that the latitudes in the netCDF file were defined to run from negative to positive values, while the `terra` `yFromRow()` function reports them in descending order. Multiplying `lon` reverses the order for comparability.) So, no information on the layout of the grid was lost.
## Write the raster layer as a netCDF file ##
Write the generated data (as `tmat_sr`, a `SpatRaster` object with coordinates/spatial extent data) to a netCDF file, this time using `writeCDF()` from the `terra` package:
```{r ncrast0x, fig.width=6, fig.height=4}
# create a netCDF file
ncfname <- "test-netCDF-terra.nc"
writeCDF(tmat_sr, ncfname, overwrite = TRUE, varname = "z-scores", unit = "z", longname = "random values")
```
Here's what that netCDF file looks like, as plotted in Panoply:
<img src="images/tmp_in_test-netCDF-terra.png" alt="Drawing" style="width: 500px;"/>
The `ncdump` command-line utility can be used to verify the contents of the file. At the Console in RStudio, the following code would do that:
```{r ncdump1c, echo=TRUE, eval=FALSE}
cat(system("ncdump -c test-netCDF-terra.nc"))
```