# Foundations {#sec-foundations}
```{r source_r, echo = FALSE}
source("_common.R")
```
## Goals and Outcomes
* Understand fundamental spatial data structures and libraries in **R**.
* Become familiar with coordinate reference systems.
* Learn the basics of geographic I/O (input/output).
Throughout this section we'll use the following packages:
```{r}
#| warning: false
library(dataRetrieval)
library(tmap)
library(tmaptools)
library(dplyr)
library(tigris)
library(sf)
library(mapview)
mapviewOptions(fgb=FALSE)
library(ggplot2)
library(spData)
library(terra)
library(stars)
library(Rspatialworkshop)
library(readr)
library(cowplot)
library(knitr)
library(rnaturalearth)
library(osmdata)
```
## Background on data structures
:::{.callout-note}
Much of this background on data structures is borrowed from Mike Johnson's [Introduction to Spatial Data Science](https://mikejohnson51.github.io/spds/) and lecture material from our [AWRA 2022 Geo Workshop](https://github.com/mhweber/AWRA2022GeoWorkshop)
:::
Before we dive into spatial libraries, it's worth a quick review of relevant data structures in R - this will be a whirlwind overview, assuming most of you are already familiar with R.
- You, computers, and software 'understand' values in particular and different ways
- Computers convert bytes --> hex --> value
+ Humans read values
+ Software reads Hex bytes
+ Hardware reads Binary bytes
```{r, fig.align='center', echo = FALSE, out.width="75%" }
knitr::include_graphics("img/values.PNG")
```
:::{.callout-note}
What's the difference between 10 and '10'?
- **To us**: meaning
- **To software**: how it is handled
- **To a computer**: nothing
We need human-defined (computer-guessable) data types
:::
Fundamental data types
```{r}
# Numeric
typeof(1.9)
# Integer
typeof(1L)
# Boolean
typeof(TRUE)
# Character
typeof("Welcome")
```
Storing more than one value requires a structure.
Values can be structured in ways such as:
- vectors
- dimensions/attributes
- data.frames
And data.frames can be operated on with functions such as:
- filter
- select
- mutate
- summarize
- group_by
### Vectors
:::{.callout-note}
'vector' has two meanings when working with GIS data and working in R!
- geographic vector data (which we'll explore)
- `vector` class (what we're talking about here)
geographic vector data is a *data model*, and the `vector` class is an R class like `data.frame` and `matrix`
:::
**Vectors** come in two flavors:
1. atomic
2. list
- an atomic vector's elements must all have the same **type**
- a list's elements can have different **types**
### Atomic vectors
```{r}
# Numeric
dbl_vec = c(1.9, 2, 3.5)
typeof(dbl_vec)
length(dbl_vec)
# Logical
lg_vec = c(TRUE, FALSE, F, T)
typeof(lg_vec)
length(lg_vec)
```
**Coercion**
- *type* is a property of a vector
- When you try to combine different *types* they'll be *coerced* to the most flexible type present, in this fixed order (least to most flexible):
  + logical => integer => double => character
- Coercion occurs automatically but generates a warning and a missing value when it fails
```{r}
c("a", 1)
c("a", TRUE)
c(4.5, 1L)
c("1", 18, "GIS")
as.numeric(c("1", 18, "GIS"))
as.logical(c("1", 18, "GIS"))
```
*Subsetting atomic vectors*
```{r}
# Atomic numeric vector
(x = c(3.4, 7, 18, 9.6))
# Third Value
x[3]
# Third and Fourth value
x[c(3,4)]
# Drop the third value
x[-3]
# Keep the 1st and 2nd values, but drop the 3rd and 4th
x[c(T,T,F,F)]
```
### Matrix
- A matrix is a 2D atomic structure (row, column)
+ Same data types
+ Same column length
This is how spatial *raster* data is structured
Subsetting matrices uses row,column (i,j) syntax
```{r}
(x = matrix(1:9, nrow = 3))
x[3,]
x[,3]
x[3,3]
```
### Array
- An array is a 3D atomic structure (row, column, slice)
This is how spatial *raster* data with a time dimension is structured
```{r}
(array(c(1:12), dim = c(3,2,2)))
```
Subsetting arrays uses row, column, slice syntax (i,j,z)
```{r}
(x = array(1:12, dim = c(2,2,3)))
x[1,,]
x[,1,]
x[,,1]
```
### Lists
Each list element can be of any data type
```{r}
(my_list <- list(
matrix(1:4, nrow = 2),
"GIS is great!",
c(TRUE, FALSE, TRUE),
c(2.3, 5.9)
))
```
```{r}
typeof(my_list)
```
Subsetting Lists
- Each element of a list can be accessed with the `[[` operator
```{r}
my_list[[1]]
my_list[[1]][1,2]
```
### Data Frames
- `data.frame` is built on the `list` structure in R
- length of each atomic or list vector has to be the same
- This gives `data.frame` objects rectangular structure so they share properties of both matrices and lists
```{r}
(df1 <- data.frame(name = c("Me", "Tim", "Sarah"),
age = c(53,15,80),
retired = c(F,F,T)))
typeof(df1)
```
Subsetting a `data.frame`
```{r}
df1[1,2]
# or like a list
df1[[2]]
# or with column name operator
df1$name
```
### Data manipulation
- `data.frame` manipulation follows the same logic as SQL queries
- R abstracts the SQL logic and provides function-ized methods
- `dplyr` in the `tidyverse` ecosystem provides the 'grammar of data manipulation' approach we'll use in this workshop
Data manipulation *verbs*:
- Primary:
+ `select()`: keeps or removes variables based on names
+ `filter()`: keeps or removes observations based on values
- Manipulation:
+ `mutate()`: adds new variables that are functions of existing variables
+ `summarise()`: reduces multiple values down to a single summary
+ `arrange()`: changes ordering of the rows
- Grouping:
+ `group_by()`: combine with any or all of the above to perform manipulation 'by group'
### Pipe operator
- The pipe operator (native R pipe operator `|>` or `magrittr` pipe operator `%>%`) provides a more concise and expressive coding experience
- The pipe passes the object on the left hand side of the pipe into the first argument of the right hand function
- To be `|>` compatible, the data.frame is ALWAYS the first argument to `dplyr` verbs
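A minimal base-R illustration of the rewrite the pipe performs (no spatial packages needed; the native `|>` pipe requires R >= 4.1):

```r
# Piped and nested forms of the same operation on the built-in mtcars data
piped  <- mtcars |> subset(mpg > 30) |> nrow()
nested <- nrow(subset(mtcars, mpg > 30))
identical(piped, nested)  # TRUE
```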
A demonstration using `dataRetrieval` package stream gage data from USGS:
```{r}
flows <- dataRetrieval::readNWISdv(
siteNumbers = '14187200',
parameterCd = "00060"
) |>
dataRetrieval::renameNWISColumns()
dplyr::glimpse(flows)
```
::: {.callout-note appearance="simple" icon="false"}
### Exercise
What is the `::` after the package name and before the function in the R code above doing? How would you find out more about it?
:::
::: {.callout-note appearance="simple" icon="false" collapse="true"}
### Solution
`::` specifies the namespace (i.e., package) associated with a particular function.
You can get help on it the same way you get help on anything in R:
```{r}
#| eval: false
?'::'
# or
help("::")
```
:::
### Filter
- `filter()` takes logical (binary) expressions and returns the rows in which all conditions are TRUE.
Filter on a single condition:
```{r}
flows |>
dplyr::filter(Flow > 900) |>
dplyr::glimpse()
```
Or multiple conditions:
```{r}
flows |>
dplyr::filter(Flow > 900, Date > as.Date("2010-01-01")) |>
dplyr::glimpse()
```
### Select
- Subset variables (columns) you want to keep or exclude by name
Keep just two columns
```{r}
flows |>
dplyr::select(Date, Flow) |>
dplyr::glimpse()
```
Exclude just one
```{r}
flows |>
dplyr::select(-Flow_cd) |>
dplyr::glimpse()
```
...And we can rename while selecting
```{r}
flows |>
dplyr::select(Date, flow_cfs = Flow) |>
dplyr::glimpse()
```
### Mutate
- `mutate()` defines and inserts new variables into an existing `data.frame`
- `mutate()` builds new variables sequentially so you can reference earlier ones when defining later ones
We can extract Year and Month as new variables from the Date variable using `format()`:
```{r}
flows |>
dplyr::select(Date, Flow) |>
dplyr::filter(Date > as.Date('2010-01-01')) |>
dplyr::mutate(Year = format(Date, "%Y"), Month = format(Date, "%m")) |>
dplyr::glimpse()
```
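To see that sequential behavior explicitly, here is a small sketch on a toy data frame (not the `flows` data; the column names are made up for illustration) where the second new variable references the first, created in the same `mutate()` call:

```r
library(dplyr)

d <- data.frame(flow_cfs = c(10, 20, 30))

d |>
  mutate(
    flow_cms   = flow_cfs * 0.0283168,       # convert cfs to cubic meters/sec
    above_mean = flow_cms > mean(flow_cms)   # references flow_cms, defined just above
  )
```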
### Summarize and Group_By
- `summarize()` allows you to summarize across all observations
- `group_by()` allows you to apply any of these manipulation verbs by group in your data
```{r}
flows |>
dplyr::select(Date, Flow) |>
dplyr::mutate(Year = format(Date, "%Y")) |>
dplyr::group_by(Year) |>
dplyr::summarize(meanQ = mean(Flow),
maxQ = max(Flow))
```
:::{.callout-note}
You may click on any of the functions in this book to be directed to their respective documentation. For example, clicking on `st_join()` takes you to the documentation page for the `st_join()` function on the `sf` website.
:::
## Why R for Spatial Data Analysis?
**Advantages**
- R is:
+ lightweight
+ open-source
+ cross-platform
- Works with contributed packages - currently `r nrow(utils::available.packages(repos = "http://cran.us.r-project.org"))`
+ provides *extensibility*
- Automation and recording of workflow
+ provides *reproducibility*
- Optimized work flow - data manipulation, analysis and visualization all in one place
+ provides *integration*
- R does not alter underlying data - manipulation and visualization *in memory*
- R is great for repetitive graphics
- R is great for integrating different aspects of analysis - spatial and statistical analysis in one environment
+ again, *integration*
- Leverage statistical power of R (i.e. modeling spatial data, data visualization, statistical exploration)
- Can handle vector and raster data, as well as work with spatial databases and pretty much any data format spatial data comes in
- R's GIS capabilities growing rapidly right now - new packages added monthly - currently about 275 spatial packages (depending on how you categorize)
**Drawbacks** of using R for GIS work
- R is not as good for interactive use as desktop GIS applications like ArcGIS or QGIS (i.e. editing features, panning, zooming, and analysis on selected subsets of features)
- Explicit coordinate system handling by the user
+ no on-the-fly projection support
- In memory analysis does not scale well with large GIS vector and tabular data
- Steep learning curve
- Up to you to find packages to do what you need - help not always great
### A Couple Motivating Examples
Here's a quick example demonstrating R’s flexibility and current strength for plotting and interactive mapping with very simple, expressive code:
```{r}
#| warning: false
# find my town!
my_town <- tmaptools::geocode_OSM("Corvallis OR USA", as.sf = TRUE)
glimpse(my_town)
```
::: {.callout-note appearance="simple" icon="false"}
### Exercise
Run and examine the code chunk above and try other geocoding examples you think of
1. What does `::` do?
2. What does `geocode_OSM()` do?
3. Explain how the code runs together using the `|>` chaining operator
4. What is `glimpse()`? How is it useful compared to `head()`?
:::
::: {.callout-note appearance="simple" icon="false" collapse="true"}
#### Solution
1. `::` specifies using `geocode_OSM` from the `tmaptools` package. R gives namespace preference to packages in order loaded. Some packages share function names, so when there is ambiguity, it's good practice to disambiguate your functions with the double-colon
2. `geocode_OSM()` looks up a named feature in OpenStreetMap and returns the coordinates (bonus - we'll delve into this more in the next section - what coordinate reference system are the coordinates in, and how do you find out?)
3. You would translate code using the ` |> ` operator from:
- do this `|>` do that `|>` do that
To
- do this *then* do that *then* do that
4. Technically, it's a transposed version of `print()` - columns run down page, data across like rows - it additionally gives you the number of observations and variables, and the data type of each column. Note that `glimpse()` omits geometry information - you would type the name of your object at the console to get full information, or use `print.sf()` or `str()`.
- bonus: how would you quickly learn more about `glimpse()` from the console?
:::
#### Choropleth map
The `tigris` package can be used to get census boundaries such as states and counties as vector `sf` objects in R
```{r tmap example}
#| label: fig-tmapexample
#| fig-cap: "Area of Oregon Counties"
#| fig-width: 8
#| results: hide
#| message: false
#| warning: false
counties <- counties("Oregon", cb = TRUE)
counties$area <- as.numeric(st_area(counties))
glimpse(counties)
tm_shape(counties) +
tm_polygons("area", style="quantile", title="Area of Oregon Counties")
```
#### Interactive mapping
```{r}
#| label: fig-mapviewexample
#| fig-width: 8
#| results: hide
#| message: false
mapview(my_town, col="red", col.regions = "red") +
mapview(counties, alpha.regions = .1)
```
## Spatial Data Structures in R
First we'll walk through spatial data structures in R and review some key components of data structures in R that inform our use of spatial data in R.
:::{.callout-note}
A few core libraries underpin spatial libraries in R (and Python!) and in GIS software applications such as QGIS and ArcGIS Pro. Spatial data structures across languages and applications are primarily organized through [OSGeo](https://www.osgeo.org) and the [OGC](https://www.ogc.org). These core libraries include:
- [GDAL](https://gdal.org/) --\> For raster and feature abstraction and processing
- [PROJ](https://proj.org/) --\> A library for coordinate transformations and projections
- [GEOS](https://libgeos.org/) --\> a planar geometry engine for operations (measures, relations) such as calculating buffers and centroids on data with a projected CRS
- [S2](https://s2geometry.io/) --\> a spherical geometry engine written in C++ developed by Google and adapted in R with the [**s2**](https://r-spatial.github.io/s2/) package
:::
We'll see how these core libraries are called and used in the R packages we explore throughout this workshop.
:::{.callout-note}
- **Vector** data are comprised of points, lines, and polygons that represent discrete spatial entities, such as a river, watershed, or stream gauge.
- **Raster** data divides spaces into rectilinear cells (pixels) to represent spatially continuous phenomena, such as elevation or the weather. The cell size (or resolution) defines the fidelity of the data.
:::
```{r, fig.align='center', echo = FALSE, out.width="100%"}
knitr::include_graphics("img/09-vec-raster.jpg")
```
### Vector Data Model
For *Vector* data, Simple Features (officially Simple Feature Access) is both an OGC and International Organization for Standardization (ISO) standard that specifies how (mostly) two-dimensional geometries can represent and describe objects in the real world. The Simple Features specification includes:
- a class hierarchy
- a set of operations
- binary and text encodings
It describes how such objects can be stored in and retrieved from databases, and which geometrical operations should be defined for them.
It outlines how the spatial elements of POINTS (XY locations with a specific coordinate reference system) extend to LINES, POLYGONS and GEOMETRYCOLLECTION(s).
The "simple" adjective also refers to the fact that the line or polygon geometries are represented by sequences of points connected with straight lines that do not self-intersect.
```{r}
#| label: fig-sfmodel
#| fig-cap: "The simple features data model."
#| fig-align: center
#| echo: false
#| out-width: "100%"
knitr::include_graphics("img/09-sf-model.png")
```
Here we'll load the `sf` library to load the classes and functions for working with spatial vector data and to explore how `sf` handles vector spatial data in R:
```{r}
#| warning: true
library(sf)
```
::: {.callout-note appearance="simple" icon="false"}
#### Question
What is the message you see in your console the first time you load `sf`?
Hint: If you don't see a message, you probably already have the `sf` library attached - you can uncheck it in your packages pane in RStudio, or you can run `detach("package:sf", unload = TRUE)`
:::
::: {.callout-note appearance="simple" icon="false" collapse="true"}
#### Answer
The output from `library(sf)` reports which versions of key geographic libraries such as GEOS the package is using
:::
You can report on versions of external software used by `sf` with:
```{r}
sf::sf_extSoftVersion()
```
We'll cover highlights of the functionality of `sf`, but we encourage you to read the `sf` package website documentation at [r-spatial.github.io/sf](https://r-spatial.github.io/sf/), which can be viewed offline in RStudio using:
```{r}
#| eval: false
vignette(package = "sf") # see which vignettes are available
vignette("sf1") # open the introduction vignette
```
The `sfg` class in `sf` represents the different simple feature geometry types in R: point, linestring, polygon (and ‘multi’ equivalents such as multipoints) or geometry collection. Vector data in R and `sf` can be broken down into three basic `geometric` primitives with equivalent `sf` functions to create them:
1. points - `st_point()`
2. linestrings - `st_linestring()`
3. polygons - `st_polygon()`
We build up the more complex simple feature geometries with:
- `st_multipoint()` for multipoints
- `st_multilinestring()` for multilinestrings
- `st_multipolygon()` for multipolygons
- `st_geometrycollection()` for geometry collections
Typically we don't build these up from scratch ourselves but it's important to understand these building blocks since primitives are the building blocks for all vector features in `sf`
::: {.callout-note appearance="simple" icon="true" collapse="true"}
- All functions and methods in `sf` that operate on spatial data are prefixed by `st_`, which refers to spatial type.
- This is similar to [PostGIS](http://postgis.net/)
- This makes `sf` functions easily discoverable by command-line completion.
- Simple features are also native R data, using simple data structures (S3 classes, lists, matrix, vector).
:::
These primitives can all be broken down into set(s) of numeric x,y coordinates with a known coordinate reference system (crs).
#### Points
- A point in `sf` is composed of one coordinate pair (XY) in a specific coordinate system.
- A POINT has no length, no area and a dimension of 0.
- **Z** and **M** coordinates: as well as having the necessary X and Y coordinates, single point (vertex) simple features can have:
- a Z coordinate, denoting altitude, and/or
- an M value, denoting some "measure"
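As a quick sketch with `st_point()` (a third coordinate is treated as Z by default, and the `dim` argument lets you say otherwise):

```r
library(sf)

st_point(c(1, 3))                    # XY
st_point(c(1, 3, 250))               # XYZ - third value read as altitude (Z)
st_point(c(1, 3, 250), dim = "XYM")  # XYM - third value read as a measure (M)
st_point(c(1, 3, 250, 7))            # XYZM - both Z and M
```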
```{r}
# create sf object as point
pt1 <- st_point(c(1,3))
pt2 <- st_point(c(2,1))
pt3 <- st_point(c(3,2))
sf_point <- st_sfc(pt1, pt2, pt3)
# POINT defined as numeric vector
(sf::st_dimension(sf_point))
```
```{r}
ggplot(sf_point) +
geom_sf(color = "red") +
labs(x = "X", y = "Y")
```
#### Lines
- A polyline is composed of an ordered sequence of two or more POINTs
- Points in a line are called vertices/nodes and explicitly define the connection between two points.
- A LINESTRING has a length, has no area and has a dimension of 1 (length)
```{r}
# create sf object as linestring
line1 <- sf::st_linestring(rbind(pt1, pt2))
line2 <- sf::st_linestring(rbind(pt2, pt3))
sf_linestring <- st_sfc(line1, line2)
# dimension
(sf::st_dimension(sf_linestring))
```
```{r}
ggplot(sf_linestring) +
geom_sf() +
labs(x = "X", y = "Y")
```
#### Polygon
- A POLYGON is composed of 4 or more points whose starting and ending point are the same.
- A POLYGON is a surface stored as a list of its exterior and interior rings.
- A POLYGON has length and area, and a dimension of 2 (area).
```{r}
pt4 <- pt1
poly1 <- st_polygon(list(rbind(pt1, pt2, pt3, pt4)))
sf_polygon <- st_sfc(poly1)
# dimension
sf::st_dimension(sf_polygon)
```
```{r}
ggplot(sf_polygon) +
geom_sf(fill = "grey")
```
The geometries we created above are displayed using the `WKT` format. `sf` uses the `WKT` and `WKB` formats to encode vector geometry.
#### Points, Lines, Polygons
We can visualize all the geometries together by running:
```{r}
ggplot() +
geom_sf(data = sf_point, color = "red") +
geom_sf(data = sf_linestring) +
geom_sf(data = sf_polygon, fill = "grey")
```
::: {.callout-note}
`WKT` (Well-Known Text) and `WKB`
- Well-known text (`WKT`) encoding provides a human-readable description of the geometry.
- The well-known binary (`WKB`) encoding is machine-readable, lossless, and faster to work with than text encoding.
`WKB` is used for all interactions with GDAL and GEOS.
:::
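You can inspect both encodings of a geometry directly with `st_as_text()` and `st_as_binary()`:

```r
library(sf)

pt <- st_point(c(1, 3))

st_as_text(pt)    # human-readable: "POINT (1 3)"
st_as_binary(pt)  # machine-readable raw bytes
```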
#### Spatial Data Frames
Building up from basic points, lines and polygons, simple feature objects are stored in a `data.frame`, with geometry data in a special list-column, usually named 'geom' or 'geometry' - this is the same convention used in the popular Python package [GeoPandas](https://geopandas.org).
Let's look at an example spatial data set that comes in the `spData` package, `us_states`:
```{r}
data("us_states")
class(us_states)
names(us_states)
```
We see that it's an `sf` data frame with both attribute columns and the special geometry column we mentioned earlier - in this case named `geometry`.
`us_states$geometry` is the ['list column'](https://adv-r.hadley.nz/vectors-chap.html#list-columns) that contains all the coordinate information for the US state polygons in this dataset.
To reiterate, `us_states` is an `sf` simple feature collection which we can verify with:
```{r}
us_states
```
We see information on
- the geometry type
- the dimensionality
- the bounding box
- the coordinate reference system
- the header of the attributes
Remember that `sf` simple feature collections are composed of:
```{r, fig.align='center', echo = FALSE, out.width="100%",fig.cap="The simple features data structure."}
knitr::include_graphics("img/sf_structure.png")
```
- rows of simple features (sf) that contain attributes and geometry - in green
- each of which have a simple feature geometry list-column (sfc) - in red
- which contains the underlying simple feature geometry (sfg) for each simple feature - in blue
`sf` extends the generic `plot` function so that `plot` will naturally work with `sf` objects.
```{r}
plot(us_states)
```
:::{.callout-note}
#### Exercise
Is this the result you expected from `plot()`?
Instead of making a single default map of your geometry, `plot()` in `sf` maps each variable in the dataset.
How would you plot just the geometry? Or just the two population variables? Try these variations on `plot()` with `us_states`.
Read more about plotting in `sf` by running `?plot.sf()`
:::
:::{.callout-note collapse="true"}
#### Solution
```{r}
#| eval: false
plot(us_states$geometry)
plot(us_states[,c('total_pop_10','total_pop_15')])
#or
us_states |>
dplyr::select(total_pop_10, total_pop_15) |>
plot()
```
:::
The alternative solution above highlights a **key concept** with `sf` - `sf` objects are 'tidy-compliant' and **geometry is sticky** - the geometry column is carried along implicitly through any chained tidy operations.
```{r, fig.align='center', echo = FALSE, out.width="100%",fig.cap="Sticky geometry in `sf` objects." }
knitr::include_graphics("img/Sticky.PNG")
```
Keeping the geometry as part of regular `data.frame` objects in R (and in Python with `GeoPandas`!) has numerous advantages - for instance:
```{r}
summary(us_states['total_pop_15'])
```
The geometry is **sticky**, which means that it is carried along by default!
You can read `sf` objects from shapefiles using `st_read()` or `read_sf()`.
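For example, using the North Carolina counties shapefile that ships with `sf` (`read_sf()` is the quiet, tibble-returning variant of `st_read()`):

```r
library(sf)

# a shapefile bundled with the sf package
shp <- system.file("shape/nc.shp", package = "sf")

nc  <- st_read(shp)   # verbose: reports the layer, geometry type, bbox and CRS
nc2 <- read_sf(shp)   # quiet, returns a tibble-flavored sf object
class(nc)
```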
### Raster Data Model
Raster data can be *continuous* (e.g. elevation, precipitation, atmospheric deposition) or *categorical* (e.g. land use, soil type, geology type). Rasters can also be image-based, with a single band or multiple bands, and can have a temporal component, as in space-time cubes.
```{r, fig.align='center', echo = FALSE, out.width="100%",fig.cap="The raster data model."}
knitr::include_graphics("img/Rasters4.png")
```
Until recently, support for gridded data in R was best implemented by the `raster` package by Robert Hijmans. The `raster` package allowed you to:
* read and write almost any commonly used raster data format
* perform typical raster processing operations such as resampling, projecting, filtering, raster math, etc.
* work with files on disk that are too big to read into memory in R
* run operations quickly since the package relies on back-end C code
The [terra](https://github.com/rspatial/terra) package is the replacement for `raster` and has now superseded it, so we will largely focus on `terra` here. Examples draw from both [Spatial Data Science with R and terra](https://rspatial.org/terra/index.html) and [An introduction to terra in Geocomputation with R](https://geocompr.robinlovelace.net/spatial-class.html#raster-data). Use `help("terra-package")` in the console for a full list of available `terra` functions and a comparison to / changes from `raster`.
Raster representation in R is currently somewhat in flux, with three package choices: `raster` and its successor `terra`, which we've mentioned, as well as [stars](https://r-spatial.github.io/stars/index.html) (spatiotemporal tidy arrays with R).
To familiarize ourselves with the `terra` package, let's create an empty `SpatRaster` object - in order to do this we have to:
* define the matrix (rows and columns)
* define the spatial bounding box
:::{.callout-note}
#### Exercise
Note that typically we would be reading raster data in from a file rather than creating a raster from scratch.
Run the code steps below to familiarize yourself with constructing a `SpatRaster` from scratch.
Try providing a different bounding box for an area of your choosing or changing the dimensions and view result (by simply typing `myrast` in the console).
```{r raster-layer, message=FALSE, warning=FALSE, error=FALSE}
myrast <- rast(ncol = 10, nrow = 10, xmax = -116, xmin = -126, ymin = 42, ymax = 46)
myrast
```
:::
:::{.callout-note collapse="true"}
#### Solution
```{r}
#| eval: false
# just an example:
myrast <- rast(ncol=40, nrow = 40, xmax=-120, xmin=-136, ymin=52, ymax=56)
```
:::
`terra` uses an `S4` slot structure for its `SpatRaster` objects
```{r raster-layer2, message=FALSE, warning=FALSE, error=FALSE}
str(myrast)
isS4(myrast)
```
`terra` has dedicated functions addressing each of the following components:
- `dim()` returns the number of rows, columns and layers
- `ncell()` returns the number of cells (pixels)
- `res()` returns the spatial resolution
- `ext()` returns spatial extent
- `crs()` returns the coordinate reference system
:::{.callout-note}
#### Exercise
Exploring raster objects
1. what is the minimal data required to define a `SpatRaster`?
2. What is the CRS of our `SpatRaster`?
3. How do we pull out just the CRS for our `SpatRaster`?
4. Building on this, what is the code to pull out just our xmin value in our extent, or bounding box?
:::
:::{.callout-note collapse="true"}
#### Solution
1. the number of columns, number of rows, and extent - `terra` will fill in defaults if values aren't provided
```{r raster-emptyexample, message=FALSE, warning=FALSE, error=FALSE}
t <- rast()
t
```
2. We didn't provide one - `terra` uses default `crs` of WGS84 if you don't provide a `crs`
3.
```{r message=FALSE, warning=FALSE, error=FALSE}
crs(myrast)
```
4.
```{r message=FALSE, warning=FALSE, error=FALSE}
ext(myrast)
# just grab xmin
ext(myrast)[1]
```
:::
#### Manipulating `terra` objects
So far we've just constructed a raster container with no values (try plotting what we have so far) - let's provide values to the cells using the `runif` function to derive random values from the uniform distribution:
```{r runif, message=FALSE, warning=FALSE, error=FALSE}
#show we have no values
hasValues(myrast)
values(myrast) <- runif(n=ncell(myrast))
myrast
```
An important point to make here is that objects in the `terra` package (and previously in `raster`) can be either in memory or on disk - note the 'source' value for our `SpatRaster` `myrast`. If this were a large raster on disk, the value would be the path to the file on disk.
```{r inmemory, message=FALSE, warning=FALSE, error=FALSE}
sources(myrast) # "" means the raster is in memory; otherwise a path to a file
hasValues(myrast)
nlyr(myrast) # we just have one layer in our object
```
`terra` also provides a `plot` method for its classes:
```{r plot raster, message=FALSE, warning=FALSE, error=FALSE}
plot(myrast)
```
We can also overwrite the cell values for our raster:
```{r new_values_raster, message=FALSE, warning=FALSE, error=FALSE}
values(myrast) <- 1:ncell(myrast)
values(myrast)[1:15]
```
You can access raster values via direct indexing or (row, column) indexing - take a minute to see how this works using the raster we just created - the syntax is:
```r
myrast[i]
myrast[row, column]
```
:::{.callout-note}
#### Exercise
How is `terra` data storage unlike a `matrix` in R? Try creating a `matrix` with the same dimensions and values and compare it with the `terra` raster you've created:
:::
:::{.callout-note collapse="true"}
#### Solution
```{r matrix_raster, message=FALSE, warning=FALSE, error=FALSE}
m <- matrix(1:100, nrow = 10, ncol = 10)
# a matrix fills column-wise by default, while raster values run row-wise
m[1, 10]
myrast[1,10]
myrast[10]
```
:::
#### Reading existing rasters on disk
```{r single_raster, message=FALSE, warning=FALSE, error=FALSE}
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
my_rast = rast(raster_filepath)
nlyr(my_rast)
plot(my_rast)
```
### Multiband rasters
The `SpatRaster` object in `terra` can hold multiple layers (similar to the `RasterBrick` and `RasterStack` classes in the `raster` package). These layers can correspond to multispectral satellite imagery or a time series of rasters.
```{r multiband, message=FALSE, warning=FALSE, error=FALSE}
landsat = system.file("raster/landsat.tif", package = "spDataLarge")
landsat = rast(landsat)
landsat
plot(landsat)
```
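Individual layers of a multi-band `SpatRaster` can be subset with `[[ ]]`, by position or by name - a quick sketch (which band corresponds to which wavelength depends on the sensor, so check the layer names):
```r
library(terra)
landsat <- rast(system.file("raster/landsat.tif", package = "spDataLarge"))
names(landsat)          # layer names
single <- landsat[[4]]  # pull out one layer by position
nlyr(single)            # 1
plot(single)
```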
#### Raster data with `stars`
`stars` stands for (and works with) spatio-temporal arrays, and can deal with more complex data types than either `raster` or `terra`, such as rotated grids.
`stars` integrates with `sf`, and many `sf` functions have methods for `stars` objects (e.g. `st_bbox` and `st_transform`) - this makes sense, since both packages are written by Edzer Pebesma. `terra`, unfortunately, has poor / no integration with `sf` - this is a big issue for me personally, and I will likely look to `stars` long-term for my raster processing.
Basic example shown in [stars vignette](https://r-spatial.github.io/stars/articles/stars1.html) - reading in the 30m bands of a Landsat-7 image that comes with the `stars` package:
```{r stars_1, message=FALSE, warning=FALSE, error=FALSE}
tif = system.file("tif/L7_ETMs.tif", package = "stars")
ls7 = read_stars(tif)
plot(ls7, axes = TRUE)
```
`ls7` (Landsat 7) is an object with three dimensions (x, y, and band) and one attribute:
```{r stars_2, message=FALSE, warning=FALSE, error=FALSE}
ls7
```
Note when we plotted above that the `plot` method for `stars` uses histogram stretching across all bands - we can also stretch each band individually:
```{r stars_3, message=FALSE, warning=FALSE, error=FALSE}
plot(ls7, axes = TRUE, join_zlim = FALSE)
```
`stars` uses [tidyverse methods for spatio-temporal arrays](https://r-spatial.github.io/stars/articles/stars3.html) - an example of this is pulling out one band of an image using `slice`:
```{r stars_4, message=FALSE, warning=FALSE, error=FALSE}
ls7 |> dplyr::slice(band, 6) -> band6
band6
```
This gives us a lower-dimensional array of just band 6 from the Landsat 7 image.
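Beyond `slice`, `st_apply` lets us compute across dimensions - for instance, a hedged NDVI sketch, assuming band 4 is NIR and band 3 is red for this Landsat 7 image:
```r
library(stars)
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
ls7 <- read_stars(tif)

# NDVI = (NIR - red) / (NIR + red)
ndvi_fun <- function(x) (x[4] - x[3]) / (x[4] + x[3])
ndvi <- st_apply(ls7, c("x", "y"), ndvi_fun)  # apply over each x,y pixel
plot(ndvi)
```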
## Coordinate Reference Systems
A `CRS` is made up of several components:
- **Coordinate system**: The x,y grid that defines where your data lies in space
- **Horizontal and vertical units**: The units describing the grid along the x, y, and possibly z axes
- **Datum**: The modeled version of the shape of the earth
- **Projection details**: If projected, the mathematical equation used to flatten objects from round surface (earth) to flat surface (paper or screen)
```{r, fig.cap="Source: https://mgimond.github.io/Spatial/chp09_0.html", echo=FALSE, out.width="75%"}
knitr::include_graphics("img/Ellipsoid.png")
```
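You can inspect these components from R - `sf::st_crs` exposes pieces of a CRS definition (a quick sketch):
```r
library(sf)
crs <- st_crs(4326)  # WGS84 geographic coordinates
crs$epsg             # the authority code: 4326
crs$proj4string      # legacy Proj4 representation
st_is_longlat(crs)   # TRUE - geographic, not projected
```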
### The ellipsoid and geoid
The earth is roughly a sphere - more precisely, an *ellipsoid* - which is defined by two radii:
- semi-major axis (equatorial radius)
- semi-minor axis (polar radius)
The terms *spheroid* and *ellipsoid* are used interchangeably. One particular spheroid is distinguished from another by the lengths of the semi-major and semi-minor axes. Both the GRS80 and WGS84 spheroids use semi-major and semi-minor axes of 6,378,137 meters and 6,356,752 meters respectively (the semi-minor axes differ by thousandths of a meter between the two). You'll also encounter older spheroids / ellipsoids out there, such as Clarke 1866.
More precisely than an *ellipsoid*, though, we know that the earth is a *geoid* - it is not perfectly smooth - and modelling the undulations due to changes in gravitational pull at different locations is crucial to accuracy in a GIS. This is where a *datum* comes in - a datum is built on top of the selected spheroid and can incorporate local variations in elevation. We have many datums to choose from based on the location of interest - in the US we would typically choose *NAD83*.
### Why you need to know about CRS when working with spatial data in R
```{r crs_fail, message=FALSE, warning=FALSE, error=FALSE}
data(pnw)
gages = read_csv(system.file("extdata/Gages_flowdata.csv", package = "Rspatialworkshop"),show_col_types = FALSE)
gages_sf <- gages |>
st_as_sf(coords = c("LON_SITE", "LAT_SITE"), crs = 4269, remove = FALSE) |>
dplyr::select(STATION_NM,LON_SITE, LAT_SITE)
# Awesome, let's plot our gage data and state boundaries!
plot(pnw$geometry, axes=TRUE)
plot(gages_sf$geometry, col='red', add=TRUE)
# um, what?
```
There is no 'on-the-fly' projection in R - you need to make sure you specify the CRS of your objects, and the CRS of objects needs to match for any spatial operation, or you'll get an error (or, as above, a nonsensical plot).
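The fix for the plot above is to transform one layer into the other's CRS before plotting - a sketch assuming the `pnw` and `gages_sf` objects created in the chunk above:
```r
library(sf)
# project the gages into the same CRS as the state boundaries
gages_proj <- st_transform(gages_sf, st_crs(pnw))
plot(pnw$geometry, axes = TRUE)
plot(gages_proj$geometry, col = "red", add = TRUE)
# now the gages land where they should
```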
- [spatialreference.org](https://spatialreference.org/) is your friend in R - chances are you will use it frequently working with spatial data in R.
- [projection Wizard](https://projectionwizard.org/) is also really useful, as is [epsg.io](https://epsg.io/)
### Defining your coordinate reference system
A coordinate reference system for spatial data can be defined in different ways - for instance for the standard WGS84 coordinate system you could use:
- Proj4
+ ```+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs```
- OGC WKT
+ ```GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.01745329251994328,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]]```
- ESRI WKT
+ ```GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137,298.257223563]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]]```
- 'authority:code' identifier
+ `EPSG:4326`
Proj4 used to be the standard CRS identifier in R, but the preferred way now is the 'authority:code' method, which `sf` as well as software like [QGIS](https://docs.qgis.org/3.16/en/docs/user_manual/working_with_projections/working_with_projections.html) will recognize. The 'authority:code' method is durable and easily discoverable online.
The `WKT` representation of EPSG:4326 in `sf` is:
```{r}
sf::st_crs('EPSG:4326')
```
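These different identifiers all describe the same CRS, and `sf` treats them as equal - for example:
```r
library(sf)
# numeric codes and 'authority:code' strings resolve to the same CRS
st_crs(4326) == st_crs("EPSG:4326")  # TRUE
```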
### Projected coordinate systems
Typically we want to work with data that is projected. Projected coordinate systems (which are based on Cartesian coordinates) have: an origin, an x axis, a y axis, and a linear unit of measure. Going from geographic coordinates to a projected coordinate reference systems requires mathematical transformations.
Four spatial properties of projected coordinate systems that are subject to distortion are: *shape*, *area*, *distance* and *direction*. A map that preserves shape is called *conformal*; one that preserves area is called *equal-area*; one that preserves distance is called *equidistant*; and one that preserves direction is called *azimuthal* (from [https://mgimond.github.io/Spatial/chp09_0.html](https://mgimond.github.io/Spatial/chp09_0.html)).
The takeaway from all this is that you need to be aware of the `crs` of your objects in R, make sure they are projected if appropriate and in a projection that optimizes the properties you are interested in, and make sure objects you are analyzing or mapping together are in the same `crs`.
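Putting this into practice: for area calculations over the conterminous US, an equal-area projection such as CONUS Albers (EPSG:5070) is a common choice - a sketch assuming the `pnw` states object from earlier:
```r
library(sf)
# CONUS Albers (EPSG:5070) is equal-area, a reasonable choice when
# comparing state or watershed areas in the lower 48
pnw_albers <- st_transform(pnw, 5070)
st_crs(pnw_albers)$units_gdal  # linear units: metre
st_area(pnw_albers)            # areas computed in m^2
```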