-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy path03_remotesensing.Rmd
279 lines (220 loc) · 11.1 KB
/
03_remotesensing.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
---
title: "Tree canopy processing"
date: "`r format(Sys.time(), '%d %B %Y')`"
output:
github_document:
toc: true
always_allow_html: true
urlcolor: blue
---
For the Twin Cities, Growing Shade nests block groups (the core level of analyses) into larger neighborhood and city-level geographies. This step is not easily applied to other regions, so will likely need to be specifically tailored if applying the methods elsewhere.
**NOTE:** this script **DOES** rely on some parameters found inside the "global" `01_tutorial.Rmd` script, so please be sure to run that before running this script! It is okay if the tutorial script encounters an error and can't run all the way through, you'll still be saving information about which state/counties to use here!
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE,
cache = FALSE)
library(dplyr); library(tidyr); library(readr); library(stringr); library(tibble); library(ggplot2)
library(tigris)
library(sf)
library(tidycensus)
library(ggbeeswarm)
library(RSocrata)
library(here)
st_erase = function(x, y) st_difference(x, st_union(st_combine(y)))
`%not_in%` <- Negate(`%in%`)
```
# Remote sensing data
These items come out of Google Earth Engine analyses. Some files need to be created/exported (and then imported into GEE), and all GEE exports need to be pulled in to this code.
## Tree canopy data
There are many ways of measuring the tree canopy, and each method has pros/cons.
### Calibrate tree canopy coverage
Growing Shade prioritizes temporal accuracy for tree canopy data. While some trade-offs come with prioritizing temporal accuracy over spatial accuracy, it is essential for this project to capture on-the-ground, real-time dynamics of how Emerald Ash Borer, development patterns, and recent tree planting programs among others are changing the tree canopy.
Sentinel-2 is currently the most spatially accurate and publicly accessible remote sensing platform. However, the 10 meter squared spatial resolution of Sentinel is larger than a lot of individual tree canopies. In exploring the data, it appears as if the canopy coverage from Sentinel is little higher than what it should be (based on aerial imagery). I've chosen to calibrate the Sentinel data with the (outdated) UMN 1 meter squared land use file.
I created a grid (n = 1015) across the region, and created a model to compare the amount of trees detected with the Sentinel 2 and with the UMN 1 meter data set (for the later, the tree area is the summation of all areas which identified as coniferous trees, deciduous trees, and forested wetland).
The final calibration coefficient (at least for 2021 tree canopy) is 0.885246, meaning that Sentinel sees about 11% more trees in areas. Another way to think about this is that Sentinel detects area with at least 89% tree canopy coverage (i.e., if Sentinel sees 1,000 acres of trees, UMN sees more like 885 acres).
In other temperate, upper Midwest areas, the 0.88 coefficient is probably sufficient. Otherwise, figuring out and adjusting a calibration coefficient for another area likely requires some pretty bespoke analyses and has specific data set needs which may not be widely available.
```{r, calibrate-tree-cover}
# ######
# # Create a gridded area across the region to calibrate sentinel tree cover with 1m2 land cover tree data
# # In most instances, there is no need to run this gridding step more than once
# #####
# wholearea <- metc_region %>%
# summarise(st_union(.))
#
# # make a equal area grid; there are 704 tracts, so I want to make at least 1000 grids I think?
# g = st_make_grid(wholearea,
# n = c(36, 36)) %>%
# st_intersection(wholearea)
#
# geometry = st_sfc(lapply(1:length(g), function(x) st_geometrycollection()))
# df <- st_sf(id = 1:length(g), geometry = g)
#
# # ggplot() +
# # geom_sf(data = wholearea) +
# # geom_sf(data = df,
# # fill = "transparent")
#
# sf::st_write(df, "~/Documents/GitHub/planting.shade/storymap-info/shapefiles/metc_grid.shp", append = FALSE)
calibrate_trees <- read_csv(paste0(here::here(), "/data-raw/UMNTreeAcres_metcgrid_scale1_year2021.csv")) %>%
rename(umn = `1`) %>%
full_join(read_csv(paste0(here::here(),"/data-raw/TreeAcres_metcgrid_year2021.csv")) %>%
rename(sentinel = `1`),
by = 'id')
calibrate_lm <- (lm(umn ~ sentinel, data = calibrate_trees))
calibrate_lm2 <- (lm(umn ~ 0 + sentinel, data = calibrate_trees))
calibrate_lm3 <- (lm(umn ~ I(sentinel ^ 2), data = calibrate_trees))
calibrate_lm4 <- (lm(log(umn) ~ sentinel, data = calibrate_trees))
anova(calibrate_lm, calibrate_lm2, calibrate_lm3, calibrate_lm4) # the middle model is best!
# AIC(calibrate_lm); AIC(calibrate_lm2); AIC(calibrate_lm3); AIC(calibrate_lm4)
summary(calibrate_lm2)
summary(calibrate_lm2)$r.squared # r2
summary(calibrate_lm2)$coefficients[,4] # p-value
calib_coeff <- summary(calibrate_lm2)$coefficients[,1] # coefficient
# save(file = paste0(here::here(), "/data-raw/calib_coeff.rda"), calib_coeff)
calibrate_trees %>%
ggplot(aes(x = (umn), y = (sentinel * calib_coeff))) +
geom_point(alpha = .5) +
geom_abline(slope=1, intercept=0, col = 'blue', lwd = 1) +
theme_minimal() +
labs( x = "UMN tree acres", y = "Calibrated Sentinel tree acres")
```
### Process tree canopy cover at various geographies
- process GEE code to link canopy with geography
- GEE data is in repo "users/ehe/MetCoucil/GrowingShade_CanopyCoverage"
- <https://code.earthengine.google.com/a0da66053ecb26b668df4297c4ebed59>
```{r gee-canopy-fxn}
# function for processing gee data
process_gee <- function(x, .group = NULL){
x %>%
filter(ALAND != 0) %>% # new 2020 block groups sometimes are only water
mutate(sq_miles = stringr::str_remove_all(sq_miles, c("\\{groups=\\[|\\}\\]\\}"))) %>%
separate(sq_miles, sep = "\\},", into = c("treeless", "trees")) %>%
rowwise() %>%
mutate(treeless = stringr::str_remove_all(treeless, "\\{classification=0, sum=|\\]\\}"),
treeless = ifelse(ALAND == 0, 0, as.numeric(treeless)),
trees = stringr::str_remove_all(trees, "\\{classification=1, sum=|\\}\\]\\}"),
trees = ifelse(ALAND == 0, 0, as.numeric(trees)),
canopy_percent = (trees / (trees + treeless)) * calib_coeff,
# using the calibration coefficient is important for twin cities
canopy_percent = ifelse(is.na(canopy_percent), 0, canopy_percent)) %>%
group_by(!!enquo(.group)) %>%
mutate(avgcanopy = mean(canopy_percent, na.rm = TRUE)) %>%
# mutate(sum_trees = sum(trees, na.rm = TRUE),
# sum_treeless = sum(treeless, na.rm = TRUE),
# avgcanopy = sum_trees / (sum_trees + sum_treeless) * calib_coeff) %>%
select(-ALAND, -trees, -treeless)
}
```
```{r canopy-processing}
###
# block groups
###
bg_canopy <- read_csv(paste0(here::here(), "/data-raw/TreeMilesIncAg_blockgroups2020_year2021.csv"),
col_select = c(GEOID, sq_miles), col_types = c("GEOID" = "c")) %>%
left_join(bg_geo %>% st_drop_geometry() %>% select(GEOID, ALAND)) %>%
rename(GEO_NAME = GEOID) %>%
process_gee() %>%
ungroup() %>%
rename(bg_id = GEO_NAME)
bg_canopy %>%
st_drop_geometry() %>%
ungroup() %>%
mutate(regional_avg = mean(canopy_percent)) %>%
magrittr::extract2("regional_avg") %>% unique()
```
```{r}
###
# cities
###
ctu_list_raw <- read_csv(paste0(here::here(), "/data-raw/TreeMilesIncAg_ctus_year2021.csv"),
col_select = c(CTU_NAME, sq_miles),
col_types = cols(sq_miles = "c", CTU_NAME = "c")) %>%
mutate(CTU_NAME = ifelse(CTU_NAME == "Credit River Twp.", "Credit River", CTU_NAME),
CTU_NAME = ifelse(CTU_NAME == "Empire Twp.", "Empire", CTU_NAME),
ALAND = 99) %>% # ALAND doesn't matter here, so just set it to a random number
rename(GEO_NAME = CTU_NAME) %>%
process_gee() %>%
ungroup() %>%
full_join(
left_join(ctu_crosswalk, bg_canopy) %>%
group_by(GEO_NAME) %>%
summarise(
min = round(min(canopy_percent) * 100, 1),
max = round(max(canopy_percent) * 100, 1),
avg = mean(canopy_percent),
n_blockgroups = n(),
) %>%
ungroup()) %>%
arrange(GEO_NAME) %>%
full_join(ctu_geo) %>%
st_as_sf()
ctu_list_raw %>%
st_drop_geometry() %>%
ungroup() %>%
mutate(regional_avg = mean(canopy_percent)) %>%
magrittr::extract2("regional_avg") %>% unique()
```
```{r}
###
# neighborhoods
###
nhood_list_raw <- read_csv(paste0(here::here(), "/data-raw/TreeMilesIncAg_neighborhoods_year2021.csv"),
col_select = c(GEO_NAME, city, sq_miles),
col_types = c("GEO_NAME" = "c")) %>%
mutate(
GEO_NAME = case_when(GEO_NAME == "CapitolRiver Council" ~ "Downtown",
GEO_NAME == "Thomas-Dale/Frogtown" ~ "Frogtown",
GEO_NAME == "West Side Community Organization" ~ "West Side",
GEO_NAME == "West 7th Federation/Fort Road" ~ "West 7th-Fort Road",
GEO_NAME == "Highland" ~ "Highland Park",
GEO_NAME == "Summit Hill Association" ~ "Summit Hill",
GEO_NAME == "Eastview-Conway-Battle Creek-Highwood Hills" ~ "Battle Creek-Conway-Eastview-Highwood Hills",
GEO_NAME == "The Greater East Side" ~ "Greater East Side",
GEO_NAME == "Como" ~ "Como Park",
TRUE ~ GEO_NAME),
ALAND = 99) %>% #again, just a random number here
process_gee(.group = city) %>%
full_join(
left_join(nhood_crosswalk, bg_canopy) %>%
group_by(GEO_NAME, city) %>%
summarise(
min = round(min(canopy_percent) * 100, 1),
max = round(max(canopy_percent) * 100, 1),
avg = mean(canopy_percent),
n_blockgroups = n()
) %>%
ungroup(),
by = c("GEO_NAME", "city")) %>%
full_join(nhood_geo) %>%
st_as_sf()
nhood_list_raw %>%
st_drop_geometry() %>%
ungroup() %>%
mutate(regional_avg = mean(canopy_percent)) %>%
magrittr::extract2("regional_avg") %>% unique()
```
## Greenness (NDVI) Data
We do this for all land (no water!) and non-cultivated land (excluding crops/ag land).
```{r ndvi_bgs}
ndvi_uncultivated <-
read_csv(paste0(here::here(), "/data-raw/uncultivatedNDVI_blockgroups2020_year2021.csv"),
na = "No data",
col_types = cols(GEOID = "c", `system:index` = "c",
Year = 'd', `.geo` = 'c')) %>%
rename(GEOID = GEOID)
ndvi_allland <-
read_csv(paste0(here::here(), "/data-raw/landNDVI_blockgroups2020_year2021.csv"),
na = "No data",
col_types = cols(GEOID = "c",
`system:index` = "c",
Year = 'd', `.geo` = 'c')) %>%
rename(GEOID = GEOID)
bg_ndvi <- ndvi_uncultivated %>%
dplyr::select(GEOID, ndvi_uncultivated) %>%
full_join(ndvi_allland %>%
dplyr::select(GEOID, ndvi_land)) %>%
rename(bg_id = GEOID)
```
## Save data
```{r}
save(bg_canopy, bg_ndvi, ctu_list_raw, nhood_list_raw, file = paste0(here::here(), "/data-raw/canopy_data.rda"))
```