---
title: "Methods and Results Summary"
author: "ROSSyndicate"
date: "`r Sys.Date()`"
output: html_document
bibliography: references.bib
---
```{r, include = F}
knitr::opts_chunk$set(message = F, echo = F, error = F, warning = F)

packages <- c("tidyverse", "gt", "sf", "tmap")

# install (if needed) and load each required package
package_loader <- function(x) {
  if (!x %in% rownames(installed.packages())) {
    install.packages(x)
  }
  library(x, character.only = TRUE)
}
invisible(lapply(packages, package_loader))

model_version <- "2024-01-08"
```
# Introduction
Remote sensing image classification is common in terrestrial applications (in
particular, land use and land cover), but it has rarely been applied in aquatic
environments beyond mapping the general presence and absence of water and
wetlands. The primary exception is the assessment of submerged aquatic
vegetation ("SAV") (e.g., [@visser2018; @e.l.hestir2012]); however, these
classification methods require imagery with high spatial and spectral
resolution, often from land-based photography or uncrewed aerial vehicles
("UAVs").
In the Great Lakes (GL) region, most image classification has been completed
using moderate-resolution satellite imagery (e.g., Landsat, Sentinel, MODIS),
focusing on mapping the distribution and types of wetlands throughout the
region [@mohseni2023; @v.l.valenti2020], as well as SAV distribution throughout
the system [@wolter2005]. Most of these analyses focus on a relatively short
temporal period (months to years), while some span the entire Landsat archive
from the mid-1980s through the recent past (e.g., [@amani2022]).
Recently, much attention has been paid to the apparent proliferation of algal
blooms in some of the clearest lakes, including Lake Superior (cite). Detecting
algal blooms from moderate-resolution satellite imagery is difficult due to low
temporal frequency, time of day of acquisition, pixel size, and spectral band
metrics (cite), as well as the lack of observed, spatially explicit bloom
observations to validate presence and absence. Detecting sediment plumes, which
often precede algal blooms, is comparatively easy using just the red, green,
and blue bands common to nearly all moderate-resolution satellites.
In this analysis, we use the Landsat Collection 2 Surface Reflectance product
archive (missions 5 through 9, [@vermote2016; @masek2006]), the Sentinel-2
Surface Reflectance product archive [@drusch2012], a novel crowdsourced label
data set (eePlumB), and Google Earth Engine to build image classification
models that produce a time series of rasters enumerating sediment plumes across
the western arm of Lake Superior.
```{r fig-aoi, fig.dpi = 300, fig.align="center", fig.width=2, fig.height=1}
aoi <- read_sf("data/aoi/Superior_AOI_modeling.shp") %>%
  st_make_valid()

tmap_mode("view")
tm_basemap("CartoDB.Positron") +
  tm_shape(aoi) + tm_polygons(col = "#b97eea", alpha = 0.5) +
  tm_scale_bar()
```
:"Area of interest for this analysis in purple, consisting of a portion the
western extent of Lake Superior, the Apostle Islands, and Chequamegon Bay."
# Methods
## eePlumB
Using the overarching architecture presented in the Global Rivers Obstruction
Database (GROD) [@yang2022] to engage volunteer observers, we crowdsourced
class labels for Landsat and Sentinel-2 images through our Earth Engine Plume
and Bloom labeling interface ("eePlumB"). Volunteers assigned pixels to the
following classes: 'cloud', 'open water', 'light near shore sediment', 'dark
near shore sediment', 'offshore sediment', 'shoreline contamination', 'other',
and 'algae bloom'. Dates for labeling were limited to the months of April
through November to avoid ice-on conditions.
To eliminate outlier band information and reduce noise in the model input, the
second and ninety-eighth percentiles were calculated for each mission-band
combination, and label data with values outside those cutoffs were dropped from
the analysis. [[Could add the `02_label_class_summaries.Rmd` as supplemental.]]
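A minimal sketch of this percentile screen is shown below; it assumes the
collated labels are in long format with one row per label-band observation and
illustrative column names `mission`, `band`, and `value` (this is not the
project's exact code).

```{r, eval = F}
# Illustrative only: drop label-band values outside the 2nd-98th percentile
# window, computed per mission-band combination.
filter_band_outliers <- function(band_long) {
  band_long %>%
    group_by(mission, band) %>%
    mutate(p02 = quantile(value, 0.02, na.rm = TRUE),
           p98 = quantile(value, 0.98, na.rm = TRUE)) %>%
    ungroup() %>%
    filter(value >= p02, value <= p98) %>%
    select(-p02, -p98)
}
```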
## Model development
We used the built-in gradient tree boost ("GTB") `ee.Classifier` method within
Google Earth Engine to create classification models from the crowdsourced label
data. Label data were randomly split into training (70%) and test (30%) sets,
with no special handling procedures for classes or satellite missions. Data
were examined to confirm that all classes and missions were present in both the
training and testing sets.
GTB models for each mission were trained independently on the rescaled band
data. For Landsat missions, the red, green, blue, near-infrared, and both
shortwave infrared bands were used to classify five categories: cloud, open
water, light near shore sediment, dark near shore sediment, and offshore
sediment. For Sentinel-2, the bands used to develop the classifier were red,
green, blue, red edge 1-3, near-infrared, and both shortwave infrared bands. We
did not tune the hyperparameters of the GTB model, as performance with 10 trees
was already acceptable for discerning open water from sediment plumes.
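As a hedged sketch (not the project's exact script), the split and training
could look like the following using the rgee bindings, assuming a labeled
`ee$FeatureCollection` named `labels_fc` with a numeric `class` property and
the band names shown.

```{r, eval = F}
# Sketch only: 70/30 split and GTB training in Earth Engine via rgee.
library(rgee)
ee_Initialize()

bands <- c("red", "green", "blue", "nir", "swir1", "swir2")  # Landsat example

with_random <- labels_fc$randomColumn("random")
training <- with_random$filter(ee$Filter$lt("random", 0.7))
testing  <- with_random$filter(ee$Filter$gte("random", 0.7))

gtb <- ee$Classifier$smileGradientTreeBoost(numberOfTrees = 10)$
  train(features = training, classProperty = "class", inputProperties = bands)
```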
## Image classification
### Image Pre-processing
Mosaicked images were created for each mission-date, taking the mean band value
wherever two path-rows or tiles overlapped. All Landsat mission-date images
were pre-processed to remove saturated pixels, and only images with an image
quality greater than or equal to 7 were included. Sentinel-2 bands with a pixel
resolution coarser than 10m x 10m were reprojected (downsampled) to 10m x 10m
so that the GTB model could be applied to the composite images more
efficiently. No further pre-processing was applied to the Sentinel-2 data.
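For illustration only, the Landsat image-quality screen might look like the
sketch below (again via rgee); the `IMAGE_QUALITY` property applies to the
TM/ETM+ collections (the OLI collections expose `IMAGE_QUALITY_OLI` instead),
and this is an assumption about how the filter was expressed, not the project's
code.

```{r, eval = F}
# Sketch only: keep Landsat 5 Collection 2 Level-2 scenes with quality >= 7.
ls5 <- ee$ImageCollection("LANDSAT/LT05/C02/T1_L2")$
  filter(ee$Filter$gte("IMAGE_QUALITY", 7))
```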
Three areas of interest (AOIs) were used in this analysis: the complete AOI, the
AOI without shoreline contamination, and the AOI with shoreline contamination.
The area of shoreline contamination was defined as any area within 60 meters of
a volunteer-identified pixel with shoreline contamination. We assumed that
shoreline contamination was consistent throughout the analysis and was not
specific to any particular satellite or time period.
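A minimal sketch of how such a mask could be derived with sf follows, assuming
the volunteer-flagged pixels are available as points (`contaminated_pts`, a
hypothetical object) in a metric projected CRS.

```{r, eval = F}
# Sketch only: 60 m buffer around volunteer-flagged shoreline contamination.
contamination_zone <- st_union(st_buffer(contaminated_pts, dist = 60))
aoi_with_contamination <- st_intersection(aoi, contamination_zone)
aoi_without_contamination <- st_difference(aoi, contamination_zone)
```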
### Model application and summaries
Each GTB model was applied to the corresponding satellite image stack,
producing two outputs: a tabular summary of the classified area and the total
area of each class for all three AOIs, and a .tif raster for each classified
mission-date image at the resolution at which the GTB was applied (10m for
Sentinel-2, 30m for Landsat). Raster pixels were labeled with the following
values: 0 = out of area/masked for saturated pixels; 1 = cloud; 2 = open water;
3 = light, near shore sediment; 4 = offshore sediment; 5 = dark, near shore
sediment.
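For downstream use in R, these integer codes can be attached to a classified
raster as category labels; a sketch with terra is below (terra is not loaded in
the setup chunk, and the file path is hypothetical).

```{r, eval = F}
library(terra)

# Sketch only: attach the pixel-value scheme above as raster categories.
classified <- rast("data/output/example_mission_date_classified.tif")
levels(classified) <- data.frame(
  value = 0:5,
  class = c("out of area/masked", "cloud", "open water",
            "light near shore sediment", "offshore sediment",
            "dark near shore sediment")
)
plot(classified)
```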
## Model evaluation metrics
Models were evaluated through error matrices, kappa statistics, and F1
statistics for each class (a worked sketch of these metrics follows the list).

- Error matrix (testing): given the test data, does the model assign the
  correct class? These are tibble-style summaries in which the model-assigned
  class and the label class are compared.
- Kappa statistic: an indicator of how much better or worse a model performs
  than random chance. Scores range from -1 to 1, where 0 is equivalent to
  random chance, positive values are better than random chance, and negative
  values are worse.
- F1 score: the harmonic mean of precision and recall per class (beta = 1,
  hence F1, with precision and recall weighted evenly). A score of 0 means the
  model never predicts the correct class; a score of 1 means the model
  predicts the correct class perfectly.
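As a worked sketch, both statistics can be computed from a square confusion
matrix whose rows are reference classes and columns are predicted classes; the
function name and the matrix `cm` are illustrative.

```{r, eval = F}
# Sketch only: per-class precision, recall, F1, and overall kappa from a
# square confusion matrix (rows = reference, columns = predicted).
eval_metrics <- function(cm) {
  n <- sum(cm)
  tp <- diag(cm)
  precision <- tp / colSums(cm)  # TP / (TP + FP)
  recall    <- tp / rowSums(cm)  # TP / (TP + FN)
  f1 <- 2 * precision * recall / (precision + recall)
  observed <- sum(tp) / n
  expected <- sum(rowSums(cm) * colSums(cm)) / n^2
  kappa <- (observed - expected) / (1 - expected)
  list(f1 = f1, kappa = kappa)
}
```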
Models were evaluated both on the full five-class scheme and on a simplified
three-class scheme in which all sediment categories were combined into a single
sediment class.
# Results
## Label dataset
```{r, include = F, echo = F}
ml_labels <- c('cloud', 'openWater', 'darkNearShoreSediment',
               'lightNearShoreSediment', 'offShoreSediment')

# load labels
labels <- read_csv('data/labels/collated_label_data_v2023-07-20.csv')
only_ml_labels <- labels %>%
  filter(class %in% ml_labels)
filtered_labels <- read_csv('data/labels/collated_label_data_filtered_v2023-07-20.csv')

label_table <- labels %>%
  group_by(mission, class) %>%
  tally() %>%
  pivot_wider(names_from = mission, values_from = n) %>%
  rename('Landsat 5 (all)' = 'LS5',
         'Landsat 7 (all)' = 'LS7',
         'Landsat 8 (all)' = 'LS8',
         'Landsat 9 (all)' = 'LS9',
         'Sentinel-2 (all)' = 'SEN2')

filtered_label_table <- filtered_labels %>%
  group_by(mission, class) %>%
  tally() %>%
  pivot_wider(names_from = mission, values_from = n) %>%
  rename('Landsat 5 (filtered)' = 'LS5',
         'Landsat 7 (filtered)' = 'LS7',
         'Landsat 8 (filtered)' = 'LS8',
         'Landsat 9 (filtered)' = 'LS9',
         'Sentinel-2 (filtered)' = 'SEN2')

label_table_join <- full_join(label_table, filtered_label_table)
```
The collated crowdsourced label dataset consisted of `r nrow(labels)` labels
across all classes. Of these, `r nrow(only_ml_labels)` labels belonged to the
classes of interest (cloud, open water, and the sediment classes). After
filtering for outliers within each subset of mission-specific labels,
`r nrow(filtered_labels)` labels with complete band information remained.
Table 1 presents a breakdown of the labels.
```{r, echo = F}
gt(label_table_join) %>%
  cols_align(., 'center', columns = c('Landsat 5 (all)':'Sentinel-2 (filtered)')) %>%
  tab_spanner('all labels',
              c('Landsat 5 (all)':'Sentinel-2 (all)')) %>%
  tab_spanner('filtered for model development and testing',
              c('Landsat 5 (filtered)':'Sentinel-2 (filtered)')) %>%
  tab_style(cell_text(size = 'small'), cells_body()) %>%
  tab_style(cell_text(size = 'small'), cells_column_labels()) %>%
  tab_footnote('Table 1. Summary of eePlumB classes per mission (version 2023-07-20). All labels (left group) are the raw number of collated labels; the right group are labels filtered as described above.')
```
```{r, include = F, echo = F}
mission_date_summary <- labels %>%
  group_by(mission, date) %>%
  tally()
mission_date_summary_filtered <- filtered_labels %>%
  group_by(mission, date) %>%
  tally()

md_summ_table <- mission_date_summary %>%
  group_by(mission) %>%
  tally() %>%
  rename('unique mission-dates (all)' = 'n')
md_summ_table_filt <- mission_date_summary_filtered %>%
  group_by(mission) %>%
  tally() %>%
  rename('unique mission-dates (filtered)' = 'n')

summary_table_join <- full_join(md_summ_table, md_summ_table_filt)
```
Labels were present from `r nrow(mission_date_summary)` individual mission-date
combinations spanning the dates of `r min(mission_date_summary$date)` to
`r max(mission_date_summary$date)`. Labels in the filtered dataset were present
from `r nrow(mission_date_summary_filtered)` mission-date combinations spanning
the dates `r min(mission_date_summary_filtered$date)` to
`r max(mission_date_summary_filtered$date)`. See Table 2 for a complete
breakdown of labels by mission-date combination.
```{r, echo = F}
gt(summary_table_join) %>%
  cols_align(., 'center', columns = c('unique mission-dates (all)', 'unique mission-dates (filtered)')) %>%
  tab_style(cell_text(size = 'small'), cells_body()) %>%
  tab_style(cell_text(size = 'small'), cells_column_labels()) %>%
  tab_footnote('Table 2. Summary of eePlumB mission-dates present in the label dataset. When filtered for outliers, only two mission-dates, both Landsat 5 images, were lost from the dataset.')
```
## Model evaluation
Model performance was acceptable across the open water, cloud, and discrete
sediment categories. All statistics presented in this section are summary
statistics for classes from the testing set. The kappa statistic was greater
than 0.8 for every mission, indicating much better performance than random
assignment (Table 3). The F1 score, balanced equally between precision and
recall, was reasonable across all categories and missions; the minimum F1 score
was 0.62 for "dark near-shore sediment" for Landsat 7 (Table 3). Cloud and open
water classification F1 scores were always greater than 0.86 (Table 3).
```{r, echo = F}
# get the list of performance metric files
perf_metrics_list <- list.files("data/output/",
                                pattern = "_performance_stats.csv",
                                full.names = T)
# filter for only those for the desired model version
perf_metrics_list <- perf_metrics_list[grepl(model_version, perf_metrics_list)]

perf_metrics <- map_dfr(perf_metrics_list, read_csv) %>%
  # drop accuracy, since it's not meaningful for uneven groups
  select(-GTB_accuracy) %>%
  mutate(across(c(cloud:GTB_kappa),
                ~ round(., 2))) %>%
  rename("open water" = "openWater",
         "light near-shore sediment" = "lightNearShoreSediment",
         "off-shore sediment" = "offShoreSediment",
         "dark near-shore sediment" = "darkNearShoreSediment",
         "model kappa statistic" = "GTB_kappa")

gt(perf_metrics) %>%
  cols_align(., 'center', columns = c("cloud":"model kappa statistic")) %>%
  tab_spanner("classification F1 score",
              c("cloud":"dark near-shore sediment")) %>%
  tab_style(cell_text(size = 'small'), cells_body()) %>%
  tab_style(cell_text(size = 'small'), cells_column_labels()) %>%
  tab_footnote('Table 3. Summary of F1 scores per class and the across-class kappa statistic for the GTB model.')
```
When the sediment classes were aggregated into a single sediment class, F1
scores and the kappa statistic increased dramatically (Table 4).
```{r echo = F}
# load confusion matrix files for this model version
confusion_files <- list.files("data/output",
                              pattern = c(model_version),
                              full.names = T)
confusion_files <- confusion_files[!grepl("performance", confusion_files)]
confusion_files <- confusion_files[!grepl("training", confusion_files)]
# and read them in
confusion <- map_dfr(confusion_files, read_csv)

# and re-org them with simplified sediment classes
confusion_simple <- confusion %>%
  mutate(new_class = if_else(grepl("sed", class, ignore.case = T),
                             "sediment",
                             class)) %>%
  group_by(new_class, mission) %>%
  summarize(across(c(cloud:darkNearShoreSediment),
                   ~ sum(.))) %>%
  pivot_longer(cols = c(cloud:darkNearShoreSediment),
               names_to = 'class_pred',
               values_to = 'n_pred') %>%
  mutate(new_class_pred = if_else(grepl("sed", class_pred, ignore.case = T),
                                  "sediment",
                                  class_pred)) %>%
  group_by(new_class, new_class_pred, mission) %>%
  summarize(n = sum(n_pred)) %>%
  pivot_wider(names_from = "new_class_pred",
              values_from = "n")

# now calculate the F1 score and kappa statistic.
# precision = true positives / (true positives + false positives)
precision_simple <- confusion_simple %>%
  mutate(precision_cloud = cloud / (cloud + openWater + sediment),
         precision_water = openWater / (cloud + openWater + sediment),
         precision_sediment = sediment / (cloud + openWater + sediment)) %>%
  mutate(precision = case_when(new_class == "cloud" ~ precision_cloud,
                               new_class == "openWater" ~ precision_water,
                               new_class == "sediment" ~ precision_sediment)) %>%
  select(new_class, mission, precision)

# recall = true positives / (true positives + false negatives)
recall_simple <- confusion_simple %>%
  pivot_longer(cols = c("cloud":"sediment"),
               names_to = "new_class_pred",
               values_to = "count") %>%
  pivot_wider(names_from = "new_class",
              values_from = "count") %>%
  mutate(recall_cloud = cloud / (cloud + openWater + sediment),
         recall_water = openWater / (cloud + openWater + sediment),
         recall_sediment = sediment / (cloud + openWater + sediment)) %>%
  mutate(recall = case_when(new_class_pred == "cloud" ~ recall_cloud,
                            new_class_pred == "openWater" ~ recall_water,
                            new_class_pred == "sediment" ~ recall_sediment)) %>%
  select(new_class = new_class_pred,
         mission, recall)

# F1 score = 2 * (precision * recall) / (precision + recall)
summary_simple <- full_join(precision_simple, recall_simple) %>%
  mutate(f1score = 2 * precision * recall / (precision + recall)) %>%
  select(mission, new_class, f1score)

# kappa statistic = (observed accuracy - expected accuracy) / (1 - expected accuracy)
# observed accuracy = number of correct labels / total number of labels
observed_accuracy <- confusion_simple %>%
  group_by(mission) %>%
  mutate(total_labels = sum(across(cloud:sediment))) %>%
  pivot_longer(cols = c(cloud:sediment),
               names_to = "new_class_pred",
               values_to = "count") %>%
  filter(new_class == new_class_pred) %>%
  group_by(mission, total_labels) %>%
  summarize(count_correct = sum(count)) %>%
  mutate(accuracy = count_correct / total_labels)

# expected accuracy = sum of marginal frequencies across all classes / total labels
# marginal frequency = actual number of labels correct * predicted number of labels
# correct / total labels
expected_accuracy <- confusion_simple %>%
  mutate(actual_correct = cloud + openWater + sediment) %>%
  pivot_longer(cols = c(cloud:sediment),
               names_to = "new_class_pred",
               values_to = "count") %>%
  mutate(pred_correct = if_else(new_class == new_class_pred,
                                count,
                                NA_real_)) %>%
  filter(!is.na(pred_correct)) %>%
  left_join(observed_accuracy) %>%
  mutate(marginal_frequency = actual_correct * pred_correct / total_labels) %>%
  ungroup() %>%
  select(mission, marginal_frequency) %>%
  group_by(mission) %>%
  slice(1) %>%
  left_join(observed_accuracy) %>%
  mutate(expect_acc = sum(marginal_frequency) / total_labels)

kappa <- expected_accuracy %>%
  mutate(kappa = (accuracy - expect_acc) / (1 - expect_acc)) %>%
  select(mission, kappa)

summary_simple <- summary_simple %>%
  pivot_wider(names_from = "new_class",
              values_from = "f1score") %>%
  full_join(kappa) %>%
  mutate(across(cloud:kappa,
                ~ round(., 2))) %>%
  rename("model kappa statistic" = "kappa",
         "open water" = "openWater")
```
```{r echo = F}
gt(summary_simple) %>%
  cols_align(., 'center', columns = c("cloud":"model kappa statistic")) %>%
  tab_spanner("classification F1 score",
              c("cloud":"sediment")) %>%
  tab_style(cell_text(size = 'small'), cells_body()) %>%
  tab_style(cell_text(size = 'small'), cells_column_labels()) %>%
  tab_footnote('Table 4. Summary of F1 scores per class and the across-class kappa statistic for the GTB model with the simplified sediment category.')
```
# Discussion
The GTB models were applied to all images in the Landsat and Sentinel-2 stacks,
regardless of time of year and the presence or absence of ice. Classified
images should only be used during ice-free periods, as no attempt was made to
mask or classify ice. It is important to note that the GTB models were
evaluated only on the available by-pixel labels, and accuracy at classification
edges may be limited.
In many cases, cirrus clouds are incorrectly classified as off-shore sediment.
Caution should be used when clouds cover a large proportion of the AOI.
# Image Viewer Links
The following links are Google Earth Engine scripts that allow for manual
examination of the true-color image, the eePlumB classification (version
2024-01-08), and a measure of atmospheric opacity (Landsat 5/7) or cirrus cloud
confidence level (Landsat 8 & 9). For Sentinel-2, the cirrus cloud indication
is a binary value: 0 (no cirrus detected) or 1 (cirrus detected).
[Landsat
5](https://code.earthengine.google.com/3dc621a541fefa6db53e874646f93b13)
[Landsat
7](https://code.earthengine.google.com/83427128adac071d119edbd3a86f1127)
[Landsat
8](https://code.earthengine.google.com/f4aad47222c53d6cf7510ef3e3344119)
[Landsat
9](https://code.earthengine.google.com/f2983f2a2196a2c033afacd22471e398)
[Sentinel
2A](https://code.earthengine.google.com/c8ae30202ad9e2549f009babe736497c)
[Sentinel
2B](https://code.earthengine.google.com/31d9b913a8421091447f213ef6d1db6d)
Note that the Sentinel viewers may hang briefly before displaying the date
list. The Landsat 5 and 7 opacity measure does not appear to be robust for
detecting cirrus clouds; more investigation is needed to determine cirrus cloud
contamination in those instances.
# References
```{r, echo = F}
knitr::wrap_rmd('Methods_Results_Summary.Rmd', width = 80, backup = NULL)
```