
# Assessing uncertainties related to satellite remote sensing indices to estimate Gross Primary Production

## Introduction

Vegetation Gross Primary Production (GPP) is the total amount of carbon fixed
by plants through photosynthesis [@badgley_terrestrial_2019]. Quantifying GPP is
essential for understanding land-atmosphere carbon exchange
[@kohler_global_2018], ecosystem function, and ecosystem responses to climate
change [@guan_comparison_2022; @brown_fiducial_2021; @myneni_relationship_1994].
However, terrestrial GPP cannot be directly measured due to the contribution of
respiration to land carbon fluxes [@anav_spatiotemporal_2015]. Instead, GPP can
be inferred in a non-destructive manner from net carbon exchange measurements
at the ecosystem level, or at broader scales using models that incorporate
various assumptions and limitations [@reichstein_separation_2005;
@jung_towards_2009].

GPP estimations can be grouped into two broad categories: Eddy Covariance (EC)
techniques, and satellite data-driven methods [@guan_comparison_2022;
@xie_assessments_2020]. EC is the primary in-situ non-destructive method for
measuring terrestrial fluxes, and specifically for quantifying the exchange of
CO~2~ between land and the atmosphere, using advanced field instrumentation
[@baldocchi_how_2020; @badgley_terrestrial_2019; @ryu_what_2019;
@tramontana_predicting_2016]. However, EC measurements come with certain
limitations, such as their relatively small spatial footprint, typically less
than 1 km^2^, which constrains the accuracy of estimating ecosystem carbon
and water fluxes at regional and global scales [@badgley_canopy_2017].

Additionally, EC techniques directly measure Net Ecosystem Exchange (NEE), not
GPP. GPP must therefore be estimated by subtracting respiration, using models
and ancillary measurements, while accounting for the removal or deposition of
carbon stocks due to natural or anthropogenic transport processes such as water
flow, fires, or harvest [@beer_terrestrial_2010; @reichstein_separation_2005].

The satellite data-driven models constitute the second category of methods to
estimate GPP. Because of their dependency on earth observation platforms, they
are not spatially constrained but can have greater uncertainties than the EC
techniques [@ryu_what_2019; @wang_diagnosing_2011]. Satellite data-driven models
can be classified into process-based models, Light Use Efficiency models (LUE),
and Vegetation Index models [@xie_assessments_2020].

Process-based models integrate climate, canopy, and soil information derived
from multiple sources, including satellite EO, into biophysical models of
carbon, water, energy, and nutrient cycles with varying levels of detail
[@running_general_1988; @harris_global_2021]. While these models can be scaled
globally [@beer_terrestrial_2010; @jung_towards_2009] they require many
parameters that may not be readily available in changing landscapes or for 
fine-scale studies.

LUE models are based on the concept of radiation conversion efficiency and take
into consideration ecological processes [@liu1997process;
@heinsch_evaluation_2006]. This efficiency signifies the amount of carbon a
specific vegetation type can fix per unit of solar radiation
[@monteith_solar_1972]. Initially used for NPP estimation [@prince_model_1991],
LUE models were subsequently adapted for GPP and respiration calculations
[@goetz_mapping_1999]. These models explicitly account for the impact of
environmental stress on plant physiological responses. The imposition of
environmental stressors may lead to a reduction in the rates of daily carbon
assimilation, thereby diminishing overall efficiency [@prince_model_1991;
@running_general_1988].

The GPP product from the Moderate Resolution Imaging Spectroradiometer (MODIS)
employs an algorithm based on the radiation conversion efficiency concept. This
algorithm establishes a connection between absorbed photosynthetically active
radiation (APAR) and the LUE term [@heinsch_evaluation_2006] as shown in
@eq-gpp.

$$
\small
GPP = PAR \times fAPAR \times LUE
$$ {#eq-gpp}

where PAR is the incident photosynthetically active radiation
[@monteith2013principles] and fAPAR is the fraction of PAR that is
effectively absorbed by plants [@gcos200]. The LUE term depends on vegetation
type and on physiological conditions driven by water availability,
temperature stress, and vapour pressure deficit [@goetz_mapping_1999;
@running_general_1988]. Obtaining these variables for every vegetation type on
Earth can be challenging, and the assumptions this introduces amplify
uncertainties [@goetz_mapping_1999]. Nevertheless, under unstressed conditions,
LUE remains constant for a given vegetation type, so only PAR and fAPAR are
required to assess primary productivity [@running_continuous_2004].
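
As a minimal worked illustration of @eq-gpp, the sketch below computes GPP for a single day. The function name `gpp_lue()` and all input values are hypothetical; they were chosen only to show the arithmetic and the units, and are not taken from the MODIS algorithm or from the study sites.

```r
# Minimal sketch of the light use efficiency relation GPP = PAR x fAPAR x LUE.
# All values below are illustrative assumptions, not measured data.
gpp_lue <- function(par, fapar, lue) {
  # fAPAR is a fraction, so it must lie between 0 and 1
  stopifnot(all(fapar >= 0 & fapar <= 1))
  par * fapar * lue
}

par   <- 10   # incident PAR, MJ m-2 d-1 (assumed)
fapar <- 0.8  # fraction of PAR absorbed by the canopy (assumed)
lue   <- 1.2  # light use efficiency, gC MJ-1 (assumed)

gpp <- gpp_lue(par, fapar, lue)  # gC m-2 d-1
print(gpp)
```

With these assumed inputs, 8 MJ m^-2^ d^-1^ of PAR is absorbed, and each absorbed MJ fixes 1.2 gC.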

Vegetation index (VI) models are the third satellite data-driven approach. VIs
summarize non-linear functions of surface bi-directional reflectance spectra
[@myneni_relationship_1994] derived from optical sensors, and are combined with
climate variables to calculate GPP [@wu_predicting_2011]. This is usually done
with some form of regression and physical methods that describe the interactions
between vegetation and incoming radiation [@fernandez-martinez_monitoring_2019].

VIs have been used to provide fAPAR-related inputs to @eq-gpp from regional
to global extents, or to estimate GPP directly [@sellers_global_1994;
@running_continuous_2004]. Some of the most common VIs used to estimate GPP are
the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index
(EVI), the Near-Infrared Reflectance of Vegetation (NIRv), and the
Chlorophyll/Carotenoid Index (CCI), among others [@balzarolo_influence_2019;
@rahman_potential_2005; @xie_assessments_2020; @badgley_terrestrial_2019;
@zhang_potential_2020; @sellers_global_1994]. Most of these VIs are based on a
spectral reflectance ratio between the red and near-infrared regions of the
electromagnetic spectrum [@glenn_relationship_2008], which tracks the integrated
impact of the fraction of absorbed photosynthetically active radiation (fAPAR)
and LUE on productivity [@myneni_relationship_1994].

An index such as NDVI is good for detecting seasonal changes in vegetation
structure, but it saturates under high-biomass conditions
[@badgley_canopy_2017]. Other indices such as EVI can reduce soil and
atmospheric effects by adding the blue band [@huete1988soil] and may be better
suited for predicting GPP in high-biomass forests [@badgley_canopy_2017].
Nonetheless, EVI has been evaluated in a narrow range of ecosystems and needs
the start and end dates of the growing season as inputs, which can increase
uncertainties [@shi_assessing_2017]. For specific types of ecosystems such as
evergreen conifers, CCI can track the seasonality of daily GPP due to its
sensitivity to the chlorophyll/carotenoid pigment ratio [@gamon_remotely_2016].

Other indices such as the near-infrared reflectance of vegetation (NIRv) have
been formulated to address the mixed-pixel effect (pixels containing both
vegetated and non-vegetated features) and to determine vegetation photosynthetic
capacity [@badgley_terrestrial_2019]. NIRv is defined as the fraction of
reflected NIR light that originates from vegetation. NIRv was originally
proposed as a replacement for fAPAR in LUE models, on the basis that the NIR and
PAR reflectance of vegetation are correlated and that scaling by NDVI corrects
for soil contributions to the signal [@badgley_terrestrial_2019]. NIRv has been
shown to have a stronger correlation with GPP than fAPAR, both at flux towers
and on a regional basis, notwithstanding the fact that GPP is directly related
to fAPAR. NIRv has been extended by weighting it with PAR [@dechant_nirvp_2022]
and by replacing the NIR reflectance with NIR radiance
[@wu_radiance-based_2020]. Both of these approaches have been shown to correlate
even more strongly with GPP than NIRv, as could be expected since they either
directly or indirectly weight NIRv with PAR.

The stronger correlation of NIRv with GPP, compared to the correlation between
fAPAR-related VIs and GPP, seemingly contradicts the assumption of the LUE model
that GPP should be linearly related to fAPAR. There are two possible
explanations: i. Many of the reported comparative studies use fAPAR based on VIs
rather than APAR. We hypothesize that the NIR indices are simply better
estimators of APAR than these VI approximations, due to lower measurement error
for the NIR indices or a stronger physical relationship between them and APAR
than for the historical VIs. ii. The strength of the correlation between APAR
and GPP depends on LUE having a linear relationship to observed APAR. While this
may hold in some circumstances (e.g. early seasonal measurements for vegetation
such as crops, where leaf chlorophyll concentration increases during the growth
phase), it is not the case in general, and definitely not during stress
conditions [@monteith_solar_1972].

Despite these efforts, relying solely on satellite VIs presents a challenge. The
assumptions accompanying VI models make it important to systematically quantify
their predictive capacity for GPP, in order to validate and improve their
accuracy [@anav_spatiotemporal_2015; @brown_fiducial_2021] across various
growing seasons and locations. This is particularly crucial because
photosynthesis regulation can occur without major changes in canopy structure or
leaf pigments, and can therefore go undetected in reflectance data
[@pabon-moreno_potential_2022; @pierrat_diurnal_2022], implying that temporal
aggregation may be critical for VI models. In-situ eddy-covariance flux
measurements coupled with locally calibrated models for respiration
[@baldocchi_how_2020] provide a suitable reference GPP for validation, given
that they represent site-level observations [@chu_representativeness_2021].

As such, the main objective of this M.Sc. thesis chapter is to quantify and
compare the uncertainty associated with the VI/LUE models used to estimate GPP.
To control for variability in environmental conditions (E), sites were selected
that share the same land cover, climate, and biome characteristics. VI models
are evaluated by comparison to EC-based estimates of GPP for multiple growing
seasons from the Bartlett Experimental Forest (USA), the Borden Forest Research
Station flux site (Canada), and the University of Michigan Biological Station
(USA). Considering the reviewed studies, we hypothesize that the uncertainty of
this approach will depend on the nature of the VI and on the spatial and
temporal aggregation at which it is applied. Specifically, we expect that i. the
NIRv and CCI indices will consistently demonstrate a stronger correlation and
lower prediction uncertainty for GPP compared to NDVI and EVI across the tested
sites, and ii. larger temporal aggregations will result in improved predictions
due to the reduction in observation variability.

\newpage

## Methods

```{r libraries and sources}
#| echo: false
#| message: false
#| warning: false

# Libraries
library(ggplot2)
library(cowplot)
library(lubridate)
library(purrr)
library(broom)
library(gt)
library(tidymodels)
library(usemodels)
library(vip)
library(h2o)
library(mgcv)

# Source files
# Source the objects created for the complete GPP trends.
# This file will source the code and load objects to memory.
source("scripts/trend_plots.R")

# Source the objects created for the complete GPP trends
source("scripts/models_data_preparation.R")

# Source file with functions to plot rf predictions
source("R/plot_exploratory.R")
```

### Eddy Covariance sites

We used three deciduous broadleaf forest sites located in the northern
hemisphere (see @fig-sites_locations) with eddy covariance (EC) data collected
by AmeriFlux. For each site, we used the daily, weekly, and monthly GPP values
(GPP_DT_VUT_REF variable) estimated using the ONEFlux workflow
[@pastorello2020fluxnet2015]. ONEFlux partitions the CO~2~ fluxes measured as
Net Ecosystem Exchange (NEE) into GPP and Ecosystem Respiration (RECO) using two
methods, known as daytime and nighttime. Here we selected the daytime method
(DT), which uses daytime and nighttime data to parameterize a model with two
components: one based on a light response curve and vapour pressure deficit, and
a second one using a respiration-temperature relationship to estimate RECO,
which in turn is differenced with NEE to provide GPP
[@pastorello2020fluxnet2015].

![Sites locations](img/ge_sites.jpg){#fig-sites_locations}

@fig-gpp_trends displays the GPP trends for the University of Michigan
Biological Station, Bartlett Experimental Forest, and the Borden Forest Research
Station. Additional details regarding the characteristics of these datasets can
be found in @tbl-oneflux_datasets.

| Site     | Data range available | Dataset name                                                | Reference                  |
|--------------------|--------------------|--------------------|--------------------|
| Bartlett | Jan 2015 to Dec 2017 | US-Bar: Bartlett Experimental Forest (version: beta-3)      | [@richardson2016ameriflux] |
| Borden   | Jan 2015 to Jan 2022 | CA-Cbo: Ontario - Mixed Deciduous, Borden Forest Site       | [@staebler2019ameriflux]   |
| Michigan | Jan 2015 to Jan 2018 | US-UMB: Univ. of Mich. Biological Station (version: beta-4) | [@gough2016ameriflux]      |

: ONEFlux sites datasets description {#tbl-oneflux_datasets}

The Bartlett Experimental Forest is located in New Hampshire, USA (44°06′N,
71°3′W). This site is characterized by a forest with an average canopy height
ranging from 20 to 22 m and a mean annual temperature of 6°C. Despite
events such as a hurricane in 1938 and small-scale forest management, the
forest's mean stand age is around 120-125 years [@ouimette_carbon_2018].

The second flux site is the Borden Forest Research Station, located in Ontario,
Canada (44°19′N, 79°56′W). This is one of the largest patches of forest in
Southern Ontario, and EC data have been collected there since 1996
[@rogers_response_2020]. This site has a forest cover of over 60% with a height
of approximately 22 m. The deciduous broadleaf forest is natural re-growth
dating from 1916 and is dominated by woody vegetation [@lee_long-term_1999].

The third site is the University of Michigan Biological Station, which is
located in northern Michigan, USA (45°35′N, 84°43′W). The site has a forest with
different successional stages, with an average stand age of 90 years
[@gough_wood_2010] and a mean height of 22 m. The mean annual temperature is
around 5.5°C [@gough_disturbanceaccelerated_2021].

The three sites exhibit similar characteristics, indicating their representation
of a specific ecosystem type. This uniformity enables meaningful comparisons and
offers valuable insights into the relationship between GPP and VIs. A tabular
summary of site characteristics, guided by insights detailed in Teets
[-@teets_coupling_2022], is presented in @tbl-site_summary.

```{r}
#| label: tbl-site_summary
#| tbl-cap: "ONEFlux Site characteristics overview"
#| tbl-colwidths: [50,50]
#| echo: false
#| message: false
#| warning: false
# Numeric and text entries share each site column, so values are stored as text.
data <- data.frame(
  Variable = c("Mean annual temperature (°C)", "Mean annual precipitation (mm)", "Elevation (m)", "Dominant genera", "Köppen climate class"),
  Bartlett = c(6, 1246, 272, "Acer, Fagus, Betula", "Dfb"),
  Michigan = c(5.5, 803, 234, "Populus, Quercus, Pinus", "Dfb"),
  Borden = c(7.4, 784, 209, "Acer, Pinus, Populus", "Dfb")
)

data %>%
  gt() %>%
  tab_spanner(
    label = "Site",
    columns = c("Bartlett", "Michigan", "Borden")
  ) %>%
  fmt_number(
    columns = c("Bartlett", "Michigan", "Borden"),
    decimals = 1
  ) %>%
  tab_footnote(
    footnote = md("**Dfb** is the warm-summer humid continental (hemiboreal) climate.
**D** indicates a continental climate.
**f** indicates that this climate has significant precipitation in all seasons.
**b** indicates a warm summer, with the warmest month averaging below 22°C."),
    locations = cells_body(columns = Variable, rows = 5)
  )
```

```{r}
#| label: fig-gpp_trends
#| fig-cap: "Reference in-situ GPP time series from the study sites on a daily (a), weekly (b), and monthly (c) basis for the University of Michigan Biological Station, Bartlett experimental forest, and the Borden Forest Research Station"
#| fig-width: 11
#| fig-height: 12
#| echo: false
#| message: false
#| warning: false
gpp_trends
```

<!-- ```{r} -->

<!-- #| label: fig-michigan_gpp_trends -->

<!-- #| fig-cap: "GPP trends for Michigan" -->

<!-- #| fig-width: 7 -->

<!-- #| fig-height: 9 -->

<!-- #| echo: false -->

<!-- #| message: false -->

<!-- #| warning: false -->

<!-- michigan_gpp_trends -->

<!-- ``` -->

<!-- ```{r} -->

<!-- #| label: fig-bartlett_gpp_trends -->

<!-- #| fig-cap: "GPP trends for Bartlett" -->

<!-- #| fig-width: 7 -->

<!-- #| fig-height: 9 -->

<!-- #| echo: false -->

<!-- #| message: false -->

<!-- #| warning: false -->

<!-- bartlett_gpp_trends -->

<!-- ``` -->

<!-- ```{r} -->

<!-- #| label: fig-borden_gpp_trends -->

<!-- #| fig-cap: "GPP trends for Borden" -->

<!-- #| fig-width: 7 -->

<!-- #| fig-height: 9 -->

<!-- #| echo: false -->

<!-- #| message: false -->

<!-- #| warning: false -->

<!-- borden_gpp_trends -->

<!-- ``` -->

<!-- The GPP for each of the sites can -->

<!-- -   include description of the ONEFlux processing (gap filling, data quality, -->

<!--     gpp estimation, gpp uncertainty with DT and NT) -->

### Satellite imagery

We used data from the Terra Moderate Resolution Imaging Spectroradiometer
(MODIS), specifically the MOD09GA Version 6.1 product (MODIS/Terra
Surface Reflectance Daily L2G Global 1 km and 500 m SIN Grid), chosen for its
daily sampling and broad temporal coverage. Data retrieval was performed using
Google Earth Engine (GEE). For each site, a square polygon with an area of 3 km
surrounding the EC tower was defined, and the pixel values within this polygon
were extracted for comprehensive analysis.

The MOD09GA product contains the surface spectral reflectance of bands 1 through
7 at a spatial resolution of 500 m, corrected for atmospheric conditions such as
aerosols, gases, and Rayleigh scattering [@vermote2021modis], together with
quality information used to validate cloud-free pixels. The bands used to derive
the vegetation indices are shown in @tbl-MODIS_500_indices_bands.

| **Name**    | **Description** | **Resolution** | **Wavelength** |
|-------------|-----------------|----------------|----------------|
| sur_refl_01 | Red             | 500 meters     | 620-670nm      |
| sur_refl_02 | NIR             | 500 meters     | 841-876nm      |
| sur_refl_03 | Blue            | 500 meters     | 459-479nm      |
| sur_refl_04 | Green           | 500 meters     | 545-565nm      |

: MODIS (MOD09GA.061 product) bands used to calculate the VIs
{#tbl-MODIS_500_indices_bands}

<!-- | Data                  | Spatial resolution (m) | GEE image collection         | -->

<!-- |-----------------------|------------------------|------------------------------| -->

<!-- | MODIS reflectance     | 500                    | MODIS/006/MOD09GA            | -->

<!-- | MODIS reflectance     | 250                    | MODIS/061/MOD09A1            | -->

<!-- | Harmonized Sentinel-2 | 20 - 60 - 10           | COPERNICUS/ S2_SR_HARMONIZED | -->

<!-- : Satellite images datasets {#tbl-satellite_images_datasets} -->

The data processing involved three main steps: selecting high-quality pixels,
scaling band values, and calculating vegetation indices. The MODIS product has
four bit-encoded variables that provide information about observation
quality. Of those variables, only the 1km Reflectance Data State QA
(`state_1km`) and Surface Reflectance 500m Quality Assurance (`qc_500m`)
variables were used, along with the quality indicator bits of each band, because
the 250m scan value information (`q_scan`) was not informative and the
Geolocation flags (`g_flags`) had the same value for all observations. The
bit-encoded variables were transformed into categorical strings, and only the
categories indicating the best quality were selected to filter the pixels
(@fig-quality_pixels). The specific bit strings selected for `state_1km` are
shown in @tbl-state_1km_bitstrings and for `qc_500m` in
@tbl-qc_scan_bit_strings. Subsequently, the surface reflectance for each
filtered pixel was determined by multiplying the recorded digital number by a
scale factor of `0.0001`.
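
The bit-decoding step can be sketched as follows. This is an illustration only: the helper `extract_bits()` and the example QA values are hypothetical, and the bit positions used here (bits 0-1 for cloud state, bit 2 for cloud shadow in `state_1km`) are an assumption that should be checked against the product user guide. The categories actually retained in this study are those listed in the tables referenced above.

```r
# Sketch: decode two fields from a bit-encoded QA value using integer arithmetic.
# Assumed layout for `state_1km` (verify against the user guide):
#   bits 0-1 = cloud state (0 = clear), bit 2 = cloud shadow (0 = no shadow).
extract_bits <- function(x, start, n_bits) {
  # Shift right by `start` bits, then keep the lowest `n_bits` bits
  (x %/% 2^start) %% 2^n_bits
}

state_1km <- c(0L, 1L, 4L, 8L, 1032L)  # hypothetical QA values

cloud_state  <- extract_bits(state_1km, start = 0, n_bits = 2)
cloud_shadow <- extract_bits(state_1km, start = 2, n_bits = 1)

# Illustrative rule: keep observations flagged as clear and shadow-free
keep <- cloud_state == 0 & cloud_shadow == 0
print(data.frame(state_1km, cloud_state, cloud_shadow, keep))
```

Arithmetic decoding is used here instead of string conversion only to keep the sketch short; both approaches select the same pixels when the same bit fields are inspected.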

```{r}
#| label: fig-quality_pixels
#| fig-cap: "Total number of observations (pixels) from MODIS classified as high quality (used in the analysis) or other quality (filtered out from the analysis) per site."
#| echo: false
#| message: false
#| warning: false
source("scripts/quality_observations.R")

all %>% 
  ggplot(aes(x = site, fill = quality)) +
  geom_bar(position = "stack") +
  scale_fill_viridis_d(begin = 0.34, end = 0.8) +
  labs(x = "Site", 
       y = "Total observations (pixels)",
       fill = "Quality") +
  theme_bw(base_size = 12)
```

Following the MODIS Collection 6.1 (C61) LSR Product User Guide
[@vermote2021modis], any scaled value that fell outside the range of `0` to `1`
was considered a fill value or uncorrected Level 1B data and was discarded as
unreliable. The number of high-quality surface reflectance observations
was summarized on a monthly basis for each site (@fig-complete_quality_pixels).
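
A minimal sketch of the scaling and range check is given below. The digital numbers are hypothetical values chosen to include out-of-range cases, and the object names are illustrative rather than those used in the analysis scripts.

```r
# Sketch: scale digital numbers to surface reflectance and drop invalid values.
scale_factor <- 0.0001

# Hypothetical digital numbers, including negative and too-large values
dn <- c(-28672, -100, 350, 4200, 9800, 16000)

reflectance <- dn * scale_factor

# Values outside [0, 1] are treated as fill or uncorrected data and set to NA
reflectance[reflectance < 0 | reflectance > 1] <- NA_real_

valid <- reflectance[!is.na(reflectance)]
print(valid)
```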

```{r}
#| label: fig-complete_quality_pixels
#| fig-cap: "Monthly number of MODIS observations (pixels) classified as high quality (used in the analysis) per site."
#| echo: false
#| message: false
#| warning: false
source("scripts/quality_observations.R")

all %>%
  filter(quality == "high") %>%
  mutate(year_mon = zoo::as.yearmon(date)) %>%
  ggplot(aes(x = year_mon, fill = site)) +
  geom_bar(position = "stack") +
  scale_fill_viridis_d(begin = 0.2, end = 0.8) +
  labs(x = "Date",
       y = "Total observations (pixels)",
       fill = "Site") +
  theme_bw(base_size = 12)
```

The VIs `NDVI` (@eq-ndvi), `NIRv` (@eq-nirv), `EVI` (@eq-evi), and `CCI`
(@eq-cci) were calculated and then matched with the corresponding date in the
flux datasets (Red, NIR, Blue, and Green correspond to the bands defined in
@tbl-MODIS_500_indices_bands).

<!-- However, since the -->

<!-- MODIS product with a spatial resolution of 250m (`MODIS/061/MOD09A1`) does not -->

<!-- include the blue band EVI calculations were not performed for this dataset. -->

```{=tex}
\begingroup
\fontsize{10pt}{10pt}\selectfont
```
$$
\small
NDVI = \frac{NIR - Red}{NIR + Red}
$$ {#eq-ndvi}

$$
NIRv = NIR\times\frac{NIR - Red}{NIR + Red}
$$ {#eq-nirv}

$$
EVI = 2.5\times\frac{NIR - Red}{NIR + 6\times Red - 7.5\times Blue + 1}
$$ {#eq-evi}

<!-- $$ -->

<!-- kNDVI = tanh(\mathrm{NDVI}^2) -->

<!-- $$ {#eq-kndvi} -->

$$
CCI = \frac{Green - Red}{Green + Red}
$$ {#eq-cci}

```{=tex}
\endgroup
```
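
The four index definitions (@eq-ndvi, @eq-nirv, @eq-evi, and @eq-cci) translate directly into vectorized functions. The sketch below is illustrative: the function names and the reflectance values are hypothetical and do not come from the analysis scripts or the study sites.

```r
# Sketch: vegetation indices as vectorized functions of scaled reflectance.
ndvi <- function(nir, red) (nir - red) / (nir + red)
nirv <- function(nir, red) nir * ndvi(nir, red)
evi  <- function(nir, red, blue) {
  2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
}
cci  <- function(green, red) (green - red) / (green + red)

# Hypothetical reflectances for a dense green canopy
red <- 0.05; nir <- 0.45; blue <- 0.03; green <- 0.08

vis <- c(
  ndvi = ndvi(nir, red),
  nirv = nirv(nir, red),
  evi  = evi(nir, red, blue),
  cci  = cci(green, red)
)
print(round(vis, 3))
```

Because NIRv is NDVI multiplied by NIR reflectance, it is always smaller in magnitude than NDVI for reflectances below 1.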

<!-- For Harmonized Sentinel-2 data the data processing consisted on using the -->

<!-- `msk_cldprb` variable to filter out pixels according to the probabilities of -->

<!-- containing a cloud or the `msk_snwprb` variable for presence of snow -->

<!-- probability. Once the pixels without any probability of having clouds or snow -->

<!-- were filtered from the rest, we scaled all the reflectance bands observations by -->

<!-- a factor of `0.0001`. With the values scaled we proceed to calculate the -->

<!-- vegetation indices `NDVI` (@eq-ndvis), `NIRv` (@eq-nirvs), and `EVI` (@eq-evis). -->

<!-- The bands used to calculate these indices are in -->

<!-- @tbl-harmonized_s2_indices_bands -->

<!-- $$ -->

<!-- NDVI = \frac{b8 - b4}{b8 + b4} -->

<!-- $$ {#eq-ndvis} -->

<!-- $$ -->

<!-- NIRv = b8\times\frac{b8 - b4}{b8 + b4} -->

<!-- $$ {#eq-nirvs} -->

<!-- $$ -->

<!-- EVI = 2.5\times\frac{\mathrm{b8} - \mathrm{b4}}{% -->

<!--           (\mathrm{b8} + 6\times \mathrm{b4} -->

<!--           - 7.5\times \mathrm{b2} + 1)}\\ -->

<!-- $$ {#eq-evis} -->

### Data Preparation

Three datasets were prepared for each site: a daily, a weekly, and a monthly
dataset. These datasets were generated from the satellite imagery data with the
selected high-quality pixels and the ONEFlux-processed data in order to capture
variations in vegetation indices (VI), band values, and GPP over different time
scales.

The daily dataset included the VI and band values from the high-quality pixels
only, together with the daily GPP estimates derived from the ONEFlux process,
for the time period of available GPP at each site (@tbl-oneflux_datasets). This
dataset provided a high-resolution representation of the variables, allowing for
a detailed analysis of their daily fluctuations.

The weekly and monthly datasets were derived from the corresponding daily
dataset. These datasets contained summarized values of the VIs and band values,
aggregated over the weekly and monthly time frames, respectively. The
aggregation process involved calculating an average for the VIs and band values
within each week or month.
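
The averaging step can be sketched in base R as follows. The data are synthetic, the column names are hypothetical, and the week definition (weeks starting on Monday) is an assumption that may differ from the weekly periods used by ONEFlux.

```r
# Sketch: average daily VI values into weekly and monthly means (base R).
set.seed(1)
daily <- data.frame(
  site = "Bartlett",
  date = seq(as.Date("2016-06-01"), as.Date("2016-07-31"), by = "day")
)
daily$ndvi <- runif(nrow(daily), 0.7, 0.9)  # synthetic NDVI values

# Period labels: calendar month and the Monday that starts each week
daily$month <- format(daily$date, "%Y-%m")
daily$week  <- as.Date(cut(daily$date, breaks = "week"))

monthly <- aggregate(ndvi ~ site + month, data = daily, FUN = mean)
weekly  <- aggregate(ndvi ~ site + week,  data = daily, FUN = mean)

print(monthly)
```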

For the GPP values in the weekly and monthly datasets, rather than summarizing
the daily GPP values, the GPP measurements for the weekly and monthly time
frames were obtained directly from the ONEFlux process, which provided a
reliable estimation of GPP for these longer time intervals.

Creating these three datasets (daily, weekly, and monthly) allowed the VIs, band
values, and GPP to be analysed at different temporal resolutions, and provided
insight into how the relationships between them change with temporal
aggregation.

For each site and across all time scales (daily, weekly, and monthly), GPP
values below 1 gC m^-2^ d^-1^ were excluded. These values were deemed either
below the detection limit or insufficient to make a significant contribution to
the overall analysis, particularly in representing meaningful vegetation
productivity. This process aimed to refine the dataset and focus on values
within a range considered more pertinent to the growing season. The final number
of observations per site after selecting the high-quality pixels and GPP
observations is shown in @fig-obs_used_analysis.
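
The matching of VI values to flux dates and the GPP threshold can be sketched as below. The two small data frames and the column names `ndvi` and `gpp` are hypothetical stand-ins for the prepared datasets.

```r
# Sketch: join VI and flux records by date, then apply the GPP threshold.
vi <- data.frame(
  date = as.Date(c("2016-01-15", "2016-06-15", "2016-07-15")),
  ndvi = c(0.35, 0.82, 0.85)
)
flux <- data.frame(
  date = as.Date(c("2016-01-15", "2016-06-15", "2016-07-15", "2016-08-15")),
  gpp  = c(0.2, 9.4, 10.1, 8.7)  # gC m-2 d-1, synthetic values
)

# Inner join: keeps only dates with both a VI value and a GPP estimate
joined <- merge(vi, flux, by = "date")

# Exclude GPP values below 1 gC m-2 d-1
analysis <- joined[joined$gpp >= 1, ]
print(analysis)
```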

```{r}
#| label: fig-obs_used_analysis
#| fig-cap: "Monthly high-quality MODIS observations after joining with flux observations containing Gross Primary Productivity (GPP) values higher than 1."
#| echo: false
#| message: false
#| warning: false

# Dataset used to analyze the data
daily_plot_500 %>% 
  # Dataset have all indices per row, so for the same date there are 5 obs
  filter(index == "ndvi_mean") %>% 
  group_by(site, date) %>% 
  tally() %>% 
  group_by(site) %>% 
  mutate(year_mon = zoo::as.yearmon(date)) %>% 
  ggplot(aes(x = year_mon, fill = site)) +
  geom_bar(position = "stack") +
  scale_fill_viridis_d(begin = 0.2, end = 0.8) +
  labs(x = "Date",
       y = "Total observations",
       fill = "Site") +
  theme_bw(base_size = 12) 

# daily_plot_500 %>% 
#   pivot_wider(names_from = index, values_from = value) %>% 
#   select(date, total_obs, site) %>% 
#   mutate(year_mon = zoo::as.yearmon(date)) %>% 
#   group_by(site, year_mon) %>% 
#   tally() %>% 
#   ungroup() %>% 
#   mutate(year_mon = as.factor(year_mon)) %>%
#   ggplot(aes(x = year_mon, y = n, fill = site)) +
#   geom_bar(stat = "identity", position = "stack") +
#   scale_fill_viridis_d(begin = 0.2, end = 0.8) +
#   labs(x = "Date",
#        y = "Total observations",
#        fill = "Site") +
#   theme_bw(base_size = 12) +
#   theme(axis.text.x = element_text(angle = 90, h = 1))
```

### Data Analysis

Because the relationship between GPP and a given VI may be non-linear, and may
differ across temporal aggregation scales, we employed two distinct modeling
approaches: a Linear Model (LM) and a Generalized Additive Model (GAM). GAMs can
provide a better fit in cases where the distribution and variability observed in
the data are greater due to the varying temporal scales. We aimed to evaluate
whether a GAM could better explain this variance, which a linear relationship
might not capture well, potentially leading to larger residuals. Both modeling
approaches were applied to each VI individually to test our hypothesis regarding
VI predictive uncertainty. Additionally, a single model including all VIs was
fitted to assess whether a combination of VIs could reduce GPP estimation
uncertainty.

Fitting one model per index allowed a comparative analysis of how each index
independently explains the variation in GPP at each site. The performance of
each VI was also evaluated using models calibrated with data from all sites, to
assess their robustness to spatial variability within the selected temperate
broadleaf biome across the temporal scales. Finally, we examined the
relationship between GPP and all indices used jointly as covariates, both on an
individual site basis and without site distinction.

In total, for each site plus the all-sites category, we created 5 linear
models and 5 GAMs for every timescale (daily, weekly, and monthly): one
for each index (NDVI, CCI, EVI, NIRv) and one with all the indices as covariates
(NDVI + CCI + EVI + NIRv), resulting in a total of 120 models. To assess and
compare the models' performance, we calculated the coefficient of determination
(R²) to measure the correlation between the actual observations and predictions.
Additionally, we used the Root Mean Squared Error (RMSE) and the Mean Absolute
Error (MAE) as indicators of the model estimation error.
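
The comparison can be sketched on synthetic data as follows. This is not the analysis code: the saturating GPP response, the helper `metrics()`, and the use of the squared correlation between observations and predictions as R² are assumptions made for illustration. The GAM is fitted with `mgcv::gam()` using a default thin-plate smooth.

```r
# Sketch: fit a linear model and a GAM to one index and compare fit metrics.
library(mgcv)  # provides gam() and s()

set.seed(42)
n <- 200
dat <- data.frame(nirv = runif(n, 0.05, 0.45))
# Synthetic, saturating GPP response (purely illustrative)
dat$gpp <- 14 * (1 - exp(-6 * dat$nirv)) + rnorm(n, sd = 0.8)

fit_lm  <- lm(gpp ~ nirv, data = dat)
fit_gam <- gam(gpp ~ s(nirv), data = dat, method = "REML")

# R2 as squared correlation between observations and predictions, plus errors
metrics <- function(obs, pred) {
  c(r2   = cor(obs, pred)^2,
    rmse = sqrt(mean((obs - pred)^2)),
    mae  = mean(abs(obs - pred)))
}

results <- rbind(
  lm  = metrics(dat$gpp, as.numeric(predict(fit_lm))),
  gam = metrics(dat$gpp, as.numeric(predict(fit_gam)))
)
print(round(results, 3))

# Model count described in the text:
# 4 site groups x 3 timescales x 5 predictor sets x 2 model types
n_models <- 4 * 3 * 5 * 2
```

These metrics are computed in-sample, so the GAM cannot fit worse than the linear model here; an out-of-sample comparison would require a held-out set or cross-validation.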

\newpage

## Results

### Analysis of GPP-Vegetation Index Relationships Using Linear Models

<!-- ### Monthly GPP and VIs relations -->

@tbl-lm_monthly_results provides a summary of the linear models used for GPP
estimation at each site, employing the vegetation indices as predictors. For
each site and predictor, the table includes the relevant model summary
statistics, namely R², MAE, and RMSE, as indicators of model fit and prediction
error. All models had p-values \< 0.05. Additional metrics such as the p-value,
adjusted R², the Akaike Information Criterion (AIC), and the Bayesian
Information Criterion (BIC) are displayed in @tbl-complete_lm_monthly_results
for monthly results, @tbl-complete_lm_weekly_results for weekly results, and
@tbl-complete_lm_daily_results for daily results. The distribution of residuals
for each of the models is shown in @fig-lm_residuals.

Our findings show that using all the indices as covariates within the linear
model yields better performance than using any single index alone across all
scenarios. In the assessment of individual VI performance on a monthly basis,
CCI tends to perform better than EVI, NDVI, and NIRv. Although all models were
statistically significant (p \< 0.05), it is important to note that for the
Bartlett and Michigan sites, while CCI shows favourable predictive accuracy, its
advantage over NIRv and EVI is slight and mostly reflected in the MAE
($0.86 \, \mathrm{gC \, m^{-2} \, d^{-1}}$ for Bartlett and
$1.02 \, \mathrm{gC \, m^{-2} \, d^{-1}}$ for Michigan) and RMSE
($1.05 \, \mathrm{gC \, m^{-2} \, d^{-1}}$ for Bartlett and
$1.29 \, \mathrm{gC \, m^{-2} \, d^{-1}}$ for Michigan) results.

NDVI performs comparatively worse, with a 9% and 15%
reduction in R² compared to CCI at the Bartlett and Michigan sites,
respectively. In contrast, EVI exhibits less favourable predictive results (R² =
0.75, MAE = $1.92 \, \mathrm{gC \, m^{-2} \, d^{-1}}$, RMSE =
$2.54 \, \mathrm{gC \, m^{-2} \, d^{-1}}$) for the Borden site and the
aggregated sites category, although in the specific context of the Borden site,
its performance aligns closely with NDVI and NIRv. Among the individual sites,
Bartlett has the most favourable predictive outcomes in terms of R², MAE, and
RMSE for every individual VI model, while the aggregated sites category yields
the least favourable predictive results in the same scenarios.

In the case of the weekly models, NDVI records the least favourable results in
terms of R², MAE, and RMSE at the three individual sites, while EVI shows the
weakest predictive capability when all sites are treated as a single entity. In
that case, however, EVI differs only marginally from NDVI and NIRv (0.01 in R²
and no more than $0.02 \, \mathrm{gC \, m^{-2} \, d^{-1}}$ in RMSE). The
best-performing individual indices in terms of R², MAE, and RMSE were CCI for
Borden and the aggregated sites, NIRv for Michigan, and EVI for Bartlett.
Nonetheless, these advantages over the other individual indices are subtle. On
a weekly basis, the difference in R² between the best and worst performing
models ranges from 0.11 to 0.2. The Bartlett and Michigan sites consistently
yield the most accurate predictive models.

On a daily basis, CCI outperforms the rest of the individual indices in three
cases: for Bartlett by 0.02 in R² compared with EVI, for Borden by 0.01 in R²
compared with EVI, and for the combined sites dataset by 0.1 in R². Nonetheless,
it is worth mentioning that for Borden and the combined sites, the variance
explained by individual indices is below R² = 0.5 in most cases and the error
scores are the highest. For Michigan, the best-performing individual index was
EVI, which outperformed the next best index, NIRv, by 0.03 in R². Generally, the
Bartlett and Michigan sites yielded the most accurate predictive models across
the various configurations. Conversely, the Borden site consistently exhibited
the poorest model performance across all scenarios.

Overall, when comparing individual indices, CCI consistently performed better
across different timeframes in terms of variance explained and error metrics.
Nonetheless, its advantage over EVI and NIRv is subtle. In contrast, NDVI
consistently performs less favourably across all evaluated timeframes. Notably,
models based on monthly values consistently exhibit better performance than
those based on weekly or daily values, as illustrated in @fig-lm_metrics.

```{=tex}
\begingroup
\setlength{\LTleft}{0pt minus 500pt}
\setlength{\LTright}{0pt minus 500pt}
\fontsize{5pt}{7pt}\selectfont
\addtolength{\tabcolsep}{-3pt}
```
```{r lm_monthly}
#| label: tbl-lm_monthly_results
#| tbl-cap: "Summary of linear models for GPP estimation using the vegetation indices on a monthly (a), weekly (b), and daily (c) basis. MAE and RMSE metrics units are gC m⁻² d⁻¹"
#| tbl-colwidths: [50,50]
#| echo: false
#| message: false
#| warning: false
source("scripts/lm_preparation.R")

create_metrics_table <- function(data) {
  data %>% 
    gt() %>%
    tab_spanner(label = md("NDVI"), columns = ends_with("NDVI")) %>%
    tab_spanner(label = "EVI", columns = ends_with("EVI")) %>%
    tab_spanner(label = "NIRv", columns = ends_with("NIRv")) %>%
    tab_spanner(label = "CCI", columns = ends_with("CCI")) %>% 
    tab_spanner(label = "All", columns = ends_with("All")) %>% 
    cols_label(
      .list = list(
        "site" = "Site",
        "r.squared_EVI" = "R2",
        "mae_EVI" = "MAE",
        "rmse_EVI" = "RMSE",
        "r.squared_NDVI" = "R2",
        "mae_NDVI" = "MAE",
        "rmse_NDVI" = "RMSE",
        "r.squared_NIRv" = "R2",
        "mae_NIRv" = "MAE",
        "rmse_NIRv" = "RMSE",
        "r.squared_CCI" = "R2",
        "mae_CCI" = "MAE",
        "rmse_CCI" = "RMSE",
        "r.squared_All" = "R2",
        "mae_All" = "MAE",
        "rmse_All" = "RMSE"
      )
    ) %>% 
    cols_align(
      align = "center",
      columns = 2:16
    ) %>% 
    fmt_number(
      columns = 2:16,
      decimals = 2) %>% 
    tab_options(
      row_group.background.color = "#E9E0E1",
      row_group.font.weight = "bold"
    ) %>% 
    cols_width(everything() ~ px(50))
}

# Monthly table
vis_site_glance_monthly %>% 
  bind_rows(all_sites_glance_monthly,
            all_sites_all_vis_glance_monthly,
            all_vis_glance_monthly) %>% 
  select(site, index, r.squared, mae, rmse) %>%
  pivot_wider(names_from = index, 
              values_from = c(r.squared, mae, rmse)) %>%
  create_metrics_table()


# Weekly table
vis_site_glance_weekly  %>%
  bind_rows(all_sites_glance_weekly,
            all_sites_all_vis_glance_weekly,
            all_vis_glance_weekly) %>% 
  select(site, index, r.squared, mae, rmse) %>%
  pivot_wider(names_from = index, 
              values_from = c(r.squared, mae, rmse)) %>%
  create_metrics_table()

# Daily table
vis_site_glance_daily %>%
  bind_rows(all_sites_glance_daily,
            all_sites_all_vis_glance_daily,
            all_vis_glance_daily)  %>% 
  select(site, index, r.squared, mae, rmse) %>%
  pivot_wider(names_from = index, 
              values_from = c(r.squared, mae, rmse)) %>%
  create_metrics_table()
```

```{=tex}
\endgroup
```
```{r lm_barplot_metrics}
#| label: fig-lm_metrics
#| fig-cap: "Summary of Linear models for GPP estimation using the vegetation indices on a monthly (a), weekly (b), and daily (c) basis. MAE and RMSE metrics units are gC m⁻² d⁻¹"
#| fig-width: 7
#| fig-height: 9
#| echo: false
#| message: false
#| warning: false
library(cowplot)

# Create a function to generate the plot
create_metrics_plot <- function(timescale, y_var, ylim_range) {
  get(paste0("vis_site_glance_", timescale)) %>% 
    select(site, index, all_of(y_var)) %>%  # y_var is a column name given as a string
    bind_rows(get(paste0("all_sites_glance_", timescale)),
               get(paste0("all_sites_all_vis_glance_", timescale)),
              get(paste0("all_vis_glance_", timescale))) %>% 
    ggplot(aes(x = site, y = .data[[y_var]], fill = index)) +
    geom_col(position = "dodge") +
    coord_cartesian(ylim = ylim_range) +
    scale_fill_viridis_d() +
    labs(x = "", fill = "Index") +
    theme_minimal_hgrid(font_size = 12) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Response variables are going to be the same:
response_vars <- c("r.squared", "mae", "rmse")

# Monthly plots
monthly_metrics_plots <- map2(response_vars, 
     list(c(0.3, 1), c(0.5, 4), c(0.5, 5)), 
     ~ create_metrics_plot("monthly", .x, .y))

# Weekly plots
weekly_metrics_plots <- map2(response_vars, 
              list(c(0.3, 1), c(0.5, 4), c(0.5, 5)),
              ~ create_metrics_plot("weekly", .x, .y))

# Daily plots
daily_metrics_plots <- map2(response_vars, 
              list(c(0.3, 1), c(0.5, 4), c(0.5, 5)),
              ~ create_metrics_plot("daily", .x, .y))

# Grid the plots as should go in the chapter
lm_metrics_plots <- plot_grid(
  monthly_metrics_plots[[1]] + labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  weekly_metrics_plots[[1]] +  labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  daily_metrics_plots[[1]] +   labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  monthly_metrics_plots[[2]] + labs(y = "MAE") + theme(legend.position = "none"),
  weekly_metrics_plots[[2]] +  labs(y = "MAE") + theme(legend.position = "none"),
  daily_metrics_plots[[2]] +   labs(y = "MAE") + theme(legend.position = "none"),
  monthly_metrics_plots[[3]] + labs(y = "RMSE") + theme(legend.position = "none"),
  weekly_metrics_plots[[3]] +  labs(y = "RMSE") + theme(legend.position = "none"),
  daily_metrics_plots[[3]] +   labs(y = "RMSE") + theme(legend.position = "none"),
  nrow = 3,
  ncol = 3,
  labels = c("A", "B", "C"))

plot_legend <- get_legend(
  monthly_metrics_plots[[1]] + 
    guides(color = guide_legend(nrow = 3)) 
)

plot_grid(lm_metrics_plots, plot_legend, ncol = 2, rel_widths = c(1, .1))
rm(plot_legend)
```

### Analysis of GPP-Vegetation Index Relationships Using GAM Models

In @tbl-gam_model_results, we present a summary of the results obtained from the
GAM models used for GPP estimation at each site, with the vegetation indices as
predictors. To compare the models, the table includes the relevant model summary
statistics: R², MAE, and RMSE. Additional metrics, such as the p-value, the F
statistic (f), the effective degrees of freedom (edf), and the Akaike
Information Criterion (AIC), are reported in
@tbl-gam_daily_model_results_complete_a and
@tbl-gam_daily_model_results_complete_b for daily models,
@tbl-gam_weekly_model_results_complete_a and
@tbl-gam_weekly_model_results_complete_b for weekly models, and
@tbl-gam_monthly_model_results_complete_a and
@tbl-gam_monthly_model_results_complete_b for monthly models. The distribution
of residuals for each model is shown in @fig-gam_residuals.
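
To make these GAM summary statistics concrete, the sketch below fits a single
smooth term to simulated data and extracts the adjusted R², the effective
degrees of freedom, the F statistic, the p-value, and the AIC. It assumes the
`mgcv` package and a hypothetical saturating response; the models in this
chapter may have been specified differently.

```r
library(mgcv)  # assumption: mgcv is used here for illustration only

# Illustrative only: simulated data with a hypothetical saturating response.
set.seed(7)
n <- 200
d <- data.frame(evi = runif(n, 0.1, 0.8))
d$gpp <- 14 * (1 - exp(-5 * d$evi)) + rnorm(n, sd = 1)

fit  <- gam(gpp ~ s(evi), data = d, method = "REML")
smry <- summary(fit)

smry$r.sq      # adjusted R-squared of the GAM
smry$s.table   # one row per smooth term: edf, Ref.df, F, p-value
AIC(fit)

res  <- residuals(fit, type = "response")
mae  <- mean(abs(res))
rmse <- sqrt(mean(res^2))
```

An edf close to 1 indicates a nearly linear smooth, while larger values indicate
that the fitted relationship bends more strongly.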

When using GAM models on a monthly basis, no single VI demonstrates consistent
superiority over the others. For the all sites category, CCI explains 0.03 more
variance (R²) than NDVI, which is the second-best model with an R² = 0.75. For
Michigan, EVI and NIRv were the best individual indices, each explaining R² =
0.96 of the variance in GPP, but EVI had slightly lower error metrics in MAE
($0.01 \, \mathrm{gC \, m^{-2} \, d^{-1}}$) and RMSE
($0.02 \, \mathrm{gC \, m^{-2} \, d^{-1}}$). Bartlett had its best model
performance when using CCI, with R² = 0.91 of variance explained and the lowest
error metrics among all the monthly GAM models. It is important to highlight
that for Michigan and Bartlett, fitting a GAM with all the VIs as covariates
posed challenges because the number of observations was limited relative to the
number of model parameters, raising concerns of potential overfitting.
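
The constraint described above can be illustrated by counting coefficients. The
sketch below assumes the `mgcv` package, thin plate regression splines, and a
hypothetical sample of 30 monthly observations; none of these are taken from the
chapter's scripts. With the default basis dimension (`k = 10`), four smooth
terms need more coefficients than there are observations, so a smaller basis is
required.

```r
library(mgcv)  # assumption: mgcv is used here for illustration only

# Illustrative only: 30 simulated "monthly" observations of four indices.
set.seed(3)
n      <- 30
canopy <- runif(n)
d <- data.frame(
  ndvi = canopy + rnorm(n, sd = 0.10),
  evi  = canopy + rnorm(n, sd = 0.10),
  nirv = canopy + rnorm(n, sd = 0.10),
  cci  = canopy + rnorm(n, sd = 0.10)
)
d$gpp <- 10 * canopy + rnorm(n, sd = 1)

# Default basis size k = 10: each smooth contributes k - 1 = 9 coefficients
# after the identifiability constraint, plus one intercept.
n_coef_default <- 1 + 4 * (10 - 1)   # 37 coefficients for 30 observations

# A smaller basis (k = 4) keeps the coefficient count below n.
fit_small <- gam(gpp ~ s(ndvi, k = 4) + s(evi, k = 4) +
                   s(nirv, k = 4) + s(cci, k = 4),
                 data = d, method = "REML")
length(coef(fit_small))              # 1 + 4 * 3 coefficients
```

Reducing `k` limits how flexible each smooth can be, which is one way to lower
the risk of overfitting when observations are scarce.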

Finally, it is noteworthy that NDVI on a monthly basis displayed suboptimal
performance at two of the sites (Michigan and Bartlett), with differences in R²
of 0.1 and 0.13, respectively, from the best-performing models, while EVI
exhibited less favourable results only at the Borden site (MAE =
$1.92 \, \mathrm{gC \, m^{-2} \, d^{-1}}$, RMSE =
$2.54 \, \mathrm{gC \, m^{-2} \, d^{-1}}$).

On a weekly basis, models incorporating all VIs as covariates consistently
performed better than any individual VI, irrespective of the site. For the all
sites category, including all indices as covariates increased the variance
explained by 0.07 in R² compared with the best individual VI, CCI. The
corresponding improvement was 0.02 in R² for Michigan, 0.01 for Bartlett, and
0.02 for Borden, in each case compared with EVI.

Conversely, NDVI showed a diminished performance as an individual index when
compared with all the other individual VIs. For the all sites category, NDVI
explained 0.07 less variance (R²) than CCI. Compared with EVI, NDVI's R² was
lower by 0.17 for Michigan, by 0.18 for Bartlett, and by 0.09 for Borden.

On a daily basis, for the all sites category, the model with all the VIs as
covariates explained 0.04 more variance in GPP (R²) than the best-performing
individual VI, CCI. For Bartlett, the increase was also 0.04, but in this case
the best-performing individual VI was EVI. For Michigan, the model using all VIs
as covariates outperformed EVI by 0.05 in R², and for Borden it improved on NIRv
by 0.09 in R².

Among the individual VIs, NDVI consistently demonstrated the poorest performance
across all four cases. As an individual VI, EVI performed better in Bartlett,
Michigan, and Borden. However, it is worth noting that for Borden the variance
explained was limited to R² = 0.5.

In summary, among the three individual sites, Bartlett consistently produced the
most favourable results in terms of models explaining variance and yielding
lower residuals, followed by Michigan. In the case of Borden, when employing
individual indices, the models struggled to achieve a variance explanation
exceeding R² = 0.5.

<!-- ### Daily and weekly GPP and VIs relations -->

<!-- **TODO: Single vs Individual GAM model** -->

<!-- *This is a section under construction. Several models are presented in order -->

<!-- to discuss which one will be better to describe the patterns found* -->

<!-- Richard's suggestion is to create a single model using all indices as covariates. -->

<!-- The notes are described also in the issue [Ref #49](https://github.com/ronnyhdez/thesis_msc/issues/49) -->

<!-- **Single model** -->

<!--  - Using all the indices that I have as multiple predictors will allow me to  -->

<!--  explore the relation between GPP and each VI while also considering potential -->

<!--  interactions. But this will be an examination of the combined effects of the  -->

<!--  VIs on GPP and determine their relative importance. -->

<!-- **Individual model** -->

<!--  - A separate GAM model for each VI will allow me to explore the relation  -->

<!--  between GPP and each VI separately, without the exploration of potential  -->

<!--  interactions between the VIs.  -->

<!--  - But, do I want to explore if there are interactions? If there is an  -->

<!--  interaction, what would be the meaning of an interaction between indices? -->

<!--  - With the individual GAM models, I think that I can uncover individual trends, that otherwise would be hidden among all the VIs. [**simpson's paradox**](https://en.wikipedia.org/wiki/Simpson%27s_paradox) -->

```{=tex}
\begingroup
\setlength{\LTleft}{0pt minus 500pt}
\setlength{\LTright}{0pt minus 500pt}
\fontsize{5pt}{7pt}\selectfont
\addtolength{\tabcolsep}{-3pt}
```
```{r}
#| label: tbl-gam_model_results
#| tbl-cap: "Summary of GAM models for GPP estimation using the vegetation indices on a monthly (a), weekly (b), and daily (c) basis. MAE and RMSE metrics units are gC m⁻² d⁻¹"
#| echo: false
#| message: false
#| warning: false
source("scripts/gam_preparation.R")

create_metrics_table <- function(data) {
  data %>% 
    pivot_wider(names_from = index,
                values_from = c(rsq, mae, rmse)) %>% 
    gt() %>%
    tab_spanner(label = md("NDVI"), columns = contains("NDVI")) %>%
    tab_spanner(label = "EVI", columns = contains("EVI")) %>%
    tab_spanner(label = "NIRv", columns = contains("NIRv")) %>%
    tab_spanner(label = "CCI", columns = contains("CCI")) %>% 
    tab_spanner(label = "All", columns = contains("All")) %>% 
    cols_label(
      .list = list(
        "site" = "Site",
        "rsq_evi_mean" = "R2",
        "mae_evi_mean" = "MAE",
        "rmse_evi_mean" = "RMSE",
        "rsq_ndvi_mean" = "R2",
        "mae_ndvi_mean" = "MAE",
        "rmse_ndvi_mean" = "RMSE",
        "rsq_nirv_mean" = "R2",
        "mae_nirv_mean" = "MAE",
        "rmse_nirv_mean" = "RMSE",
        "rsq_cci_mean" = "R2",
        "mae_cci_mean" = "MAE",
        "rmse_cci_mean" = "RMSE",
        "rsq_All" = "R2",
        "mae_All" = "MAE",
        "rmse_All" = "RMSE"
      )
    ) %>% 
    cols_align(
      align = "center",
      columns = 2:16
    ) %>% 
    fmt_number(
      columns = 2:16,
      decimals = 2) %>% 
    tab_options(
      row_group.background.color = "#E9E0E1",
      row_group.font.weight = "bold"
    ) %>% 
    cols_width(everything() ~ px(50)) 
}

# Monthly table
all_sites_gam_monthly %>% 
  bind_rows(vis_sites_gam_monthly,
            all_sites_all_vis_gam_monthly,
            all_vis_gam_monthly,
  ) %>% 
  create_metrics_table()

# Weekly table
all_sites_gam_weekly %>% 
  bind_rows(vis_sites_gam_weekly,
            all_sites_all_vis_gam_weekly,
            all_vis_gam_weekly,
  ) %>% 
  create_metrics_table()

# Daily table
all_sites_gam_daily %>% 
  bind_rows(vis_sites_gam_daily,
            all_sites_all_vis_gam_daily,
            all_vis_gam_daily,
  ) %>% 
  create_metrics_table()
```

```{=tex}
\endgroup
```
```{r gam_barplot_metrics}
#| label: fig-gam_metrics
#| fig-cap: "Summary of GAM models for GPP (gC m⁻² d⁻¹) estimation using the vegetation indices. Column A represents the metrics for the monthly models, B the weekly, and C the daily metrics. MAE and RMSE metrics units are gC m⁻² d⁻¹"
#| fig-width: 7
#| fig-height: 9
#| echo: false
#| message: false
#| warning: false

# Create a function to generate the plot
create_metrics_plot <- function(timescale, y_var, ylim_range) {
  get(paste0("all_sites_gam_", timescale)) %>% 
    # select(site, index, {{ y_var }}) %>%
    bind_rows(get(paste0("vis_sites_gam_", timescale)),
              get(paste0("all_sites_all_vis_gam_", timescale)),
              get(paste0("all_vis_gam_", timescale))) %>% 
    mutate(index = case_when(
      index == "evi_mean" ~ "EVI",
      index == "ndvi_mean" ~ "NDVI",
      index == "nirv_mean" ~ "NIRv",
      index == "cci_mean" ~ "CCI",
      .default = index
    )) %>% 
    ggplot(aes(x = site, y = .data[[y_var]], fill = index)) +
    geom_col(position = "dodge") +
    coord_cartesian(ylim = ylim_range) +
    scale_fill_viridis_d() +
    labs(x = "", fill = "Index") +
    theme_minimal_hgrid(font_size = 12) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

# Response variables are going to be the same:
response_vars <- c("rsq", "mae", "rmse")

# Monthly plots
monthly_metrics_plots <- map2(response_vars, 
     list(c(0.3, 1), c(0.5, 4), c(0.8, 5)), 
     ~ create_metrics_plot("monthly", .x, .y))

# Weekly plots
weekly_metrics_plots <- map2(response_vars, 
              list(c(0.3, 1), c(0.5, 4), c(0.8, 5)),
              ~ create_metrics_plot("weekly", .x, .y))

# Daily plots
daily_metrics_plots <- map2(response_vars, 
              list(c(0.3, 1), c(0.5, 4), c(0.8, 5)),
              ~ create_metrics_plot("daily", .x, .y))

# Grid the plots as should go in the chapter
gam_metrics_plots <- plot_grid(
  monthly_metrics_plots[[1]] + labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  weekly_metrics_plots[[1]] +  labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  daily_metrics_plots[[1]] +   labs(y = expression(R^{"2"})) + theme(legend.position = "none"),
  monthly_metrics_plots[[2]] + labs(y = "MAE") + theme(legend.position = "none"),
  weekly_metrics_plots[[2]] +  labs(y = "MAE") + theme(legend.position = "none"),
  daily_metrics_plots[[2]] +   labs(y = "MAE") + theme(legend.position = "none"),
  monthly_metrics_plots[[3]] + labs(y = "RMSE") + theme(legend.position = "none"),
  weekly_metrics_plots[[3]] +  labs(y = "RMSE") + theme(legend.position = "none"),
  daily_metrics_plots[[3]] +   labs(y = "RMSE") + theme(legend.position = "none"),
  nrow = 3,
  ncol = 3,
  labels = c("A", "B", "C"))

plot_legend <- get_legend(
  monthly_metrics_plots[[1]] + 
    guides(color = guide_legend(nrow = 3)) 
)

plot_grid(gam_metrics_plots, plot_legend, ncol = 2, rel_widths = c(1, .1))
rm(plot_legend)
```

### LM vs GAM

For Michigan, monthly GAMs using EVI or NIRv performed better than the LM with
all vegetation indices as covariates. Weekly GAMs performed better than LMs at
all sites except Borden, where both models yielded equivalent R² values.
Notably, despite similar R² values, the GAMs consistently displayed lower error
metrics, suggesting a capacity to better accommodate potential non-linear
relationships and produce more accurate predictions. This adaptability was
particularly evident when addressing the daily, weekly, or monthly variations in
GPP. Furthermore, monthly models, in both the LM and GAM frameworks,
consistently exhibited better metrics than their weekly and daily counterparts,
as illustrated in @fig-gam_metrics and @fig-lm_metrics. This may be an effect of
the reduced variance that results from summarizing values, as shown in the daily
(@fig-gpp_vi_relation_daily), weekly (@fig-gpp_vi_relation_weekly), and monthly
(@fig-gpp_vi_relation_monthly) scatterplots.
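
A minimal sketch of this LM versus GAM comparison, under the assumption of a
saturating GPP response to a single index, is given below. The data are
simulated, the `mgcv` package is assumed, and the parameter values are
hypothetical. Both models are fitted to the same data and compared through
their in-sample RMSE.

```r
library(mgcv)  # assumption: mgcv is used here for illustration only

# Illustrative only: simulated data with a hypothetical saturating response.
set.seed(21)
n <- 200
d <- data.frame(vi = runif(n, 0.1, 0.8))
d$gpp <- 14 * (1 - exp(-5 * d$vi)) + rnorm(n, sd = 1)

rmse <- function(fit) sqrt(mean(residuals(fit)^2))

fit_lm  <- lm(gpp ~ vi, data = d)                        # straight line
fit_gam <- gam(gpp ~ s(vi), data = d, method = "REML")   # penalized smooth

c(lm = rmse(fit_lm), gam = rmse(fit_gam))
c(lm = summary(fit_lm)$r.squared, gam = summary(fit_gam)$r.sq)
```

When the underlying relationship is close to linear, the smooth shrinks towards
a straight line and the two models give nearly identical errors; the gap widens
as the curvature increases.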

```{r gpp_vi_realtions_all_sites}
#| label: fig-gpp_vi_relation_daily
#| fig-cap: "Scatterplot of MODIS 500m derived VIs and GPP with daily values. Every observation corresponds to the observed GPP from a flux tower site. Total observations correspond to the number of observations used to obtain the mean of the vegetation index (NDVI, NIRv, CCI and EVI). The red line indicates the GAM fit."
#| fig-width: 7
#| fig-height: 5
#| echo: false
#| message: false
#| warning: false

## Make sure the source of file "scripts/trend_plots.R" was successful
etiquetas <- c(
  "evi_mean" = "EVI",
  "ndvi_mean" = "NDVI",
  "nirv_mean" = "NIRv",
  "kndvi_mean" = "kNDVI",
  "cci_mean" = "CCI",
  "Borden" = "Borden",
  "Michigan" = "Michigan",
  "Bartlett" = "Bartlett"
)

# daily_grid <-
  daily_plot_500 %>% 
  filter(index != "kndvi_mean") %>% 
  # filter(ndvi_mean > 3 & gpp_dt_vut_ref < 20)
  mutate(month = lubridate::month(date, label = TRUE)) %>% 
  ggplot(aes(x = value, y = gpp_dt_vut_ref)) +
  geom_point(aes(color = month, size = total_obs), alpha = 0.8) +
  geom_smooth(method = "gam", se = FALSE, color = "#E20D6A") +
  # Only colour is mapped (to month), so no fill scale is needed
  scale_color_viridis_d() + 
  facet_grid(site~index, scales = "free", labeller = as_labeller(etiquetas)) +
  scale_size(range = c(.1, 3), name = "Total observations") +
  scale_x_continuous(breaks = seq(-1, 1, by = .1)) +
  scale_y_continuous(breaks = seq(0, 38, by = 4)) +
  labs(x = "Index value", 
       y = expression(GPP~(gC~m^{"-2"}~d^-1)),
       color = "Month") +
  theme_classic(base_size = 10) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  guides(size = guide_legend(order = 1, keyheight = 0.1),
         col = guide_legend(order = 2, keyheight = 0.1)) +
  theme(strip.text = element_text(size = 10, color = "black"),
        strip.background = element_blank())
```

```{r gpp_vi_realtions_all_sites}
#| label: fig-gpp_vi_relation_weekly
#| fig-cap: "Scatterplot of MODIS 500m derived VIs and GPP with weekly values. Every observation corresponds to the observed GPP from a flux tower site. Total observations correspond to the number of observations used to obtain the mean of the vegetation index (NDVI, NIRv, CCI and EVI). The red line indicates the GAM fit."
#| fig-width: 7
#| fig-height: 5
#| echo: false
#| message: false
#| warning: false
# weekly_grid <-
  weekly_plot_500 %>% 
  filter(index != "kndvi_mean") %>% 
  mutate(month = lubridate::month(date_start, label = TRUE)) %>% 
  ggplot(aes(x = value, y = gpp_dt_vut_ref)) +
  geom_point(aes(color = month, size = total_obs), alpha = 0.8) +
  geom_smooth(method = "gam", se = FALSE, color = "#E20D6A") +
  # Only colour is mapped (to month), so no fill scale is needed
  scale_color_viridis_d() + 
  facet_grid(site~index, scales = "free", labeller = as_labeller(etiquetas)) +
  scale_size(range = c(.1, 3), name = "Total observations") +
  scale_x_continuous(breaks = seq(-1, 1, by = .1)) +
  scale_y_continuous(breaks = seq(0, 38, by = 4)) +
  labs(x = "Index value", 
       y = expression(GPP~(gC~m^{"-2"}~d^-1)),
       color = "Month") +
  theme_classic(base_size = 10) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  guides(size = guide_legend(order = 1, keyheight = 0.1),
         col = guide_legend(order = 2, keyheight = 0.1)) +
  theme(strip.text = element_text(size = 10, color = "black"),
        strip.background = element_blank())
```

```{r gpp_vi_realtions_all_sites}
#| label: fig-gpp_vi_relation_monthly
#| fig-cap: "Scatterplot of MODIS 500m derived VIs and GPP with monthly values. Every observation corresponds to the observed GPP from a flux tower site. Total observations correspond to the number of observations used to obtain the mean of the vegetation index (NDVI, NIRv, CCI and EVI). The red line indicates the linear model (LM) fit."
#| fig-width: 7
#| fig-height: 5
#| echo: false
#| message: false
#| warning: false

# monthly_grid <-
  monthly_plot_500 %>% 
  filter(index != "kndvi_mean") %>% 
  mutate(month = lubridate::month(date, label = TRUE)) %>% 
  ggplot(aes(x = value, y = gpp_dt_vut_ref)) +
  geom_point(aes(color = month, size = total_obs), alpha = 0.8) +
  geom_smooth(method = "lm", se = FALSE, color = "#E20D6A") +
  # Only colour is mapped (to month), so no fill scale is needed
  scale_color_viridis_d() + 
  facet_grid(site~index, scales = "free", labeller = as_labeller(etiquetas)) +
  scale_size(range = c(.1, 3), name = "Total observations") +
  scale_x_continuous(breaks = seq(-1, 1, by = .1)) +
  scale_y_continuous(breaks = seq(0, 18, by = 4)) +
  labs(x = "Index value", 
       y = expression(GPP~(gC~m^{"-2"}~d^-1)),
       color = "Month") +
  theme_classic(base_size = 10) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  guides(size = guide_legend(order = 1, keyheight = 0.1),
         col = guide_legend(order = 2, keyheight = 0.1)) +
  theme(strip.text = element_text(size = 10, color = "black"),
        strip.background = element_blank())

# plot_grid(daily_grid,
#           weekly_grid,
#           monthly_grid,
#           nrow = 3,
#           labels = c('A', 'B', 'C'))
```

\newpage

## Discussion

In summary, our LM results demonstrate that incorporating all VIs as covariates
in the model enhances predictive accuracy for GPP compared to models using
single VIs. This holds across the three temporal scales, both for each site
individually and for the sites considered together as part of the same biotype.
This observation suggests that the relationship is non-linear, and that
combining different VIs allows an LM to capture non-linear patterns throughout
the entire study period, beyond the predictive power of any individual VI.
Additionally, VIs exhibit different noise sensitivities [@zeng_optical_2022],
which may explain why a single VI is insufficient to capture nuanced variations
in GPP.

Conversely, when all study sites are aggregated indiscriminately to represent a
unified ecosystem type, model metrics are worse than when distinct models are
fitted for each individual site. This decline in performance may be attributed
to the variability inherent in the Borden site, which presents larger ranges of
GPP values (see @fig-gpp_trends) than the Bartlett and Michigan sites.
Incorporating the Borden values together with those from the Bartlett and
Michigan sites to represent a single ecosystem type in a model introduces higher
GPP values that contribute to variability. This variability cannot be accurately
tracked by the vegetation indices, resulting in a higher RMSE and a lower R².
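
The effect of pooling sites can be sketched with simulated data in which one
site has a much steeper GPP response, and therefore a wider GPP range, than the
other two. The site labels, slopes, and noise level below are hypothetical and
do not correspond to Borden, Bartlett, or Michigan. A single pooled LM is
compared with separate LMs fitted per site.

```r
# Illustrative only: three simulated sites, one with a much wider GPP range.
set.seed(11)
sites <- data.frame(
  site  = rep(c("A", "B", "C"), each = 60),
  slope = rep(c(8, 10, 20), each = 60)
)
sites$vi  <- runif(nrow(sites), 0.2, 0.9)
sites$gpp <- sites$slope * sites$vi + rnorm(nrow(sites), sd = 1)

rmse <- function(fit) sqrt(mean(residuals(fit)^2))

pooled   <- lm(gpp ~ vi, data = sites)
per_site <- lapply(split(sites, sites$site),
                   function(d) lm(gpp ~ vi, data = d))

rmse_pooled   <- rmse(pooled)
rmse_per_site <- sapply(per_site, rmse)

# Overall RMSE of the per-site models (sites have equal sample sizes here).
rmse_separate <- sqrt(mean(rmse_per_site^2))
c(pooled = rmse_pooled, separate = rmse_separate)
```

In-sample, the separate fits can never have a larger overall error than the
pooled fit, so the size of the gap, rather than its sign, indicates how much
the sites differ.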

If such discrepancies arise when constructing models with identical
specifications for sites categorized under the same biotype ecosystem, it may
indicate that, for global GPP models, exploring the use of diverse VIs is
crucial. Such an approach could allow a more accurate estimation of GPP by
leveraging the specific indices that yield the best results for each distinct
type of ecosystem, potentially contributing to more robust global GPP
estimations. For example, the study by @lin_evaluating_2019 showed that
Chlorophyll Index Red (CRI) had a better This hypothesis could be further
elucidated through standardized processes for calculating GPP with in-situ data
across all sites with flux towers available, a task currently in development by
FLUXNET [@pastorello2020fluxnet2015]. Standardized GPP values obtained through
this process could then be used to train models and assess which VIs have the
lowest errors when predicting GPP.
Considering the availability of data with daily, weekly, and monthly values, a
pertinent question arises about the impact of the temporal scale on GPP
estimation. Our results indicate that models based on monthly data showed better
fit and smaller residual metrics. This may be attributed to reduced data
variation: as daily values are aggregated into monthly summaries, values become
more centralized around a mean, which can reduce measurement noise and
concurrently mitigate the saturation effect of VIs for daily peaks. The
performance of model fits and residual metrics declined when transitioning from
monthly to weekly and, subsequently, to daily values (see @fig-lm_residuals for
LM residuals and @fig-gam_residuals for GAM residuals). Although the decrease in
model fit was not substantial, the increase in residual metrics suggests a
potential for larger errors in predictions.
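
The aggregation effect can be sketched with one simulated year of daily values
that share a hypothetical seasonal signal plus independent daily noise. All
variable names and parameter values are assumptions for illustration. Averaging
to monthly means removes much of the daily noise, so the same LM explains more
variance at the monthly scale.

```r
# Illustrative only: one simulated year with a hypothetical growing season.
set.seed(5)
days     <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")
doy      <- as.numeric(format(days, "%j"))
seasonal <- pmax(0, sin(pi * (doy - 90) / 210))

daily <- data.frame(
  date = days,
  vi   = 0.3 + 0.5 * seasonal + rnorm(length(days), sd = 0.05),
  gpp  = 12 * seasonal + rnorm(length(days), sd = 2)
)
daily$month <- format(daily$date, "%Y-%m")

# Monthly means of the index and of GPP.
monthly <- aggregate(cbind(vi, gpp) ~ month, data = daily, FUN = mean)

r2_daily   <- summary(lm(gpp ~ vi, data = daily))$r.squared
r2_monthly <- summary(lm(gpp ~ vi, data = monthly))$r.squared
c(daily = r2_daily, monthly = r2_monthly)
```

The gain in fit comes at the cost of sample size (12 monthly values instead of
365 daily ones), which is worth keeping in mind when comparing metrics across
temporal scales.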

While NDVI is one of the most used VIs in EO [@pabon-moreno_potential_2022], its
performance, as measured by the GPP RMSE, was less favourable than that of the
other indices at two of the three sites. However, the performance variation
among the sites was not significant, with the Bartlett experimental forest
showing slightly better results in all GPP \~ VIs relationships compared to the
other sites. These findings suggest that, although NDVI may not be the optimal
vegetation index for GPP estimation at these specific sites, variations in
performance across different locations are still noteworthy.
When examining the data trends for GPP at each site and its relationship with
VIs, it becomes evident that at higher GPP values, the dispersion of the data is
more pronounced. This observed pattern may be ascribed to the inherent
limitation of these indices, which predominantly track the presence of green
leaves. It is important to note, however, that the mere presence of green leaves
does not consistently signify active photosynthesis. This discrepancy can arise
for two reasons. Firstly, the photosynthetic process may undergo temporary
suspension without any manifest changes in chlorophyll content or leaf
abscission [@camps-valls_unified_2021-2]. Secondly, VIs relying on the NIR band
face challenges in detecting photosynthesis in instances when a higher amount of
healthy plant biomass is present. The increased biomass leads to greater
scattering and reflection of NIR radiation, resulting in a saturation effect
[@camps-valls_unified_2021-2].
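
The saturation effect can be illustrated with a simple, hypothetical response
curve in which an index approaches a maximum value as leaf area index (LAI)
increases. The exponential form and the parameter values below are assumptions
chosen for illustration; they are not fitted to the data in this chapter.

```r
# Hypothetical saturating response of an index to leaf area index (LAI);
# the functional form and parameter values are assumptions for illustration.
vi_from_lai <- function(lai, vi_max = 0.9, k = 0.6) {
  vi_max * (1 - exp(-k * lai))
}

lai <- c(1, 2, 4, 6)

# Change in the index for one additional unit of LAI at each starting point.
sensitivity <- vi_from_lai(lai + 1) - vi_from_lai(lai)
data.frame(lai = lai, vi = vi_from_lai(lai), sensitivity = sensitivity)
```

Under this assumed curve, the same increase in biomass produces a progressively
smaller change in the index, which is why variations in GPP become harder to
track at high biomass.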

This observation holds significance as, despite VIs serving as indicators of
vegetation capacity rather than vegetation physiology, historical data records
are predominantly derived from sensors equipped with bands for generating these
VIs. Conversely, newer sensors capable of capturing additional bands to derive
indices such as Solar-Induced Fluorescence (SIF) lack comprehensive historical
records, hindering the feasibility of long-term studies that extend back to
years preceding the 2000s.

The application of both LM and GAM models to the dataset revealed some
differences in model performance. The GAM model exhibited a slightly superior
performance in capturing the underlying patterns within the weekly and daily
data compared to the LM counterpart, but not in the monthly data. This was
evident from assessments of model fit and residuals metrics. The GAM model's
ability to flexibly capture non-linear relationships allowed for a more accurate
representation of the complex structure inherent in the data. Consequently, the
results suggest that the GAM framework may be more suitable for capturing the
nuances present in the dataset, emphasizing the importance of considering model
flexibility when analyzing non-linear relationships.

It is worth noting that a GAM model with only CCI as a predictor performs almost
as well as the GAM model with all the VIs as covariates. This could imply that
the limitation of using a single VI with an LM arises from the LM's inability to
capture the non-linear nature of the relationship. Selecting a proficient VI and
pairing it with a non-linear model could therefore estimate GPP well without the
need for models with multiple predictors. In this scenario, the GAM model with
CCI emerges as a potentially adequate choice among generalized linear fits for
estimating GPP. However, this assessment does not address the question of
whether combining diverse models could offer more robustness, as such an
approach might implicitly incorporate spatial variability to a certain degree.

## Conclusions

In conclusion, our analysis demonstrates that incorporating all VIs as
covariates in our models yields a substantial improvement in predictive accuracy
for GPP compared to using any single VI. Additionally, our research highlights
the impact of time aggregation on prediction accuracy across different models.
Monthly LM models exhibit the best performance metrics, while weekly and daily
LM models present lower metrics, attributable to higher variability in
observations that makes tracking GPP challenging. A similar pattern is observed
when using GAM; however, weekly and daily GAM models outperform LM models.

Notably, no single VI emerges as the universal best predictor for every site or
time aggregation, but CCI with a GAM model emerges as a potentially adequate
choice among generalized linear fits. Moreover, the impact of large variations
in GPP ranges within a site is evident in the quality of predictions. These
variations introduce complexities and uncertainties, emphasizing the necessity
of accounting for local site characteristics and inherent heterogeneity when
aiming for accurate predictions of GPP. Growing seasons exhibit the most
pronounced variability, posing challenges for vegetation indices that saturate
at high biomass concentrations, making it more difficult to track changes in
GPP.

\newpage

## References

::: {#refs}
:::