Project_Markdown.Rmd

---
title: "Comparing GCAM soil organic carbon predictions to empirical land use results"
output: bookdown::html_document2
date: "`r format(Sys.time(), '%d %B %Y')`"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Abstract
Much of the world's carbon is sequestered in soil. Changes in land use greatly impact whether this  carbon remains stored or if it is emitted back into the atmosphere as carbon dioxide, which as broad climate change and global warming implications. These changes to the carbon cycle and their effects are modeled by many Integrated Assessment Models (IAMs). One notable IAM is the Global Change Analysis Model (GCAM). GCAM uses a highly simplified process to represent soil carbon changes, which has not yet been compared to experimental process. This study compared data from two papers aggregating experimental SOC data from previous studies to GCAM's soil carbon output data to determine the accuracy of GCAM's modeling. A meta-analysis was used to determine if such a difference existed and whether the difference was consistent across all tested geographical areas. We found that both the experimental rates of SOC change and the turnover rate constant differed significantly from the same values outputted from GCAM. Specifically, we found that GCAM significantly overestimates the rate of SOC turnover during land use transition. These results highlight the importance of updating how GCAM models soil carbon content change during land use transition.

# Introduction

There has been ample research on empirical soil organic carbon (SOC) levels worldwide, and several papers have compiled the results of these studies (Post  & Kwon 200, Wei et al. 2014). Specifically, there are many measurements of the rate of SOC change during land use transition. SOC measurements have also been made for many different types of land use transitions, providing a wide range of measurements across different environmental conditions. However, there has not yet been an effort to directly compare these measured SOC rates to those projected by Integrated Assessment Models (IAMs) . The factors that influence these rates are highly varied and complex, leading to high levels of uncertainty (Sulman et al. 2018). This highlights a need to check how SOC rates are modeled against experimental results.  

\newline
SOC storage has wide ranging climate impacts: roughly 25% of anthropogenic carbon emissions are absorbed by land ecosystems (Reuhr et al. 2023). Additionally, land use modification is a potential strategy to limit atmospheric CO2 and reduce the overall need to lower carbon emissions (Wise et al. 2009). For these reasons, accurate modeling and predicting of SOC levels and change is important to correctly gauge the effectiveness of various climate change mitigation strategies. 

\newline
One  of the IAMs that models SOC is the Global Change Assessment Model (GCAM ). GCAM is an open source, multi-sector dynamics model developed primarily by Pacific Northwest National Laboratory's Joint Global Change Research Institute. GCAM represents the behaviors of and interactions between earth's energy and water systems, the climate, and the global economy to answer "what if" scenarios about future global change. A major component of GCAM's analysis is land use worldwide. GCAM splits the world into 384 land use regions determined by the intersection of river basins and sociopolitical boundaries. Each of these 384 land use regions is then assigned different proportions of up to 8 different land types based on economic factors. These proportions change with time, allowing GCAM to model land use change based on economic, environmental, and sociopolitical factors.  

\newline
One of the parameters measured by GCAM is the soil organic carbon (SOC) content of soil. SOC arises from decomposing plants and animals that are broken down by microbes. While some of the carbon from this organic matter is emitted back into the atmosphere, much of it remains in the soil. However, many factors can cause SOC to be emitted back into the atmosphere even after it has already been sequestered. One primary cause of this change is land use transition. For example, when grassland soil is tilled for conversion to farmland, much of the surface soil is disturbed and exposed to aeration, emitting CO2. In GCAM, SOC change during land use transition is solely dependent on land use and region. When land use transition occurs, the change in SOC content is simply modeled as a [shape ] change between the two SOC values for the initial and final land use over a prescribed number of years, called the soil timescale. Each of GCAM's 32 geopolitical regions have a set soil timescale, ranging from 25 years (for the United States) to 200 years (Russia). While this kind of simplified modeling approach can be accurate, GCAMs approach should still be tested  (Menichetti et al. 2019).   The origin of the soil timescale values is also unclear (?)  and this project aims to help determine their accuracy.

# Methods

To begin the analysis, SOC data was taken from a basic run of GCAM, called a reference scenario. This models “business as usual” for the world, without any extra climate policies and assuming a certain level of overall warming . Table S1 contains all relevant data from this reference scenario used in our analysis, which was performed using R. Note that the SOC data, GCAM's soil timescale list, as well as both the geopolitical regions list and the land use regions list are separate files, and must be merged before any additional steps can be performed. Below is the code used to load the data and libraries, as well as to merge the four files listed above. The final product is a data set where each SOC data point has an associated timescale value, land use region, and geopolitical region.

```{r Read-In-GCAM-Data, message=FALSE}
library(dplyr)
library(ggplot2)
library(metafor)

#GCAM's SOC data
soilC <- read.csv(file = 'Data/GCAM_soilC.csv')

#GCAM's soil timescale list
read.csv(file = 'Data/soil_timescales.csv') %>%
  mutate(GCAM_region_ID = seq(1,32,by=1))-> timescales

#GCAM's land use region list 
# answered below [what does GLU stand for?????]
#Geographic Land Unit
#https://gcims.pnnl.gov/modeling/moirai-land-data-amalgamation-gcams-aglu-module

glus <- read.csv(file = 'Data/GLU_codes.csv')

#GCAM's geopolitical region list
regions <- read.csv('Data/GCAM_regions.csv')

#Merging the four above data sets

#flagging
#issue here for me -> "GCAM_region_ID" does not exist in original file versions


soilC %>%
  mutate(GLU_code = GLU) %>%
  right_join(glus, by='GLU_code') %>%
  right_join(regions, by='GCAM_region_ID') %>%
  right_join(timescales, by='GCAM_region_ID')-> soilC_regions
```

Much of the data contained within the SOC data set is irrelevant to this analysis, so it was removed.

```{r Select-relevant-data}
soilC_regions %>%
  select(Land_Type, soil_c, GLU_code, soilTimeScale, GCAM_region_ID, Basin_long_name) -> simple_soilC_regions
```

The experimental data compared to GCAM's outputs was taken from Post & Kwon (2000) and Wei et al. (2014), both reviews of  SOC measurements from the past 70 years. This data was manually digitized into a CSV file for analysis. This data did not explicitly list site locations for each measurement, so this information was determined from the paper referenced by the two studies. Once the location was determined, it was compared to GCAM's 384 land use regions using the rmap tool and GCAM's basin code mapping file. Then, the GLU code was manually added to the digitized data from Post & Kwon.  

Once fully digitized, the experimental data was loaded into R as well. Note that there were several entries with missing timeframes or where the location of the sample could not be determined. These were assigned a value of 'NA'. 

```{r Read-In-Experimental-Data}
PostKwon <- read.csv(file= 'Data/Experimental Data.csv', na.strings = c("", "NA"))
Wei <- read.csv(file= 'Data/Wei et al Data.csv', na.strings = c("", "NA"))

```

Once the experimental data was loaded in, each observation was mapped to is corresponding GCAM data point. This was done by simply adding the previously created GCAM data set to the experimental one by land use region, geopolitical region, and land use. This process was done twice--once for the initial land use and once for the final land use. This meant that, ultimately, each experimental data point had two associated GCAM SOC values--one for the initial land use, and one for the final land use.  

This then allowed us to compute the GCAM SOC rate during land use change. This was done by simply subtracting the initial SOC value from the final value and dividing that by GCAMs soil timescale parameter. Finally, the k value (explained in depth in Wei et al.) was computed for both the experimental and GCAM data. The experimental and GCAM k values were computed using equations 1 and 2, respectively, below.
\begin{equation}
\tag{1}
k = -\frac{ln \left(\frac{C_f}{C_0}\right)}{t}
\end{equation}

\begin{equation}
\tag{2}
k = -\frac{ln (Rate \times t + 1)}{t}
\end{equation}

\newline
Experimental data from Wei et al. did not contain any raw rate information. Instead, the study only reported percent decrease in SOC concentration. This means we were only able to compare k values between the Wei et. al. data and GCAM. Additionally, k had to be computed with a modified equation, 3, below. 

\begin{equation}
\tag{3}
k = -\frac{ln \left(\frac{1}{\frac{|OC Decrease|}{100} + 1}\right)}{t}
\end{equation}

The code for merging the GCAM and experimental, as well as for computing the rate and k value is below. The process was the same for the Wei et al. data. 

```{r create_comparison_Post_Kwon}
PostKwon %>%
  select(Initial_Land_Use, Final_Land_Use, GLU_code, GCAM_region_ID, Time, Exp_Rate) %>%
  na.omit() %>%
  mutate(Land_Type = Initial_Land_Use) %>%
  right_join(simple_soilC_regions, by = c('GLU_code', 'Land_Type', 'GCAM_region_ID')) %>%
  rename(initial_soil_c = soil_c) %>%
  select(-Land_Type) %>%
  mutate(Land_Type = Final_Land_Use) %>%
  right_join(simple_soilC_regions, by = c('GLU_code', 'Land_Type', 'GCAM_region_ID')) %>%
  rename(final_soil_c = soil_c) %>%
  select(-Land_Type, -soilTimeScale.x, -Basin_long_name.x) %>%
  rename(soilTimeScale = soilTimeScale.y, Basin_long_name = Basin_long_name.y) %>%
  na.omit() %>%
  mutate(GCAM_Rate = (final_soil_c - initial_soil_c)/soilTimeScale, Rate_Difference = Exp_Rate - GCAM_Rate, 
         Exp_k = -log(abs(Exp_Rate)*Time +1)/Time,
         GCAM_k = -log(final_soil_c/initial_soil_c)/soilTimeScale, 
         source = 'Post & Kwon'
  ) %>%
  #This next line corrects the sign of Exp_k--we had to take the absolute value to avoid NaNs, so this accounts for that 
  #This did not have to be done for the GCAM_k because the argument inside the natural log will always be positive--it is a positive value divided by another positive value--so sign correction was not necesary. 
  mutate(Exp_k = ifelse(sign(Exp_k) == sign(Exp_Rate), Exp_k*(-1), Exp_k)) -> PostKwon_Comparison
```

Note that, because of how the k value is computed for the experimental data, the sign was often mismatched. The sign of k corresponds to the opposite of the sign of the rate, which had to be corrected manually.  


```{r create_comparison_Wei, echo = FALSE}
#Creating Wei comparison data
Wei %>%
  select(Initial_Land_Use, Final_Land_Use, GLU_code, GCAM_region_ID, OC_decrease, Time) %>%
  na.omit() %>%
  mutate(Land_Type = Initial_Land_Use) %>%
  right_join(simple_soilC_regions, by = c('GLU_code', 'Land_Type', 'GCAM_region_ID')) %>%
  rename(initial_soil_c = soil_c) %>%
  select(-Land_Type) %>%
  mutate(Land_Type = Final_Land_Use) %>%
  right_join(simple_soilC_regions, by = c('GLU_code', 'Land_Type', 'GCAM_region_ID')) %>%
  rename(final_soil_c = soil_c) %>%
  select(-Land_Type, -soilTimeScale.x, -Basin_long_name.x) %>%
  rename(soilTimeScale = soilTimeScale.y, Basin_long_name = Basin_long_name.y) %>%
  na.omit() %>%
  mutate(GCAM_Rate = (final_soil_c - initial_soil_c)/soilTimeScale,
         GCAM_k = -log(final_soil_c/initial_soil_c)/soilTimeScale,
         Exp_k = -log(1/((abs(OC_decrease)/100) +1))/Time,
         source = 'Wei et al'
  ) %>%
  #This next line corrects the sign of Exp_k--we had to take the absolute value to avoid NaNs, so this accounts for that 
  mutate(Exp_k = ifelse(sign(Exp_k) == sign(OC_decrease), Exp_k, Exp_k*(-1))) -> Wei_Comparison
```


\Newline
Once the experimental data was matched to its corresponding GCAM outputs, the two data sets (Post& Kwon and Wei et al.) were merged. Importantly, because the rate data does not exist for Wei et al., the rate data was not included in the aggregated data set. 

```{r Create-Full-Comparison}
Full_Comparison <- bind_rows(
  select(PostKwon_Comparison, -Exp_Rate, -Rate_Difference),
  select(Wei_Comparison, -OC_decrease)
  )

#Now, we want to create another data frame that has all the repeated regions between the two studies
Full_Comparison %>%
  mutate(source_short = ifelse(source == 'Post & Kwon', 'PostKwon', 'Wei')) %>%
  mutate(basin_source = paste(Basin_long_name, source_short)) %>%
  filter((paste(Basin_long_name, 'PostKwon') %in% basin_source) & (paste(Basin_long_name, 'Wei') %in% basin_source)) -> Duplicate_Comparison

counts = table(Duplicate_Comparison$basin_source)
 
Duplicate_Comparison %>%
  filter(
    (counts[paste(Basin_long_name, 'Wei')] > 1) & (counts[paste(Basin_long_name, 'PostKwon')] > 1)
  ) -> Duplicate_Comparison
```

Once  the k values were calculated, meta-analyses were performed on the data. Three separate analyses were performed--one on the Post & Kwon data, one on the Wei et al. data, and one on the aggregated data set. All three examined the k value data so the results from all three tests could be reasonably compared to one another. The Standardized Mean Difference effect size was used for the analyses. Specifically, Hedges' g was used to calculate the Standardized Mean difference once the data was grouped by GCAM land use region. This allowed us to determine if there was a pattern in any potential discrepancies between GCAM and experimental data, or if any variance was localized geographically and/or random.  

\newline
A fixed effect model was used for the analysis. This means that it is assumed that all the data being examined (ie all the SOC measurements) come from the same general population, and that all the data was collected in the same manner. This obviously applies to the GCAM data, as it is all from the same run of a basic GCAM scenario. Because data was reported the same way regardless of the collection location, the same assumptions apply to the Post & Kwon and Wei et al. data. 
```{r Meta-Analysis-full, warning=FALSE}
#Meta-analysis for the aggregated data set 
#First, we need to get mean, std dev, and n
Full_Comparison %>%
  group_by(Basin_long_name) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> Full_MA_data

#Now, we can use escalc to get the standardized mean difference 
Full_effect_sizes <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = Full_MA_data
  )

#We're going to use a fixed effect model for this analysis, which we set up below
Full_fixed_effect_results <- rma(yi, vi, method = 'FE',
                                     slab = Basin_long_name,
                                     data = Full_effect_sizes)

```

```{r meta-analyses-individual, echo = FALSE, message=FALSE, warning=FALSE}
#Now, for the Post & Kwon rates
#First, we need to get mean, std dev, and n
PostKwon_Comparison %>%
  group_by(Basin_long_name) %>%
  summarize(mean_control = mean(Exp_Rate), sd_control= sd(Exp_Rate), n_control = n(),
            mean_GCAM = mean(GCAM_Rate), sd_GCAM = sd(GCAM_Rate), n_GCAM = n()
            ) -> PostKwon_MA_data

#Now, we can use escalc to get the standardized mean difference 
PostKwon_effect_sizes <-
  escalc('SMD',
    m1i = mean_control, n1i = n_control, sd1i = sd_control,
    m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
    data = PostKwon_MA_data
  )

#We're going to use a fixed effect model for this analysis, which we set up below
PostKwon_fixed_effect_results <- rma(yi, vi, method = 'FE',
                                     slab = Basin_long_name,
                                     data = PostKwon_effect_sizes)


#Now, we'll do one for the Post & Kwon k values
#First, we need to get mean, std dev, and n
PostKwon_Comparison %>%
  group_by(Basin_long_name) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> PostKwon_MA_k_data

#Now, we can use escalc to get the standardized mean difference 
PostKwon_k_effect_sizes <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = PostKwon_MA_k_data
  )

#We're going to use a fixed effect model for this analysis, which we set up below
PostKwon_k_fixed_effect_results <- rma(yi, vi, method = 'FE',
                                     slab = Basin_long_name,
                                     data = PostKwon_k_effect_sizes)


#Now we do the same for Wei et al.
#First, we need to get mean, std dev, and n
Wei_Comparison %>%
  group_by(Basin_long_name) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> Wei_MA_data

#Now, we can use escalc to get the standardized mean difference 
Wei_effect_sizes <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = Wei_MA_data
  )

#We're going to use a fixed effect model for this analysis, which we set up below
Wei_fixed_effect_results <- rma(yi, vi, method = 'FE',
                                     slab = Basin_long_name,
                                     data = Wei_effect_sizes)


#Now for the whole data set by transition type
#First, we need to create a column in the merged data for transition type 
Full_Comparison %>% 
  mutate(change = paste(Initial_Land_Use, Final_Land_Use, sep = '')) -> Full_Comparison_change

#Now, we need to get mean, std dev, and n
Full_Comparison_change %>%
  group_by(change) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> Full_MA_data_change

#Now, we can use escalc to get the standardized mean difference 
Full_effect_sizes_change <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = Full_MA_data_change
  )

#We're going to use a fixed effect model for this analysis, which we set up below
Full_fixed_effect_results_change <- rma(yi, vi, method = 'FE',
                                 slab = change,
                                 data = Full_effect_sizes_change)
```


# Results
Overall, GCAMs rate and k value outputs matched experimental values. 
```{r Historgram_Rate-Post-Kwon, echo=FALSE, fig.cap="Histograms showing overlapping GCAM and experimental data for raw soilC rates during land use transition. Here, the experimental data is taken from Post & Kwon.", message=FALSE}
#Plot overlapping rate histograms for the different rate sources
ggplot() +
  geom_histogram(aes(x = PostKwon_Comparison$Exp_Rate, fill ='Post & Kwon' ), alpha = 0.5) +
  geom_histogram(aes(x = PostKwon_Comparison$GCAM_Rate, fill = 'GCAM Rate'), alpha = 0.5) +
  xlab(expression(Rate~(kg~C/m^2))) + ylab('Count') +
  scale_fill_manual(name = "Data Source", values = c('Post & Kwon' = '#3584B0', 'GCAM Rate'='#e3962b')) + 
  theme_light() 
```


```{r Histogram_K-Post-Kwon, echo=FALSE, fig.cap="Histograms showing overlapping GCAM and experimental data for computed k values during land use change. Here, the experimental data is taken from Post & Kwon.", message=FALSE}
#Plot overlapping k histograms for the different k sources
ggplot() +
  geom_histogram(aes(x = PostKwon_Comparison$Exp_k,fill ='Post & Kwon'), alpha = 0.5) +
  geom_histogram(aes(x = PostKwon_Comparison$GCAM_k,  fill = 'GCAM'), alpha = 0.5) +
  xlab(expression(k~(y^-1))) + ylab('Count') +
  scale_fill_manual(name = "Data Source", values = c('Post & Kwon' = '#3584B0', 'GCAM'='#e3962b')) +
  theme_light()
```

```{r Histogram_Wei, echo=FALSE, fig.cap="Histograms showing overlapping GCAM and experimental data for computed k values during land use change. Here, the experimental data is taken from Wei et al", message=FALSE}
#Plot overlapping k histograms for the different k sources
ggplot() +
  geom_histogram(aes(x = Wei_Comparison$Exp_k,fill ='Post & Kwon'), alpha = 0.5) +
  geom_histogram(aes(x = Wei_Comparison$GCAM_k,  fill = 'GCAM'), alpha = 0.5) +
  xlab(expression(k~(y^-1))) + ylab('Count') +
  scale_fill_manual(name = "Data Source", values = c('Post & Kwon' = '#3584B0', 'GCAM'='#e3962b')) +
  theme_light()
```

```{r k-Histogram-full, echo=FALSE, fig.cap="Histograms showing overlapping GCAM and experimental data for computed k values during land use change. Here, the experimental data is taken from both Post & Kwon and Wei et al.", message=FALSE}
#Plot overlapping k histograms for the different k sources
ggplot() +
  geom_histogram(aes(x = Full_Comparison$Exp_k, fill = Full_Comparison$source), alpha = 0.5, bins = 75) +
  geom_histogram(aes(x = Full_Comparison$GCAM_k,  fill = 'GCAM'), alpha = 0.5, bins = 75) +
  xlab(expression(k~(y^-1))) + ylab('Count') +
  scale_fill_manual(name = "Data Source", 
                    values = c('GCAM'='#e3962b', 'Wei et al' = '#45912c', 'Post & Kwon' = '#3584B0')) +
  theme_light() 
```


Several t-tests were also performed. One was performed on each of the different k value sources (Post & Kwon, Wei et al., and the aggregated data), and another was performed on the Post & Kwon rate data.   As can be seen below, the difference between means of the rate data sets was not statistically significant, but it was for the k values.  
```{r t-test-Post-Kwon}
t.test(PostKwon_Comparison$Exp_Rate, PostKwon_Comparison$GCAM_Rate, alternative = 'two.sided')

t.test(PostKwon_Comparison$Exp_k, PostKwon_Comparison$GCAM_k, alternative = 'two.sided')
```

```{r t-test-Wei}
t.test(Wei_Comparison$Exp_k, Wei_Comparison$GCAM_k, alternative = 'two.sided') 

```

```{r t-test-full}
t.test(Full_Comparison$Exp_k, Full_Comparison$GCAM_k, alternative = 'two.sided')
```

 
\newline
Results from the analysis on the Post & Kwon data: 
```{r meta-analysis-results-Post-Kwon}
PostKwon_k_fixed_effect_results
```
```{r Forest_Plot-Post-Kwon, echo = FALSE, fig.cap ='Forest plot detailing the meta analysis performed on the Post & Kwon SOC Data. The data was grouped by GCAM land region (represented by GLU code).' , message = FALSE}
#Forest plot for Post & Kwon k vals
forest(
  PostKwon_k_effect_sizes$yi, PostKwon_k_effect_sizes$vi,
  annotate = FALSE, showweights = TRUE, 
  header = c('Region', 'Weight         SMD [95% CI]'),
  slab = PostKwon_k_fixed_effect_results$slab,
  xlab = 'Standardized Mean Difference',
  #Below sets the size of study labels, shape of bars, and size of x labels 
  cex = .8, pch = 15, cex.lab = 1
)

#Adding the summary effect size
addpoly(
  PostKwon_k_fixed_effect_results, 
  col = 'orange', cex = 1, annotate = TRUE, mlab = 'Summary'
)

```

Results from the analysis on the Wei et al. data:

```{r  meta-analysis-results-Wei}
Wei_fixed_effect_results
```
```{r  Forest_Plot-Wei, echo = FALSE, fig.cap ='Forest plot detailing the meta analysis performed on the Wei et al. SOC Data. The data was grouped by GCAM land region (represented by GLU code).' , message = FALSE}
#Forest plot for Wei
forest(
  Wei_effect_sizes$yi, Wei_effect_sizes$vi,
  annotate = FALSE, showweights = TRUE, 
  header = c('Region', 'Weight         SMD [95% CI]'),
  slab = Wei_fixed_effect_results$slab,
  xlab = 'Standardized Mean Difference',
  #Below sets the size of study labels, shape of bars, and size of x labels 
  cex = .8, pch = 15, cex.lab = 1
)

#Adding the summary effect size
addpoly(
  Wei_fixed_effect_results, 
  col = 'orange', cex = 1, annotate = TRUE, mlab = 'Summary'
)
```

Results from the analysis on the data aggregated by transition type: 

```{r Forest_Plot_transition_type, echo = FALSE, fig.cap ='Forest plot detailing the meta analysis performed on the aggregated SOC Data. The data was grouped by GCAM land region (represented by GLU code).' , message = FALSE}
#Now, a meta analysis by transition type for the entire dataset
#First, we need to create a column in the merged data for transition type 
Full_Comparison %>% 
  mutate(change = paste(Initial_Land_Use, Final_Land_Use, sep = '')) -> Full_Comparison_change

#First, we need to get mean, std dev, and n
Full_Comparison_change %>%
  group_by(change) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> Full_MA_data_change

#Now, we can use escalc to get the standardized mean difference 
Full_effect_sizes_change <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = Full_MA_data_change
  )

#We're going to use a fixed effect model for this analysis, which we set up below
Full_fixed_effect_results_change <- rma(yi, vi, method = 'FE',
                                 slab = change,
                                 data = Full_effect_sizes_change)

#Forest plot for full dataset
forest(
  Full_effect_sizes_change$yi, Full_effect_sizes_change$vi,
  annotate = TRUE, showweights = TRUE, 
  header = c('Transition Type', 'Weight         SMD [95% CI]'),
  slab = Full_fixed_effect_results_change$slab,
  xlab = 'Standardized Mean Difference',
  #Below sets the size of study labels, shape of bars, and size of x labels 
  cex = .8, pch = 15, cex.lab = 1 
)

#Adding the summary effect size
addpoly(
  Full_fixed_effect_results_change, 
  col = 'orange', cex = 1, annotate = TRUE, mlab = 'Summary'
)


```
```{r meta-analysis-results-Full}
Full_fixed_effect_results_change
```

A meta-analysis was also performed only on the regions that were examined by both Post & Kwon and Wei et al. 
```{r meta_analysis_duplicate_forest}
#Meta analysis for the shared regions
#First, we need to get mean, std dev, and n
Duplicate_Comparison %>%
  group_by(basin_source) %>%
  summarize(mean_control = mean(Exp_k), sd_control= sd(Exp_k), n_control = n(),
            mean_GCAM = mean(GCAM_k), sd_GCAM = sd(GCAM_k), n_GCAM = n()
  ) -> Duplicate_MA_data

#Now, we can use escalc to get the standardized mean difference 
Duplicate_effect_sizes <-
  escalc('SMD',
         m1i = mean_control, n1i = n_control, sd1i = sd_control,
         m2i = mean_GCAM, n2i = n_GCAM, sd2i = sd_GCAM,
         data = Duplicate_MA_data
  )

#We're going to use a fixed effect model for this analysis, which we set up below
Duplicate_fixed_effect_results <- rma(yi, vi, method = 'FE',
                                     slab = basin_source,
                                     data = Duplicate_effect_sizes)

#Forest plot for duplicates
forest(
  Duplicate_effect_sizes$yi, Duplicate_effect_sizes$vi,
  annotate = TRUE,showweights = TRUE,
  header = c('Region', 'Weight            SMD [95% CI]'),
  slab = Duplicate_fixed_effect_results$slab,
  xlab = 'Standardized Mean Difference',
  #Below sets the size of study labels, shape of bars, and size of x labels 
  cex = .8, pch = 15, cex.lab = 1
)

#Adding the summary effect size
addpoly(
  Duplicate_fixed_effect_results, 
  col = 'orange', cex = 1, annotate = TRUE, mlab = 'Summary'
)


```
```{r meta_analysis_duplicate_results}
Duplicate_fixed_effect_results
```


# Discussion

# Conclusion