forked from AdamWilsonLabEDU/2024-geo511-spatial-data-science-final-project-GEO511_QuartoProjectTemplate
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.qmd
309 lines (192 loc) · 12 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
---
title: "Predicting the Distribution of an Enigmatic Genus of Moss"
author: Tianze Li
subtitle: GEO511 Final Project
date: today
date-format: long
code-fold: true
code-summary: "Show Me the Script"
knitr:
opts_chunk:
message: false
warning: false
---
# Introduction
*Takakia* (Figure 1) was discovered in the Himalayas in 1861. It is the oldest known extant genus of land plants, estimated to have branched off the other mosses around 390 million years ago. *Takakia* has rapidly evolving genes and is highly sensitive to environmental changes, particularly to temperatures during the growing season. Based on 20 years of monitoring *Takakia* on the Tibetan Plateau, scientists have discovered that due to climate change, including glacier melt and rising annual average temperatures, this species is facing the risk of extinction in the region. This highlights the urgent need for an effective conservation plan for *Takakia*.
However, due to *Takakia*'s small size and its primary growth between rocks, it is difficult to discover. Most research on *Takakia* focuses on the genetic response of *Takakia* to environmental changes. There is lack of studies that focus on distribution of *Takakia* on a global scale.
In this project, I aim to address this gap by utilizing global environmental data to predict suitable growing areas for *Takakia* worldwide. I hope that the results of the project will provide a reference for the establishment of conservation areas for *Takakia* in the future and the selection of Ex-situ conservation location.
![Figure 1](Takakia_lepidozioides.jpg)
# Materials and methods
## Data / Source
| Data / Source | Description / Link |
|---------------------------------------|--------------------------------------------------------------------------|
| **World Takakia occurrences record** | [Global Biodiversity Information Facility(GBIF)](https://www.gbif.org/occurrence/search?taxon_key=3229796) |
| **Takakia field samples** | Field study |
| **Historical climate data** | [19 Bioclimatic variables](https://www.worldclim.org/data/worldclim21.html) |
I used the above data to predict the global spatial distribution of *Takakia* using the Maximum Entropy(MaxEnt) method in R. The specific implementation code is as follows:
0. Install and Load Packages
```{r, massage=FALSE, warning=FALSE}
library(terra)
library(geodata)
library(rnaturalearth)
library(rgbif)
library(ggplot2)
library(rnaturalearthdata)
library(dismo)
library(sf)
library(rJava)
library(raster)
```
**1. Setup and Download Environmental data**
![Global Elevation](elevation_plot.png)
![Bio1: Annual Mean Temperature](bio1_plot.png)
```{r, eval=FALSE, cache = TRUE, message=FALSE, warning=FALSE}
# Specify the download path
download_path <- "E:/UB_Master/24fall/GEO511 Spatial/Final_project/Final_Project"
# Download elevation data (Elevation)
elevation <- worldclim_global(var = "elev", res = 2.5, path = download_path)
# Download Bioclimatic variables data
# WorldClim Bioclimatic variables (typically from bio1 to bio19)
bioclimatic_vars <- worldclim_global(var = "bio", res = 2.5, path = download_path)
# Check and visualize Bioclimatic variables data
print(bioclimatic_vars)
plot(bioclimatic_vars[[1]], main = "Bio1: Annual Mean Temperature")
# Combine the elevation data and Bioclimatic variables into one raster stack
climate_stack <- c(elevation, bioclimatic_vars)
# Check and visualize the merged raster stack
print(climate_stack)
plot(climate_stack[[1]], main = "Global Elevation")
```
**2. Crop and Mask Raster Stack for Land Areas**
![Bio12: Annual Precipitation](bio12_plot.png)
```{r, message=FALSE, eval=FALSE, cache = TRUE}
# Download global land shapefile
land <- ne_download(scale = "medium", type = "land", category = "physical", returnclass = "sf")
# Crop and mask raster stack to include only land areas
climate_stack_cropped <- crop(climate_stack, ext(land))
climate_stack_land <- mask(climate_stack_cropped, vect(land))
# Visualize the results
plot(climate_stack_land[[12]], main = "Annual Precipitation(Land Only)")
```
**3. Prepare Takakia Occurrence Data**
![Occurrences of Takakia](Takakia_Distribution_Plot.png)
```{r, eval=FALSE, cache = TRUE, , message=FALSE, warning=FALSE}
# Define the species name
species_name <- "Takakia S.Hatt. & Inoue"
# Retrieve occurrence data
gbif_data <- occ_search(scientificName = species_name, limit = 2000)
# Extract longitude and latitude from the data
occurrences <- data.frame(
lon = gbif_data$data$decimalLongitude,
lat = gbif_data$data$decimalLatitude
)
# Remove rows with missing coordinates
occurrences_clean <- na.omit(occurrences)
# Remove duplicate coordinates
occurrences_unique <- occurrences_clean[!duplicated(occurrences_clean), ]
# Check the number of unique points
cat("Number of unique occurrence points:", nrow(occurrences_unique), "\n")
# Load world map for background
world_map <- ne_countries(scale = "medium", returnclass = "sf")
# Plot unique occurrence points
ggplot() +
geom_sf(data = world_map, fill = "lightgray", color = "black") +
geom_point(data = occurrences_unique, aes(x = lon, y = lat), color = "red", size = 1) +
theme_minimal() +
labs(title = "Unique Distribution Points of Takakia S.Hatt. & Inoue",
x = "Longitude",
y = "Latitude")
# Save the unique occurrence data to a CSV file
write.csv(occurrences_unique, "Takakia_Unique_Distribution_Data.csv", row.names = FALSE)
```
**4. MaxEnt Model for Takakia Distribution Prediction**
```{r, eval=FALSE, cache = TRUE, , message=FALSE, warning=FALSE}
options(java.parameters = "-Xmx8g")
selected_env_stack <- climate_stack_land
# Ensure occurrences_unique is an sf object with coordinates
occurrences_sf <- st_as_sf(occurrences_unique, coords = c("lon", "lat"), crs = crs(selected_env_stack))
# Convert SpatRaster to RasterStack (if it's not already in stack format)
selected_env_stack <- stack(selected_env_stack)
# Extract coordinates from the sf object
occurrences_matrix <- as.matrix(st_coordinates(occurrences_sf))
# Extract environmental values at occurrence points
env_values <- extract(selected_env_stack, occurrences_matrix)
# Combine occurrences with extracted environmental data
occ_with_env <- cbind(occurrences_matrix, env_values)
# Remove rows with NA predictor values
occ_with_env <- na.omit(occ_with_env)
# Separate cleaned occurrences and predictor values
occurrences_matrix_clean <- occ_with_env[, 1:2] # First two columns are longitude and latitude
# Train the MaxEnt model (with all variables)
maxent_model <- maxent(
x = selected_env_stack,
p = occurrences_matrix_clean
)
# Summary of the model
summary(maxent_model)
# Predict the current distribution
current_distribution <- predict(maxent_model, selected_env_stack)
# Visualize the prediction
plot(current_distribution, main = "Predicted Current Distribution of Takakia")
# Save the predicted distribution as a GeoTIFF file
writeRaster(current_distribution, filename = "Takakia_Current_Distribution.tif", overwrite = TRUE)
# Split data into training and testing sets
set.seed(123)
train_indices <- sample(1:nrow(occurrences_matrix), size = 0.75 * nrow(occurrences_matrix))
train_data <- occurrences_matrix[train_indices, ]
test_data <- occurrences_matrix[-train_indices, ]
# Evaluate the model
eval <- evaluate(maxent_model, p = train_data, a = test_data, x = selected_env_stack)
print(eval)
# Plot the receiver operating characteristic (ROC) curve
plot(eval, "ROC")
# Perform Jackknife to evaluate the importance of each environmental variable
# Use the 'maxent' model with jackknife = TRUE
jackknife_results <- dismo::maxent(
x = selected_env_stack,
p = occurrences_matrix_clean,
jackknife = TRUE
)
# Extract Jackknife results from the 'maxent' object using the appropriate method
# Jackknife results are contained in the model object itself, accessible by using the 'maxent' method:
jackknife_importance <- jackknife_results@results
# Display the Jackknife importance values for each variable
print(jackknife_importance)
```
# Results
**Model Evaluation**
X.Training.samples 158.0000
Regularized.training.gain 2.9615
Unregularized.training.gain 3.1286
Iterations 500.0000
Training.AUC 0.9798
X.Background.points 10158.0000
Annual Mean Temperature. contribution 16.2349
Mean Temperature of Warmest Quarter. contribution 1.4220
Mean Temperature of Coldest Quarter. contribution 0.0000
Annual Precipitation. contribution 2.8552
Precipitation of Wettest Month. contribution 5.6789
Precipitation of Driest Month. contribution 26.9568
Precipitation Seasonality. contribution 1.5024
Precipitation of Wettest Quarter. contribution 0.0479
Precipitation of Driest Quarter. contribution 0.8141
Precipitation of Warmest Quarter. contribution 1.0436
Precipitation of Coldest Quarter. contribution 10.4287
Mean Diurnal Range. contribution 20.0025
Isothermality (BIO2/BIO7) (×100). contribution 1.0616
Temperature Seasonality. contribution 2.9947
Max Temperature of Warmest Month. contribution 2.6196
Min Temperature of Coldest Month. contribution 0.0764
Temperature Annual Range (BIO5-BIO6). contribution 5.0906
Mean Temperature of Wettest Quarter. contribution 0.2649
Mean Temperature of Driest Quarter. contribution 0.1747
Global Elevation. contribution 0.7306
![Predicted Current Distribution of Takakia](Predicted Current Distribution of Takakia.png)
![AUC](ROC.png)
# Conclusions
My project utilized environmental variables to predict the potential global distribution of *Takakia*. The results indicated that Annual Mean Temperature, Precipitation of the Driest Month, and Mean Diurnal Range were the most influential environmental variables affecting the distribution of *Takakia*. However, during the experiment, I noticed that the AUC values of the model's predictions were not ideal. Despite trying various methods to improve the AUC, the results fluctuated between 0.501 and 0.520 without significant improvement. So far, I am still exploring ways to enhance the model's predictive accuracy. I suspect that two factors might be limiting the model's performance: 1, The resolution of environmental variables. 2, The correlations among environmental variables. In the future, I will continue working on addressing these two issues and strive to improve the model's accuracy.
# References
1.Hattori, S., Iwatsuki, Z., Mizutani, M., & Inoue, S. (1974). Speciation of Takakia. The Journal of the Hattori Botanical Laboratory, 38, 115-121.
2.Hu, R., Li, X., Hu, Y., Zhang, R., Lv, Q., Zhang, M., ... & He, Y. (2023). Adaptive evolution of the enigmatic Takakia now facing climate change in Tibet. Cell, 186(17), 3558-3576.
3.Elith, J., & Leathwick, J. R. (2009). Species distribution models: ecological explanation and prediction across space and time. Annual review of ecology, evolution, and systematics, 40(1), 677-697.
4.Takakia. In Wikipedia. Retrieved December 5, 2024, from https://en.wikipedia.org/wiki/Takakia.