---
title: "Coding"
format: html
---
# Background
Coding is the practice of writing instructions for computers to follow; computers aren't clever by themselves - they need to be told what to do. Most coding is text-based: people write instructions into simple text documents. But some coding is [graphical or visual](https://en.wikipedia.org/wiki/Visual_programming_language). We shall be using text-based coding. We are going to use a free and open source general programming language called R. The [R programming language](https://www.r-project.org/) has its roots in statistics and science, but it really can be used for anything.
In the early days, coding was pretty barebones - all one needed was a text editor and access to the programming language - no frills there, no pretty images, no buttons to push, just typing. As time passed, volunteers added niceties to the text editor, like visualizing plots of data, buttons to save files, colorized text for the typed code, and other bells and whistles. These editors became known as graphical user interfaces (GUIs for short). GUIs keep getting easier and easier for people to use. We will use the GUI known as [RStudio](https://posit.co/). It's best to think of GUIs as wrappers around the core programming language; they are really nice and pretty, but they can't do math. The programming language itself (which does do math!) evolved only as needed to fix bugs and make general improvements.
# One of many available GUIs: RStudio
RStudio is a free GUI that wraps around two languages (R and Python, and soon to be more). When you invoke RStudio you'll see that it is laid out as a multi-panel application that runs inside your browser. It will look something like this screenshot.
![RStudio screenshot](images/Rstudio.png)
There are many [RStudio tutorials](https://duckduckgo.com/?q=introduction+to+rstudio&t=ffab&atb=v342-1&iax=videos&ia=videos&iai=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3Dw8ooTMStQV0&pn=1) online. We encourage you to check them out.
# Getting the software and data you will need
Use this [wiki page](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Courseware) to guide you through the process of installing the software you'll need for this course, "the courseware".
# Coding with R
There are [SO MANY TUTORIALS](https://duckduckgo.com/?t=ffab&q=introduction+to+R&atb=v342-1&ia=web); some are great and some not so much. [This one](https://intro2r.com/) is a pretty good baby-steps approach and is quite up-to-date. You may find another that you prefer - let us know what works for you.
Take time to look at the tutorials in our [wiki](https://github.com/BigelowLab/ColbyForecasting2025/wiki), especially the one on [working with tabular data](https://github.com/BigelowLab/ColbyForecasting2025/wiki/tables).
There is so much to learning a programming language - it takes diving in to really get going. We'll try to give you enough so that you can continue learning on your own.
[A journey of a thousand miles begins with a single step](https://en.wikipedia.org/wiki/A_journey_of_a_thousand_miles_begins_with_a_single_step); so let's dive in.
## Loading the necessary tools
For any coding project you will need to access a select number of tools, often stored on your computer in what is called a package library (it's just a directory/folder really). When a package is loaded from the library, all of the functionality the author built into that package is exposed for you to use in your project. We have created a single file that will both install (if needed) and load (if not already loaded) each of these packages. It's easy to run.
First, make sure that you have loaded the project (File > Open Project) if you haven't already. Then at the R console pane type the following...
```{r source_setup, warning = FALSE}
source("setup.R")
```
After a few moments the command prompt will return to focus. Be sure to run that command at the beginning of every new R session or anytime you are adding new functionality.
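The real `setup.R` is not reproduced here, but the install-if-needed, load-if-not-loaded idea can be sketched in a few lines of base R. The helper name `load_packages()` and the package names below are placeholders chosen for illustration; they are assumptions, not the contents of `setup.R`.

```{r setup_sketch}
# hypothetical helper, NOT the contents of setup.R
load_packages = function(pkgs){
  for (pkg in pkgs){
    # install only when the package is missing from the library
    if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
    # attach the package so its functions can be called by name
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}

# placeholder package names; both ship with R, so nothing gets installed
load_packages(c("stats", "utils"))
```

Because `stats` and `utils` come with every R installation, this sketch never triggers an install; other package names would.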
Now we are ready to load some data into your R session.
## Spatial data
Spatial data is any data that has been assigned to a location on a planet (or even between planets!); that means environmental data is mapped to locations on oblate spheroids (like Earth). The oblate spheroid shape presents interesting but challenging math to the data scientist. Modern spatial data is designed to make data science easier by handling all of the location information in a discreet and standardized manner. By discreet we mean that we don't have to sweat the details.
### Point data
Many spatial data sets come as point data - locations (longitude, latitude and maybe altitude/depth and/or time) with one or more measurements (temperature, cloudiness, probability of precipitation, abundance of fish, population density, etc.) attached to that point. Here is an example of point data about long-term oceanographic monitoring buoys in the Gulf of Maine ("gom"). We'll read the buoy data into a variable, `buoys`. Next we can print the result simply by typing the name (or you could type `print(buoys)` if you like all the extra typing).
```{r read_buoys}
buoys = gom_buoys()
buoys
```
:::{.callout-note}
You can get the online documentation for functions a couple of ways. You can type `?name_of_function`, or `help(name_of_function)`. Try `?gom_buoys` as an example.
Sometimes you need more - like seeing the function itself. You can always try typing the function name without any trailing parentheses.
```{r help}
gom_buoys
```
If that still doesn't work, we highly recommend trying [Rseek.org](https://rseek.org/) which is an R-language specific search engine.
:::
So there are 6 buoys, each with the attached attributes "name", "longname" and "id", as well as the spatial location data in the "geometry" column (just longitude and latitude in this case). We can easily plot these using the "id" column as a color key. For more on plotting spatial data, see this [wiki page](Spatial).
```{r plot_buoys}
plot(buoys['id'], axes = TRUE, pch = 16)
```
Well, that's pretty, but without a shoreline it lacks context.
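To make the idea of point data concrete, here is a minimal sketch that builds a small point table by hand with the `sf` package, which provides `st_geometry()` and is assumed to be installed. The station names, coordinates and temperatures are made up for illustration and are not real buoys.

```{r point_sketch}
library(sf)

# made-up stations and values, for illustration only (not real buoys)
stations = data.frame(
  name = c("A", "B", "C"),
  lon = c(-70.0, -69.0, -68.0),
  lat = c(43.0, 43.5, 44.0),
  temperature = c(10.2, 9.8, 9.1))

# turn the longitude/latitude columns into a single geometry column;
# EPSG:4326 is the usual code for WGS84 longitude/latitude
pts = st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)
pts
```

The result has the same shape as `buoys`: ordinary columns for the attributes plus one "geometry" column that holds the location.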
### Linestrings and polygon data
Linestrings (open shapes) and polygons (closed shapes) are much like point data, except that each geometry is a linestring or polygon. We have a set of polygons/linestrings that represent the coastline.
```{r coast}
coast = read_coastline()
coast
```
In this case, each record of geometry is a "MULTILINESTRING", which is a group of one or more linestrings. Note that no other variables are in this table - it's just the geometry.
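The difference between open and closed shapes can be sketched with hand-built geometries from the `sf` package (assumed to be installed). The coordinates are arbitrary and do not describe any real coastline.

```{r geometry_sketch}
library(sf)

# an open shape: the first and last vertices differ
line = st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))

# a closed shape: the ring must end on the vertex where it started
poly = st_polygon(list(rbind(c(0, 0), c(1, 1), c(2, 0), c(0, 0))))

# a multilinestring groups one or more linestrings into one geometry
multi = st_multilinestring(list(
  rbind(c(0, 0), c(1, 1)),
  rbind(c(2, 0), c(3, 1))))

# collect the three geometries and ask what type each one is
shapes = st_sfc(line, poly, multi)
st_geometry_type(shapes)
```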
Let's plot these geometries, and add the points on top.
```{r plot_coast_and_points, warning = FALSE}
plot(coast, col = "orange", lwd = 2, axes = TRUE, reset = FALSE,
     main = "Buoys in the Gulf of Maine")
plot(st_geometry(buoys), pch = 1, cex = 0.5, add = TRUE)
text(st_geometry(buoys), labels = buoys$id, cex = 0.7, adj = c(1,-0.1))
```
### Array data (aka raster data)
Often spatial data comes in grids, like regular arrays of pixels. These are great for all sorts of data like satellite images, bathymetry maps and environmental modeling data. We'll be working with environmental modeling data which we call "Brickman data". You can learn more about [Brickman data in the wiki](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Brickman). We'll be glossing over the details here, but there's lots of detail in the wiki.
We'll read in the database that tracks 82 Brickman data files, and then immediately filter it to keep only the rows that define the "PRESENT" scenario (where present means 1982–2013) and monthly climatology models.
```{r brickman_database}
db = brickman_database() |>
  filter(scenario == "PRESENT", interval == "mon") # note the double '==', which tests for equality
db
```
If you are wondering about filtering a table, be sure to [check out the wiki on tabular data](https://github.com/BigelowLab/ColbyForecasting2025/wiki/tables) to get started.
You might be wondering what that `|>` is doing. It is called a pipe, and it delivers the output of one function to the next function as the first parameter (aka argument). For example, `brickman_database()` produces a table, and that table is immediately passed into `filter()` to choose rows that match our criteria.
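The pipe can be tried without any course data. The sketch below uses only base R and a made-up stand-in table; the name `toy_db` and its rows are assumptions for illustration, and base R's `subset()` stands in for `filter()`.

```{r pipe_sketch}
# a made-up stand-in for the database table; the rows are illustrative only
toy_db = data.frame(
  scenario = c("PRESENT", "PRESENT", "RCP45"),
  interval = c("mon", "ann", "mon"))

# nested calls: read from the inside out
nested = nrow(subset(toy_db, scenario == "PRESENT" & interval == "mon"))

# piped calls: read from left to right, same result
piped = toy_db |>
  subset(scenario == "PRESENT" & interval == "mon") |>
  nrow()

identical(nested, piped) # TRUE
```

Both forms run exactly the same functions; the pipe only changes the order in which you read them.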
Now that we have the database listing just the records we want, we pass it to the `read_brickman()` function.
```{r read_brickman}
current = read_brickman(db)
current
```
This loads quite a complex set of arrays, but they have spatial information attached in the `dimensions` section. The `x` and `y` dimensions represent longitude and latitude respectively. The 3rd dimension, `month`, is time based.
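The objects returned by `read_brickman()` carry much more metadata, but the underlying idea of a three-dimensional array can be sketched in base R. The toy array below is an assumption for illustration: its values are just counters, and the `x` and `y` labels are invented.

```{r array_sketch}
# a toy 2 x 3 x 12 array with named dimensions x, y and month;
# the values are simply 1 through 72
toy = array(
  seq_len(2 * 3 * 12),
  dim = c(2, 3, 12),
  dimnames = list(
    x = c("x1", "x2"),
    y = c("y1", "y2", "y3"),
    month = month.abb))

dim(toy)

# pick one month by name; what remains is a 2 x 3 grid of x by y
apr = toy[, , "Apr"]
dim(apr)
```

Each month is one layer in the stack, so choosing a month leaves a single two-dimensional grid.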
Here we plot all 12 months of sea surface temperature, `SST`. Note that they all share the same color scale so that they are easy to compare.
```{r plot_brickman}
plot(current['SST'])
```
Just as we are able to plot linestrings/polygons alongside points, we can also plot these with arrays (rasters). To do this for one month ("Apr") of one variable ("SSS") we simply need to slice that data out of the `current` variable.
```{r april_sss}
april_sss = current['SSS'] |>
  slice("month", "Apr")
april_sss
```
Then it's just plot, plot, plot.
```{r april_sss_plot}
plot(april_sss, axes = TRUE, reset = FALSE)
plot(st_geometry(coast), add = TRUE, col = "orange", lwd = 2)
plot(st_geometry(buoys), add = TRUE, pch = 16, col = "purple")
```
We can plot **ALL** twelve months of a variable ("SST") with the coast and points shown. There is one slight modification to be made since a single call to `plot()` actually gets invoked 12 times for this data. So where do we add in the buoys and coast? Fortunately, we can create what is called a "hook" function - *who knows where the name hook came from?* Once the hook function is defined, it will be applied to each of the 12 subplots.
```{r hooking}
# a little function that gets called just after each sub-plot
# it simply adds the coast and buoys
add_coast_and_buoys = function(){
  plot(st_geometry(coast), col = "orange", lwd = 2, add = TRUE)
  plot(st_geometry(buoys), pch = 16, col = "purple", add = TRUE)
}
# here we call plot, and tell R to call `add_coast_and_buoys()` after
# each subplot is made
plot(current['SST'], hook = add_coast_and_buoys)
```
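The mechanics of a hook can be seen without any plotting at all. In the sketch below, `draw_panels()` is a hypothetical stand-in for a multi-panel plotting function; it is not how `plot()` is implemented, and it only prints messages. The point is that a function can be handed to another function as an argument and then called once per panel.

```{r hook_sketch}
# hypothetical stand-in for a function that draws several panels;
# it prints a message instead of drawing anything
draw_panels = function(labels, hook = NULL){
  for (label in labels){
    cat("drawing panel", label, "\n")
    # call the hook, if one was supplied, right after each panel
    if (is.function(hook)) hook()
  }
  invisible(length(labels))
}

# a counter so we can check how often the hook ran
n_calls = 0
add_overlay = function(){
  n_calls <<- n_calls + 1
  cat("  adding overlay\n")
}

draw_panels(c("Jan", "Feb", "Mar"), hook = add_overlay)
n_calls # 3, one call per panel
```

Note that the hook is passed by name, without parentheses; writing `hook = add_overlay()` would run it once up front instead of handing it over.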
# Coding Assignment
::: {.callout-note appearance="simple"}
Use the menu option `File > New File > R Script` to create a blank file. Save the file (even though it is empty) in the "assignment" directory as "assignment_script_1.R". Use this file to build a script that meets the following challenge. Note that the existing file, "assignment_script_0.R" is already there as an example.
Use the [Brickman tutorial](https://github.com/BigelowLab/ColbyForecasting2025/wiki/Brickman) to extract data from the location of Buoy M01 for RCP4.5 2055. Make a plot of `SST` (y-axis) as a function of `month` (x-axis). Here's one possible outcome.
![Buoy M01, RCP4.5 2055](images/buoy_M01_RCP45_2055.png)
:::