-
Notifications
You must be signed in to change notification settings - Fork 0
/
REST-API-feb-28.qmd
214 lines (142 loc) · 9.71 KB
/
REST-API-feb-28.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
---
title: "Quick Start: REST API in R"
subtitle: Getting data into R with from a REST API
author: Eli Holmes, NWFSC/NOAA
format:
revealjs:
theme: [serif, custom.scss]
footer: "[Home](index.html)"
self-contained: true
---
## What is a REST API?
TLDC; It is a way we can get some data off a database using an URL--if the database folk set up a REST API for us.
## Example
Repos on nmfs-opensci using the GitHub REST API
```{r echo=TRUE}
org <- "nmfs-opensci"
url <- paste0("https://api.github.com/search/repositories?q=org:", org)
res <- httr::GET(url)
dat <- jsonlite::fromJSON(rawToChar(res$content))
head(dat$items[,c("name", "has_issues")])
```
## StreamNet CAX REST API
This is the database I want to get data from
<https://www.streamnet.org/resources/exchange-tools/rest-api-documentation/>
Ok, not so helpful. In particular it doesn't show me the full url and doesn't tell me that I need an API key or how to pass it in!
## What's an API key??
- **What's an API key??** A kind of token. Gives you access and gives you particular permissions, like read-only, read-write, etc.
- **Do I need one?** No! Some REST APIs don't need a key.
- **How did you know you needed one for CAX?** I didn't. My queries didn't work and so my collaborator went and talked to the CAX folks who set up the REST API.
## Stop: API Keys
rCAX uses public read-only API key for accessing this public database. It is part of the package. But for the raw code shown in this talk I X'd out the key. Even though it is a public and not sensitive.
See the comments at the end for how to write code without including the API key in your code. You'll see that code if you look at the raw qmd file for this talk.
## Get a URL query that works.
Look at the documentation, ask someone, ask the database developers. Getting that first URL is the hard part. After that you are golden.
res <- httr::GET("https://api.streamnet.org/api/v1/ca.json?table_id=4EF09E86-2AA8-4C98-A983-A272C2C2C7E3&XApiKey=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX&page=1&per_page=10")
data <- jsonlite::fromJSON(rawToChar((res$content)))
Let's parse this URL
[https://api.streamnet.org/api/v1]{style="color: blue"}/ca.json[?]{style="color: green"}[table_id=]{style="color: red"}4EF09E86-2AA8-4C98-A983-A272C2C2C7E3[&]{style="color: green"}[XApiKey=]{style="color: red"}XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX[&]{style="color: green"}[page=]{style="color: red"}1[&]{style="color: green"}[per_page=]{style="color: red"}10
## Tables
Looking at the documentation, you might guess that this gives you "tables". Let's try it:
res <- httr::GET("https://api.streamnet.org/api/v1/ca/tables.json?XApiKey=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX")
data <- jsonlite::fromJSON(rawToChar((res$content)))
Jack Pot!! The table ids that I need for my queries!
## Keeping your API key private 1
It is easy to keep this out of your code. Here's how. You are going to save the key with a name in your `.Renviron` file in your user directory. Note you'll need to do this on all your computers if you use multiple because the key stays on the local machine.
Note the key is going to be in plain text. This is approach is not for sensitive keys.
## Keeping your API key private 2
This is going to show you where .Renviron is and open it for you
install.packages("usethis")
usethis::edit_r_environ()
Paste this into the `.Renviron` file:
BEST_APIKEY <- “the api key. It is long. Ask database dev for one.”
## Keeping your API key private 3
RESTART R! You have to do this whenever you edit `.Renviron`
Now whenever you need the API key in your code use
Sys.getenv("BEST_APIKEY", "")
For example, here is how I could set up a url to get tables from the StreamNet REST API. Now I can share this code without sharing or exposing the API_KEY
url <- paste0(“https://api.streamnet.org/api/v1/ca/tables.json?XApiKey=”, Sys.getenv("BEST_APIKEY", ""))
res <- httr::GET(url)
## Keeping your API key private 4
The .Renviron file the API key is unencrypted. There are cases where you want a key to be encrypted (AWS keys, keys to get into your google drive are good examples) . Search online for how to do that. It's the same idea though, just you are using an encrypted key rather than encrypted. In both cases, the key (or secret) stays on your local machine and is not shared in your code.
## Creating your an R client
You are ready to start creating your R package aka R REST API client! Clone someone else's (recent) R package for a client and just edit that.
[https://github.com/nwfsc-math-bio/rCAX](https://nwfsc-math-bio.github.io/rCAX/)
![](https://nwfsc-math-bio.github.io/rCAX/reference/figures/logo2.png){fig-align="center" width="50%"}
## rCAX key functions
- [`rcax_GET()`](https://github.com/nwfsc-math-bio/rCAX/blob/main/R/rcax_GET.R) - general query
- [`rcax_table_query()`](https://github.com/nwfsc-math-bio/rCAX/blob/main/R/rcax_table_query.R) - query for a table given the tablename, makes the query list and calls `rcax_GET()`
- [`rcax_hli()`](https://github.com/nwfsc-math-bio/rCAX/blob/main/R/rcax_hli.R) - gets the tablename for a hli (high lev index), calls `rcax_table_query()`
## rcax_GET()
This function is what prepares the url and runs the `GET` call to the REST API. I copied this from the [ropensci/rredlist](https://github.com/ropensci/rredlist) R package (R client for IUCN REST API
Goal 1. Get this to "work"
```
res <- rcax_GET("ca/tables")
```
Goal 2. Get this to "work"
```
res <- rcax_GET("ca", query=list(table_id="xxxx", XAPIKey="xxx"))
```
It should return some JSON.
## rcax_GET()
The `rcax_GET.R` file is a series of utility functions.
- rcax_GET() - uses crul package
- rcax_base() - base url to REST API
- rcax_ua() - creates a user agent string to tell REST API who is pinging it
- check_key() - gets the API key
- rcax_parse() - parses the JSON output
## rcax_GET()
[Read documentation](https://nwfsc-math-bio.github.io/rCAX/reference/rcax_GET.html)
* url = everything up to the ?
* query = the query params as a list
* useragent = optional but it is a string
```
rcax_GET <- function(path, key = NULL, query=NULL, ...){
cli <- crul::HttpClient$new(
url = file.path(rcax_base(), path),
opts = list(useragent = rcax_ua())
)
temp <- cli$get(query = c(query, list(XApiKey = check_key(key))), ...)
temp$raise_for_status()
x <- temp$parse("UTF-8")
err_catcher(x)
return(x)
}
```
## rcax_GET()
res <- httr::GET("https://api.streamnet.org/api/v1/ca.json?table_id=4EF09E86-2AA8-4C98-A983-A272C2C2C7E3&XApiKey=XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX&page=1&per_page=10")
```
rcax_base() <- function() "https://api.streamnet.org/api/v1"
path <- "ca.json"
query <-
list(table_id="4EF09E86-2AA8-4C98-A983-A272C2C2C7E3",
XAPIKey="XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
page=1, per_page=10)
rcax_ua() <- function() "rCAX R client"
```
rcax_GET <- function(path, key = NULL, query=NULL, ...){
cli <- crul::HttpClient$new(
url = file.path(rcax_base(), path),
opts = list(useragent = rcax_ua())
)
temp <- cli$get(query = query, ...)
temp$raise_for_status()
x <- temp$parse("UTF-8")
err_catcher(x)
return(x)
}
## filter function woes 1
I wanted to be able to only retrieve data for a particular population, so a value in the pop_id column. There was no info, and none of the standard ways I found via online searching worked.
## filter function woes 2
Finally in my online searches, I stumbled on the "filter" query parameter approach. I asked chatGPT, "Show me how to create a filter in a REST API query." That got me closer, but there are still multiple way to do it. Then, I was staring at some CAX html page source code and found some Javascript (which I don't know) with a function `filter`. It appeared to use "json" (which I also don't know). Then I asked chatGPT to show me a REST API filter with JSON. Jack pot!
## filter function
- [`rcax_filter()`](https://github.com/nwfsc-math-bio/rCAX/blob/main/R/rcax_filter.R) - converts a list into json, which is what this REST API wants
## FAQs 1
- **Is this easier than using Oracle?** Totally. Dead easy. You only need the URL and API Key. The database devs can set up the key however they want. The keys you'll see today are the public and read-only key for the public database (otherwise I would not show them!).
- **How do I find out if my database has a REST API?** Maybe it is on their documentation but you might just have to ask. Apparently you can set up REST API for an Oracle database? I know nothing about this, but if you struggle w getting data from an Oracle database, you might ask if they have a REST API set up.
## FAQs 2
- **How do I find out the syntax REST API URLs?** If you are lucky, the REST API for your database has documentation. Otherwise you will have to resort to trial and error or, better, reach out to the database developers. However, all you really need is 1 or 2 examples of the URL for the REST API.
- **What there isn't standard syntax??** Yes and No. The first part url+query params is standard, but you need to find out the query param names and the values that those can take. Filter or subsetting the data is also very idiosyncratic. Documentation is often poor, in which case you need to talk to the database devs.
## FAQs 3
- **Do I need an API key?** No! Some REST APIs don't need a key, but I think in that case, they often limit how much you can ping the database. I show an example below of using the GitHub REST API which doesn't need a key.
- **What's an API key??** A kind of token. Gives you access and gives you particular permissions, like read-only, read-write, certain limit of data, etc. Where do I get it? Depends. Documentaiton should say or contact the database owners.