Commit

Updated Global vars, updated vignette workflow and tests
fontikar committed Sep 13, 2024
1 parent 6eb47f4 commit 5cd0f6b
Showing 9 changed files with 184 additions and 39 deletions.
2 changes: 1 addition & 1 deletion .Rbuildignore
@@ -8,7 +8,7 @@
^\.github$
^codecov\.yml$
^inst/data$
^ignore/$
^ignore$



10 changes: 6 additions & 4 deletions DESCRIPTION
@@ -32,14 +32,16 @@ Imports:
shiny,
shinybusy,
stringr,
shinythemes
shinythemes,
tidyr
Remotes:
traitecoevo/APCalign
Suggests:
job,
bsplus,
testthat (>= 3.0.0),
here,
job,
knitr,
rmarkdown
rmarkdown,
testthat (>= 3.0.0)
Config/testthat/edition: 3
VignetteBuilder: knitr
6 changes: 2 additions & 4 deletions R/gbif_download.R
@@ -11,10 +11,7 @@
#'
#' @return None. The function saves the processed data to the specified output directory.
#' @export
#'
#' @examples
#' download_gbif_obs("Puma concolor")
#' download_gbif_obs("Puma concolor", min_year = 2000, country_code = "US", save_raw_data = TRUE)

download_gbif_obs <- function(taxon,
min_year = 1923,
max_year = as.numeric(format(Sys.Date(), "%Y")),
@@ -37,6 +34,7 @@ download_gbif_obs <- function(taxon,
#'
#' @param taxon character, genus/family/kingdom
#' @param min_year numeric, year cut off for query, only records where year >= min_year will be included
#' @param max_year numeric, year cut off for query, only records where year <= max_year will be included
#' @param country_code character, code for country
#' @export

11 changes: 9 additions & 2 deletions R/infinitylists-package.R
@@ -60,7 +60,14 @@ utils::globalVariables(
"write.csv",
"Link",
"Repository",
"Establishment means",
"repository"
"Establishment Means",
"repository",
"eventDate_as_date",
"eventDate_ymd",
"establishmentMeans",
"hasGeospatialIssue",
"link",
"country",
"count"
)
)
4 changes: 0 additions & 4 deletions man/download_gbif_obs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions man/query_gbif_global.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/testthat/_snaps/galah_download.md
@@ -14,5 +14,5 @@
6 Hemico~ Hemi~ Cordu~ 1924-10-10 00:00:00 -28.7 152. Collection QM
7 Synthe~ Synt~ Synth~ 1924-01-01 00:00:00 -41.9 145. Collection QM
# i 4 more variables: `Recorded by` <chr>, `Record Id` <chr>, Link <chr>,
# `Establishment means` <chr>
# `Establishment Means` <chr>

56 changes: 33 additions & 23 deletions vignettes/diy.Rmd
@@ -7,30 +7,20 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = FALSE
)

library(infinitylists)
library(galah)
library(arrow)
library(tidyverse)
```

One major benefit of infinitylists using a Living Atlas node is that it gives users the ability to create their own version of infinitylists for whichever Living Atlas they would like to use. Unfortunately, there are some slight inconsistencies in data coverage and naming between Living Atlas data providers. This makes creating your own infinitylists not an entirely straightforward process, but I hope this article will be able to give you some guidance.

Here, I will walk through the process on how to adapt the source code of infinitylists so you can create your own version of infinitylists for any country and taxa of your choice.

If you have any questions about this process, please do not hesitate to reach out by submitting an issue at the [infinitylists repository](XX).
If you have any questions about this process, please do not hesitate to reach out by submitting an issue at the [infinitylists repository](https://github.com/traitecoevo/infinitylists).

### Load dependencies

We are going to need a few packages to create your own infinitylists. Go ahead and install these if you don't have them in your version of R. Otherwise, load them and we can get started.

```{r setup}

``` r
# install.packages("devtools")
devtools::install_github("traitecoevo/infinitylists")
library(infinitylists)
@@ -47,7 +37,8 @@ We will be using [{galah}](https://galah.ala.org.au/R/) to download occurrence r

Here I've saved the credentials in my R environment so they are not shared publicly. I can call on these environment variables using `Sys.getenv()`. You can set them up yourself with `usethis::edit_r_environ()`.

```{r}

``` r
# Set atlas
galah_config(
atlas = "Global",
@@ -64,7 +55,8 @@ Once we have all that set up, we can request data from GBIF Global. Here I am do

Note that depending on how many records are requested, the download will take some time.

```{r}

``` r
download_gbif_obs("Podarcis",
min_year = 2000,
max_year = 2024,
@@ -75,7 +67,8 @@ download_gbif_obs("Podarcis",

You can check roughly how big your download is by using the `query_gbif_global()` function with `galah::atlas_counts()`. Note this will not be the final number of records that goes into infinitylists, as we do further exclusions and data cleaning behind the scenes.

```{r}

``` r
query_gbif_global("Podarcis",
min_year = 2000,
max_year = 2024,
@@ -89,7 +82,8 @@ You can investigate the full download by specifying `save_raw_data = TRUE` in `d

Once the download is complete, you are all set! Launch infinitylists and you will find your download under the dropdown menu "taxa".

```{r}

``` r
infinitylistApp()
```

@@ -104,7 +98,8 @@ If you specified `save_raw_data = TRUE` in `download_gbif_obs()`, this code will
- `"GBIF-preprocessed-"` is the raw download **before** our data cleaning.
- `"Living-Atlas-"` is the final **cleaned** download of the data you view in the app.

```{r}

``` r
# Locate file path of downloads
system.file(package = "infinitylists") |>
file.path("data") |>
@@ -113,16 +108,31 @@ system.file(package = "infinitylists") |>

Copy the file path and paste it into the `read_parquet()` function to open the download in R.

```{r}

``` r
gbif_podarcis <- arrow::read_parquet("infinitylists/inst/data/Living-Atlas-Podarcis-2024-09-13.parquet")
```

```{r include=FALSE, eval=TRUE}
gbif_podarcis <- arrow::read_parquet(here::here("inst/data/Living-Atlas-Podarcis-2024-09-13.parquet"))
```

```{r, eval=TRUE}


``` r
gbif_podarcis |> print(n = 10)
#> # A tibble: 6,155 × 11
#> Species Genus Family `Collection Date` Lat Long `Voucher Type` Repository
#> * <chr> <chr> <chr> <dttm> <dbl> <dbl> <chr> <chr>
#> 1 Podarcis tiliguer… Poda… Lacer… 2020-10-18 09:20:00 41.6 8.81 Photograph https://w…
#> 2 Podarcis tiliguer… Poda… Lacer… 2024-07-01 14:41:00 42.3 8.87 Photograph https://w…
#> 3 Podarcis tiliguer… Poda… Lacer… 2022-05-30 00:00:00 41.8 9.23 Photograph https://w…
#> 4 Podarcis siculus Poda… Lacer… 2024-05-17 10:37:00 42.8 9.48 Photograph https://w…
#> 5 Podarcis muralis Poda… Lacer… 2023-07-28 17:05:00 43.5 -1.52 Photograph https://w…
#> 6 Podarcis liolepis Poda… Lacer… 2020-04-08 00:00:00 44.0 3.52 Photograph https://w…
#> 7 Podarcis liolepis Poda… Lacer… 2014-05-09 08:44:00 43.6 3.02 Photograph https://w…
#> 8 Podarcis muralis Poda… Lacer… 2023-05-14 19:55:00 47.6 1.34 Photograph https://w…
#> 9 Podarcis muralis Poda… Lacer… 2023-10-14 12:39:00 44.6 6.17 Photograph https://w…
#> 10 Podarcis muralis Poda… Lacer… 2020-06-18 04:07:00 43.9 -1.38 Photograph https://w…
#> # ℹ 6,145 more rows
#> # ℹ 3 more variables: `Recorded by` <chr>, `Establishment Means` <lgl>, Link <chr>
```


130 changes: 130 additions & 0 deletions vignettes/diy.Rmd.orig
@@ -0,0 +1,130 @@
---
title: "Create your own infinitylists"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{DIY infinitylists}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval = FALSE
)

library(infinitylists)
library(galah)
library(arrow)
library(dplyr)

# knitr::knit("vignettes/diy.Rmd.orig", output = "vignettes/diy.Rmd")
```

One major benefit of infinitylists using a Living Atlas node is that it gives users the ability to create their own version of infinitylists for whichever Living Atlas they would like to use. Unfortunately, there are some slight inconsistencies in data coverage and naming between Living Atlas data providers. This makes creating your own infinitylists not an entirely straightforward process, but I hope this article will be able to give you some guidance.

Here, I will walk through the process on how to adapt the source code of infinitylists so you can create your own version of infinitylists for any country and taxa of your choice.

If you have any questions about this process, please do not hesitate to reach out by submitting an issue at the [infinitylists repository](https://github.com/traitecoevo/infinitylists).

### Load dependencies

We are going to need a few packages to create your own infinitylists. Go ahead and install these if you don't have them in your version of R. Otherwise, load them and we can get started.

```{r setup}
# install.packages("devtools")
devtools::install_github("traitecoevo/infinitylists")
library(infinitylists)
library(galah)
library(arrow)
library(tidyverse)
```

You will need to register for a [GBIF account](https://www.gbif.org/). Click on the "Login" button on the top left corner and click on the "Register" tab. Note down your login credentials for safe-keeping once you have verified your account and created a password.

### Configure galah

We will be using [{galah}](https://galah.ala.org.au/R/) to download the occurrence records used in our infinitylists. To do so, we need to configure the settings so the package knows to point to the Global GBIF API.

Here I've saved the credentials in my R environment so they are not shared publicly. I can call on these environment variables using `Sys.getenv()`. You can set them up yourself with `usethis::edit_r_environ()`.

```{r}
# Set atlas
galah_config(
atlas = "Global",
username = Sys.getenv("GBIF_USERNAME"),
password = Sys.getenv("GBIF_PWD"),
email = Sys.getenv("GBIF_EMAIL")
)

```
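
Before configuring galah, it can help to confirm that the credentials are actually available in the session. The chunk below is a minimal sketch, assuming the three environment variable names used above; it only reports which ones are unset and does not contact GBIF.

```{r}
# Environment variable names taken from the galah_config() call above
required_vars <- c("GBIF_USERNAME", "GBIF_PWD", "GBIF_EMAIL")

# Sys.getenv() returns "" for unset variables, so nzchar() flags the missing ones
missing_vars <- required_vars[!nzchar(Sys.getenv(required_vars))]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
} else {
  message("All GBIF credentials found")
}
```

If any variable is reported as not set, add it to your `.Renviron` file and restart R before running `galah_config()`.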

### Submit data request

Once we have all that set up, we can request data from GBIF Global. Here I am downloading records for the wall lizard genus 'Podarcis', from years 2000 to 2024. Under `country_code`, I've specified `"FR"` for records found in France. Here is a [list of codes](https://en.wikipedia.org/wiki/ISO_3166-2) for each country. The `download_gbif_obs` function will download the records and save them inside the infinitylists R package so you can use them immediately.

Note that depending on how many records are requested, the download will take some time.

```{r}
download_gbif_obs("Podarcis",
min_year = 2000,
max_year = 2024,
country_code = "FR")
```
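
Because a long download is wasted if the country code is mistyped, a quick format check beforehand may be worthwhile. The chunk below is a sketch only: `is_iso2()` is a hypothetical helper, not part of infinitylists, and it checks the shape of the code (two uppercase letters) rather than whether the country exists.

```{r}
# Hypothetical helper (not part of infinitylists):
# TRUE only for a single string made of exactly two uppercase ASCII letters
is_iso2 <- function(code) {
  is.character(code) &&
    length(code) == 1 &&
    grepl("^[A-Z]{2}$", code, perl = TRUE)
}

is_iso2("FR")      # well-formed code
is_iso2("France")  # country name rather than a code
```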

### Pre-download check

You can check roughly how big your download is by using the `query_gbif_global()` function with `galah::atlas_counts()`. Note this will not be the final number of records that goes into infinitylists, as we do further exclusions and data cleaning behind the scenes.

```{r}
query_gbif_global("Podarcis",
min_year = 2000,
max_year = 2024,
country_code = "FR") |>
galah::atlas_counts()
```

You can investigate the full download by specifying `save_raw_data = TRUE` in `download_gbif_obs()`.

### Launch infinitylist and explore!

Once the download is complete, you are all set! Launch infinitylists and you will find your download under the dropdown menu "taxa".

```{r}
infinitylistApp()
```

### Open downloaded data

The following code identifies the file path of where your GBIF Global records are downloaded if you want to open data in R or export it for other uses. This is usually handy if you want to orientate the map to where your download is from using `"Choose a lat/long"`.

In the next code chunk, replace `"Podarcis"` in the `pattern` argument with the name of the taxa you have downloaded data for in the previous step. This code will provide the full file paths of objects that match the `pattern` argument.

If you specified `save_raw_data = TRUE` in `download_gbif_obs()`, this code will give you two file paths. The file with the prefix:

- `"GBIF-preprocessed-"` is the raw download **before** our data cleaning.
- `"Living-Atlas-"` is the final **cleaned** download of the data you view in the app.

```{r}
# Locate file path of downloads
system.file(package = "infinitylists") |>
file.path("data") |>
list.files(pattern = "Podarcis", full.names = TRUE) # Match for Podarcis
```

Copy the file path and paste it into the `read_parquet()` function to open the download in R.

```{r}
gbif_podarcis <- arrow::read_parquet("infinitylists/inst/data/Living-Atlas-Podarcis-2024-09-13.parquet")
```
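
If you would rather not copy the path by hand, the cleaned file can be picked out by its prefix. This is a minimal sketch: the two paths are illustrative stand-ins for what `list.files()` returns, and only the `"GBIF-preprocessed-"` and `"Living-Atlas-"` prefixes come from the package.

```{r}
# Illustrative paths standing in for the list.files() output (file names are assumptions)
paths <- c(
  "data/GBIF-preprocessed-Podarcis-2024-09-13.parquet",
  "data/Living-Atlas-Podarcis-2024-09-13.parquet"
)

# Keep only the files whose name starts with the cleaned-data prefix
cleaned <- paths[startsWith(basename(paths), "Living-Atlas-")]
cleaned
```

The resulting path can then be passed straight to `arrow::read_parquet()`.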

```{r include=FALSE, eval=TRUE}
gbif_podarcis <- arrow::read_parquet(here::here("inst/data/Living-Atlas-Podarcis-2024-09-13.parquet"))
```

```{r, eval=TRUE}
gbif_podarcis |> print(n = 10)
```
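
Once the data are loaded, a count of records per species is a quick way to see what the download contains. The sketch below uses a small toy data frame in place of `gbif_podarcis` so it runs on its own; it assumes only the `Species` column name shown in the cleaned data.

```{r}
# Toy data standing in for gbif_podarcis (values are made up for illustration)
toy <- data.frame(
  Species = c("Podarcis muralis", "Podarcis muralis", "Podarcis siculus")
)

# Tabulate records per species, most frequent first
species_counts <- sort(table(toy$Species), decreasing = TRUE)
species_counts
```

Swapping `toy` for `gbif_podarcis` should give the same summary for the real download.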

