Commit

spelling fixes and removing special Mac install instructions (#229)
* spelling fixes and removing special Mac install instructions

* adding mac back into CI
wcornwell authored May 29, 2024
1 parent 3599080 commit 6acbeac
Showing 10 changed files with 64 additions and 97 deletions.
1 change: 1 addition & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -18,6 +18,7 @@ jobs:
fail-fast: false
matrix:
config:
- {os: macos-latest, r: 'release'}
- {os: windows-latest, r: 'release'}
- {os: ubuntu-latest, r: 'release'}
- {os: ubuntu-latest, r: 'oldrel-1'}
2 changes: 1 addition & 1 deletion NEWS.md
@@ -12,7 +12,7 @@ streamlined the package.
* Write a replacement function for `stringr::word` that is much faster.
* Additional speed up and accuracy of fuzzy_match function by
- Restricting reference list to names with the same first letter as input string.
- Switch from using `utils::adist` to `stringdist:stringdist(method = "dl")`
- Switch from using `utils::adist` to `stringdist::stringdist(method = "dl")`
* Rework `standardise_names` to remove punctuation from the start of the string
* Rework `strip_names_extra` (previously `strip_names_2`) to just perform
additional functions to `strip_names`, rather than repeating those performed by `strip_names`.
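
The NEWS entry above concerns the switch from `utils::adist` to `stringdist::stringdist(method = "dl")` and the restriction of the reference list to names sharing the input's first letter. The following is a minimal sketch of why the distance metric matters. It assumes a toy reference list and a made-up misspelling, neither of which comes from APCalign's data, and the first-letter filter only illustrates the idea; it is not the package's internal code. Damerau-Levenshtein counts a swap of two adjacent letters as one edit, whereas `adist` (generalised Levenshtein) counts the same swap as two substitutions.

```r
# Requires the 'stringdist' package: install.packages("stringdist")

# Hypothetical misspelt input (last two letters transposed) and a toy reference list;
# neither comes from APCalign's bundled data.
input <- "Acacia longifolai"
reference <- c("Acacia longifolia", "Acacia linifolia", "Banksia integrifolia")

# Keep only reference names that share the input's first letter
# (an illustration of the idea in NEWS.md, not APCalign's implementation).
candidates <- reference[substr(reference, 1, 1) == substr(input, 1, 1)]

# Damerau-Levenshtein distance: an adjacent transposition counts as one edit.
d_dl <- stringdist::stringdist(input, candidates, method = "dl")

# utils::adist computes a generalised Levenshtein distance, so the same
# transposition costs two substitutions. It returns a 1 x length(candidates) matrix.
d_lv <- utils::adist(input, candidates)

# Closest candidate under the Damerau-Levenshtein metric
best <- candidates[which.min(d_dl)]
print(best)
```

Filtering by first letter before computing distances shrinks the number of comparisons, which is the speed-up the NEWS entry describes; the trade-off is that a typo in the very first letter can no longer be matched.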
2 changes: 1 addition & 1 deletion R/align_taxa.R
@@ -22,7 +22,7 @@
#' synonyms, orthographic variants) over fuzzy matches.
#' - It prioritises matches to taxa in the APC over names in the APNI.
#' - It identifies string patterns in input names that suggest a name can only
#' be aligned to a genus (hybrids that are not in the APC/ANI; graded species;
#' be aligned to a genus (hybrids that are not in the APC/APNI; graded species;
#' taxa not identified to species), and indicates these names only have a
#' genus-rank match.
#'
2 changes: 1 addition & 1 deletion R/update_taxonomy.R
@@ -18,7 +18,7 @@
#' Notes:
#' - As the input for this function is a table with 5 columns (output by
#' align_taxa), this function will only be used when you explicitly want to
#' separate the aligment and updating components of APCalign. This function is
#' separate the alignment and updating components of APCalign. This function is
#' the second half of create_taxonomic_update_lookup.
#'
#' @family taxonomic alignment functions
32 changes: 10 additions & 22 deletions README.Rmd
@@ -25,35 +25,23 @@ library(APCalign)

# APCalign <img src="man/figures/APCalign_hex_2.svg" align="right" width="120"/>

'APCalign' uses the [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names) to align and update Australian plant taxon name strings. 'APCalign' also supplies information about
the established status (native/introduced) of plant taxa across different states/territories.
`APCalign` uses the [Australian Plant Census (APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and [Australian Plant Name Index](https://biodiversity.org.au/nsl/services/search/names) to align and update Australian plant taxon name strings. 'APCalign' also supplies information about
the established status (native/introduced) of plant taxa across different states/territories. It's useful for updating species list and intersecting them with the APC consensus understanding of established status (native/introduced).

## Installation

For Windows and Linux:

```{r install, eval= FALSE}
# install.packages("remotes")
# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
```

for MacOS there is currently an extra line needed to install a working binary of the `arrow` dependency from r-universe instead of CRAN:

```{r install_mac, eval= FALSE}
# install.packages("arrow", repos = c('https://apache.r-universe.dev', 'https://cloud.r-project.org'))
# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
install.packages("remotes")
remotes::install_github("traitecoevo/APCalign")
```


## A quick demo

Generating a look-up table can be done with just one function:

```{r}
```{r,message=FALSE}
library(APCalign)
@@ -68,7 +56,7 @@ create_taxonomic_update_lookup(

if you're going to use APCalign more than once, it will save you time to load the taxonomic resources into memory first:

```{r}
```{r,message=FALSE}
tax_resources <- load_taxonomic_resources()
@@ -83,7 +71,7 @@ create_taxonomic_update_lookup(
)
```

Checking for Australian natives:
Checking for a list of species to see if they are classified as Australian natives:

```{r, message=FALSE}
@@ -96,12 +84,12 @@ We also developed a shiny application for non-R users to update and align their

## Learn more

Highly recommend looking at our [Getting Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html) vignette to learn about how to use 'APCalign'. You can also learn more about our [taxa matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).
Highly recommend looking at our [Getting Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html) vignette to learn about how to use `APCalign`. You can also learn more about our [taxa matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).


## Found a bug?

Did you come across an unexpected taxon name change? Elusive error you can't debug - [submit an issue](https://github.com/traitecoevo/APCalign/issues) and we will try our best to help
Did you come across an unexpected taxon name change? Elusive error you can't debug - [submit an issue](https://github.com/traitecoevo/APCalign/issues) and we will try our best to help.

## Comments and contributions

101 changes: 42 additions & 59 deletions README.md
@@ -10,31 +10,23 @@ coverage](https://codecov.io/gh/traitecoevo/APCalign/branch/master/graph/badge.s

# APCalign <img src="man/figures/APCalign_hex_2.svg" align="right" width="120"/>

APCalign uses the [Australian Plant Census
`APCalign` uses the [Australian Plant Census
(APC)](https://biodiversity.org.au/nsl/services/search/taxonomy) and
[Australian Plant Name
Index](https://biodiversity.org.au/nsl/services/search/names) to align
and update Australian plant taxon name strings. ‘APCalign’ also supplies
information about the established status (native/introduced) of plant
taxa across different states/territories.
taxa across different states/territories. It’s useful for updating
species list and intersecting them with the APC consensus understanding
of established status (native/introduced).

## Installation

For Windows and Linux:

``` r

# install.packages("remotes")
# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
```

for MacOS there is currently an extra line needed to install a working
binary of the `arrow` dependency from r-universe instead of CRAN:

``` r

# install.packages("arrow", repos = c('https://apache.r-universe.dev', 'https://cloud.r-project.org'))
# remotes::install_github("traitecoevo/APCalign", dependencies = TRUE, upgrade = "ask")
install.packages("remotes")
remotes::install_github("traitecoevo/APCalign")

```

## A quick demo
@@ -52,58 +44,49 @@ create_taxonomic_update_lookup(
"Commersonia rosea"
)
)
#> Checking alignments of 3 taxa
#> ================================================================================================================================================================
#> # A tibble: 3 × 12
#> original_name aligned_name accepted_name suggested_name genus taxon_rank
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> # scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> # number_of_collapsed_taxa <dbl>
```

#> Loading resources into memory...
#> ================================================================================================================================================================
#> ...done
#> -> of these 2 names have a perfect match to a scientific name in the APC. Alignments being sought for remaining names.
#> # A tibble: 3 × 12
#> original_name aligned_name accepted_name suggested_name genus taxon_rank
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> # scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> # number_of_collapsed_taxa <dbl>

if you’re going to use APCalign more than once, it will save you time to
load the taxonomic resources into memory first:

``` r

tax_resources <- load_taxonomic_resources()
#> ================================================================================================================================================================

create_taxonomic_update_lookup(
taxa = c(
"Banksia integrifolia",
"Acacia longifolia",
"Commersonia rosea",
"not a species"
),
resources = tax_resources
)
#> # A tibble: 4 × 12
#> original_name aligned_name accepted_name suggested_name genus taxon_rank
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
#> 4 not a species <NA> <NA> <NA> <NA> <NA>
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> # scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> # number_of_collapsed_taxa <dbl>
```

#> Loading resources into memory...
#> ================================================================================================================================================================
#> ...done

create_taxonomic_update_lookup(
taxa = c(
"Banksia integrifolia",
"Acacia longifolia",
"Commersonia rosea",
"not a species"
),
resources = tax_resources
)
#> Checking alignments of 4 taxa
#> -> of these 2 names have a perfect match to a scientific name in the APC. Alignments being sought for remaining names.
#> # A tibble: 4 × 12
#> original_name aligned_name accepted_name suggested_name genus taxon_rank
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Banksia integrifol… Banksia int… Banksia inte… Banksia integ… Bank… species
#> 2 Acacia longifolia Acacia long… Acacia longi… Acacia longif… Acac… species
#> 3 Commersonia rosea Commersonia… Androcalva r… Androcalva ro… Andr… species
#> 4 not a species <NA> <NA> <NA> <NA> <NA>
#> # ℹ 6 more variables: taxonomic_dataset <chr>, taxonomic_status <chr>,
#> # scientific_name <chr>, aligned_reason <chr>, update_reason <chr>,
#> # number_of_collapsed_taxa <dbl>

Checking for Australian natives:
Checking for a list of species to see if they are classified as
Australian natives:

``` r

@@ -125,7 +108,7 @@ align their taxonomic names. You can find the application here:

Highly recommend looking at our [Getting
Started](https://traitecoevo.github.io/APCalign/articles/APCalign.html)
vignette to learn about how to use APCalign. You can also learn more
vignette to learn about how to use `APCalign`. You can also learn more
about our [taxa matching
algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html).

@@ -134,7 +117,7 @@ algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.
Did you come across an unexpected taxon name change? Elusive error you
can’t debug - [submit an
issue](https://github.com/traitecoevo/APCalign/issues) and we will try
our best to help
our best to help.

## Comments and contributions

2 changes: 1 addition & 1 deletion man/align_taxa.Rd

2 changes: 1 addition & 1 deletion man/update_taxonomy.Rd

15 changes: 5 additions & 10 deletions vignettes/APCalign.Rmd
@@ -9,26 +9,21 @@ vignette: >



When working with biodiversity data, it is important to verify taxonomic names with an authoritative list and correct any out-of-date names. The 'APCalign' package simplifies this process by:
When working with biodiversity data, it is important to verify taxonomic names with an authoritative list and correct any out-of-date names. The `APCalign` package simplifies this process by:

- Accessing up-to-date taxonomic information from the [Australian Plant Census](https://biodiversity.org.au/nsl/services/search/taxonomy) and the [Australia Plant Name Index](https://biodiversity.org.au/nsl/services/search/names).
- Aligning authoritative names to your taxonomic names using our [fuzzy matching algorithm](https://traitecoevo.github.io/APCalign/articles/updating-taxon-names.html)
- Updating your taxonomic names in a transparent, reproducible manner

## Installation

'APCalign' is currently not on CRAN. You can install its current developmental version using



```r
# install.packages("remotes")
install.packages("remotes")
remotes::install_github("traitecoevo/APCalign")

library(APCalign)
```

To demonstrate how to use 'APCalign', we will use an example dataset `gbif_lite` which is documented in `?gbif_lite`
To demonstrate how to use `APCalign`, we will use an example dataset `gbif_lite` which is documented in `?gbif_lite`



@@ -52,14 +47,14 @@ gbif_lite |> print(n = 6)

## Retrieve taxonomic resources

The first step is to retrieve the entire APC and APNI name databases and store them locally as taxonomic resources. We achieve this using `load_taxonomic_resources()`.
The first step is to retrieve the entire APC and APNI name databases and store them locally as taxonomic resources. We achieve this using `load_taxonomic_resources()`. The resources are compressed as parquet files to speed download and local loading.

There are two versions of the databases that you can retrieve with the `stable_or_current_data` argument. Calling:

- `stable` will retrieve the most recent, archived version of the databases from our [GitHub releases](https://github.com/traitecoevo/APCalign/releases). This is set as the default option.
- `current` will retrieve the up-to-date databases directly from the APC and APNI website.

Note that the databases are quite large so the initial retrieval of `stable` versions will take a few minutes. Once the taxonomic resources have been stored locally, subsequent retrievals will take less time. Retrieving `current` resources will always take longer since it is accessing the latest information from the website. Check out our [Resource Caching](https://traitecoevo.github.io/APCalign/articles/caching.html) article to learn more about how the APC and APNIC databases are accessed, stored and retrieved.
Note that the databases are reasonably large so the initial retrieval of the core data will take a few minutes. Once the taxonomic resources have been stored locally, subsequent retrievals will take less time. Retrieving `current` resources will always take longer since it is accessing the latest information from the website in an uncompressed format.


```r
2 changes: 1 addition & 1 deletion vignettes/articles/reproducibility.Rmd
@@ -61,7 +61,7 @@ default_version()

Then copying and pasting the output into `load_taxonomic_resources()` directly. This way makes the version of taxonomic resources more explicit in your code.

To ensure the specific version of taxonomic resources is availabe for subsequent functions make sure to assign them to an object:
To ensure the specific version of taxonomic resources is available for subsequent functions make sure to assign them to an object:

```{r}
resources_0.0.4.9000 <- load_taxonomic_resources(