Update READMEs to use gittargets
joelnitta committed Jan 22, 2022
1 parent 0f97445 commit eff747a
Showing 2 changed files with 44 additions and 28 deletions.
45 changes: 39 additions & 6 deletions README.md
@@ -8,7 +8,7 @@ All code is in [R](https://cran.r-project.org/). The [targets package](https://d

Data files need to be downloaded from three locations.

1. Dataset on Dryad for this project: https://doi.org/doi:10.5061/dryad.w0vt4b8s2 (LINK NOT LIVE YET). Click on the "download dataset" icon, download the zipped dataset, then unzip it and put the contents in the `data/` folder in this repo.
1. Dataset on FigShare for this project: https://doi.org/10.6084/m9.figshare.16655263 (LINK NOT LIVE YET). Click on the "Download all" icon, download the zipped dataset, then unzip it and put the contents in the `data/` folder in this repo.
2. Dataset on Dryad for Ebihara and Nitta 2019: https://datadryad.org/stash/dataset/doi:10.5061/dryad.4362p32. Download the zipped dataset and put in the `data/` folder directly (without unzipping).
3. Dataset on FigShare for FTOL v0.0.1 (Nitta et al, in prep): https://doi.org/doi:10.6084/m9.figshare.13256801 (LINK NOT LIVE YET). Download the zipped dataset and put in the `data/` folder directly (without unzipping).

@@ -50,20 +50,53 @@ When you're done, take down the container:
docker-compose down
```

## Using the `targets` cache
## Targets cache

The analysis includes some steps that take a long time to run, especially maximum-likelihood phylogenetic analysis (ca. 1 week with 10 cores in parallel). To avoid running the entire workflow from scratch, untar the `_targets.tar.gz` file in the Dryad dataset and place it in the root of this repo as `_targets`:
The [targets package](https://docs.ropensci.org/targets/index.html) manages the workflow and saves all intermediate analysis results to a folder named `_targets`; this is the targets cache.
Normally, you would have to run all of the analyses starting from the original data files to generate all of the analysis results, as described above.
This takes a long time. The longest step is the phylogenetic analysis, which takes about 1 week using 10 cores in parallel.

I have put the targets cache for this project [on GitHub](https://github.com/joelnitta/japan_ferns_spatial_phy_cache) (LINK NOT LIVE YET) under version control using the [gittargets package](https://github.com/ropensci/gittargets).

So instead of running everything from scratch, you can check out the exact results matching a specific code version as follows (this assumes you are in the `japan_ferns_spatial_phy` folder and requires git):

1. Clone the targets cache to a folder called `_targets`.

```
git clone https://github.com/joelnitta/japan_ferns_spatial_phy_cache _targets
```

2. Enter the `_targets` directory.

```
cd _targets
```

3. Fetch branches from the remote repo ([each branch corresponds to a selected commit in the code](https://docs.ropensci.org/gittargets/articles/git.html#snapshot-model)).

```
tar -xzf _targets.tar.gz
git fetch
```

Then, when you open the project in R [as described above](#interacting-with-the-code), you can use `targets::tar_load()` to load any target (intermediate workflow step) listed in [`_targets.R`](_targets.R). For more information on how to use the `targets` package, see https://github.com/ropensci/targets.
4. Change to the latest branch (the part of the name after `code=` matches the corresponding commit hash in `japan_ferns_spatial_phy`).

```
git switch code=0f9744508fbdc1d22319faa6118c2811c34c0c7d
```

5. Move back up to the `japan_ferns_spatial_phy` folder.

```
cd ..
```
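
Steps 1 to 5 above can be condensed into a small shell helper. The following is a minimal sketch, not part of the README itself: the function names `cache_branch_name` and `checkout_matching_cache` are hypothetical, and it assumes that a cache branch exists for the code commit currently checked out (only selected commits are snapshotted) and that git 2.23 or later is installed (needed for `git switch`).

```shell
#!/bin/sh

# Hypothetical helper: build the gittargets-style cache branch name
# ("code=<commit hash>") for a given code commit hash.
cache_branch_name() {
  printf 'code=%s\n' "$1"
}

# Hypothetical helper: clone the cache into _targets (unless it is already
# there), fetch its branches, and switch to the snapshot that matches the
# code commit currently checked out.
# Assumes it is called from the root of japan_ferns_spatial_phy.
checkout_matching_cache() {
  cache_url="https://github.com/joelnitta/japan_ferns_spatial_phy_cache"
  code_commit="$(git rev-parse HEAD)" || return 1
  if [ ! -d _targets/.git ]; then
    git clone "$cache_url" _targets || return 1
  fi
  git -C _targets fetch || return 1
  # Fails if no snapshot branch exists for this code commit.
  git -C _targets switch "$(cache_branch_name "$code_commit")"
}
```

After sourcing the file, calling `checkout_matching_cache` from the repo root would replace the manual `cd _targets` / `cd ..` round trip, because `git -C _targets` runs each command inside the cache folder without changing the working directory.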

You can also change between different snapshots of the targets cache and code using [gittargets](https://github.com/ropensci/gittargets).

When you open the project in R [as described above](#interacting-with-the-code), you can use `targets::tar_load()` to load any target (intermediate workflow step) listed in [`_targets.R`](_targets.R). For more information on how to use the `targets` package, see https://github.com/ropensci/targets.

## Licenses

- Code: [MIT license](LICENSE.md)
- Data: [CC0 1.0 license](https://creativecommons.org/publicdomain/zero/1.0/)
- [Manuscript (preprint)](https://doi.org/10.1101/2021.08.26.457744): [CC BY-NC-ND 4.0 license](https://creativecommons.org/licenses/by-nc-nd/4.0/)
- [Roboto font](https://github.com/google/roboto/): [Apache 2.0 license](http://www.apache.org/licenses/LICENSE-2.0)

27 changes: 5 additions & 22 deletions ms/data_readme.Rmd
@@ -95,11 +95,11 @@ Licenses/restrictions placed on the data, or limitations of reuse: CC0 1.0
Universal (CC0 1.0)

Recommended citation for the data: Nitta JH, Mishler BD, Iwasaki W, Ebihara A
(2021) Data from: Spatial phylogenetics of Japanese ferns: Patterns, processes,
(2022) Data from: Spatial phylogenetics of Japanese ferns: Patterns, processes,
and implications for conservation FIXME: add DOI when available

Citation for and links to publications that cite or use the data: Nitta JH,
Mishler BD, Iwasaki W, Ebihara A (2021) Spatial phylogenetics of Japanese ferns:
Mishler BD, Iwasaki W, Ebihara A (2022) Spatial phylogenetics of Japanese ferns:
Patterns, processes, and implications for conservation FIXME: add journal when published

Code for analyzing the data is available on github:
@@ -114,7 +114,6 @@ DATA & FILE OVERVIEW
File list (filenames, directory structure (for zipped files) and brief
description of all data files):

- _targets.tar.gz: Tarball (compressed folder) including all workflow results produced by R targets package
- japan_climate.gpkg: Climate data in Japan downloaded from WorldClim database
- japan_deer_range.gpkg: Distribution maps of Japanese deer (Cervus nippon) in Japan
- japan_ferns_comm_full.csv: Community matrix (species x sites matrix) of native, non-hybrid ferns in Japan, full (unfiltered) dataset
@@ -191,7 +190,7 @@ Data files were generated from raw data (not included here) using scripts
available at https://github.com/joelnitta/japan_ferns_spatial_phy, in particular
https://github.com/joelnitta/japan_ferns_spatial_phy/blob/main/R/process_raw_data.R.

For full methods, see Nitta JH, Mishler BD, Iwasaki W, Ebihara A (2021) Spatial
For full methods, see Nitta JH, Mishler BD, Iwasaki W, Ebihara A (2022) Spatial
phylogenetics of Japanese ferns: Patterns, processes, and implications for
conservation FIXME: Add journal when published

@@ -201,22 +200,6 @@ DATA-SPECIFIC INFORMATION

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

_targets.tar.gz: Tarball (compressed folder) including all workflow results
produced by R targets package. This is provided to enable inspection of workflow
steps without running the entire workflow from the beginning. To use it, unpack
the tar archive with the command "tar -xzf _targets.tar.gz". Then, in R, the
"tar_load()" function in the R package "targets" can be used to load any
workflow step (target) defined in _targets.R
(https://github.com/joelnitta/japan_ferns_spatial_phy/blob/main/_targets.R). For
more information on the structure of the _targets folder and how to use it, see
https://github.com/ropensci/targets.

MD5 checksum: FIXME (add manually, since this can't be calculated from inside the targets workflow)

Corresponding commit in repo (https://github.com/joelnitta/japan_ferns_spatial_phy): FIXME add manually

\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-

```{r deer-range}
deer_range <- sf::st_read(here::here(deer_range_file))
# Check that no data are missing
@@ -668,11 +651,11 @@ japan_ferns_biodiv_figshare <- read_csv(here::here(japan_ferns_biodiv_figshare_f

`r fs::path_file(japan_ferns_biodiv_figshare_file)` (contained in "results.zip"):
Biodiversity statistics of native, non-hybrid ferns and environmental variables
in Japan. Biodiversity metrics calculated as described in Nitta et al. 2021.
in Japan. Biodiversity metrics calculated as described in Nitta et al. 2022.
Climatic (temperature and precipitation) variables
calculated as described for japan_climate.gpkg. Includes one row with missing
environmental data and one outlier for % apomixis that were removed prior to
spatial modeling analysis in Nitta et al. 2021.
spatial modeling analysis in Nitta et al. 2022.

Number of variables: `r ncol(japan_ferns_biodiv_figshare)`
