-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
170 lines (123 loc) · 8.83 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r echo = FALSE}
knitr::opts_chunk$set(collapse = TRUE, fig.path = "man/figures/README-")
```
<!-- badges: start -->
[![R-CMD-check](https://github.com/AustralianAntarcticDivision/blueant/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/AustralianAntarcticDivision/blueant/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/AustralianAntarcticDivision/blueant/branch/master/graph/badge.svg)](https://codecov.io/gh/AustralianAntarcticDivision/blueant?branch=master)
[![Project Status: Active - The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
<!-- badges: end -->
# Blueant
Blueant provides a set of data source configurations to use with the [bowerbird](https://github.com/AustralianAntarcticDivision/bowerbird) package. These data sources are themed around Antarctic and Southern Ocean data, and include a range of oceanographic, meteorological, topographic, and other environmental data sets. Blueant will allow you to download data from these external data providers to your local file system, and to keep that data collection up to date.
## Installing
```{r eval=FALSE}
## Download and install blueant in R
install.packages("blueant", repos = c(SCAR = "https://scar.r-universe.dev",
CRAN = "https://cloud.r-project.org"))
## or
## install.packages("remotes") ## if needed
remotes::install_github("AustralianAntarcticDivision/blueant")
```
## Usage overview
### Configuration
Build up a configuration by first defining global options such as the destination on your local file system. Usually you would choose this destination data directory to be a persistent location, suitable for a data library. For demonstration purposes here we'll just use a temporary directory:
```{r include = FALSE}
## code not shown in the README: (1) use load_all() instead of library() for convenience, (2) use anonymous "temporary" dir, and (3) hide the progress bar
##library(blueant)
devtools::load_all()
my_data_dir <- "/tmp/data"
if (!dir.exists(my_data_dir)) dir.create(my_data_dir, recursive = TRUE)
if (dir.exists(file.path(my_data_dir, "data.aad.gov.au/eds/file/4494"))) unlink(file.path(my_data_dir, "data.aad.gov.au/eds/file/4494"), recursive = TRUE)
cf <- bb_config(local_file_root = my_data_dir)
mysrc <- sources("George V bathymetry") %>% bb_modify_source(method = list(show_progress = FALSE))
cf <- cf %>% bb_add(mysrc)
```
```{r eval = FALSE}
library(blueant)
my_data_dir <- tempdir()
cf <- bb_config(local_file_root = my_data_dir)
```
Add data sources from those provided by blueant. A summary of these sources is given at the end of this document. Here we'll use the "George V bathymetry" data source as an example:
```{r eval = FALSE}
mysrc <- sources("George V bathymetry")
cf <- cf %>% bb_add(mysrc)
```
This data source is fairly small (around 200MB, see `mysrc$collection_size`). Be sure to check the `collection_size` parameter of your chosen data source before running the synchronization. Some of these collections are quite large (see the summary table at the bottom of this document).
### Synchronization
Once the configuration has been defined and the data source added to it, we can run the sync process. We set `verbose = TRUE` here so that we see additional progress output:
```{r eval = FALSE}
status <- bb_sync(cf, verbose = TRUE)
```
```{r echo = FALSE}
## actually run this one so we don't get asked for confirmation
status <- bb_sync(cf, verbose = TRUE, confirm_downloads_larger_than = -1)
```
Congratulations! You now have your own local copy of this data set. The files in this data set have been stored in a data-source-specific subdirectory of our local file root, with details given by the returned `status` object:
```{r}
myfiles <- status$files[[1]]
myfiles
```
The data sources provided by blueant can be read, manipulated, and plotted using a range of other R packages, including [RAADTools](https://github.com/AustralianAntarcticDivision/raadtools) and [raster](https://cran.r-project.org/package=raster). In this case the data files are netcdf, which can be read by `raster`:
```{r rasterplot, message = FALSE, warning = FALSE}
library(raster)
x <- raster(myfiles$file[grepl("gvdem500m_v3", myfiles$file)])
plot(x)
```
## Nuances
### Choosing a data directory
It's up to you where you want your data collection kept, and to provide that location to bowerbird. A common use case for bowerbird is maintaining a central data collection for multiple users, in which case that location is likely to be some sort of networked file share. However, if you are keeping a collection for your own use, you might like to look at https://github.com/r-lib/rappdirs to help find a suitable directory location.
### Authentication
Some data providers require users to log in. These are indicated by the `authentication_note` column in the configuration table. For these sources, you will need to provide your user name and password, e.g.:
```{r eval=FALSE}
src <- sources(name="CMEMS global gridded SSH reprocessed (1993-ongoing)")
src$user <- "yourusername"
src$password <- "yourpassword"
cf <- bb_add(cf, src)
## or, using the pipe operator
mysrc <- bb_example_sources("CMEMS global gridded SSH reprocessed (1993-ongoing)") %>%
bb_modify_source(user = "yourusername", password = "yourpassword")
cf <- cf %>% bb_add(mysrc)
```
### Writing and modifying data sources
The bowerbird documentation is a good place to start to find out more about writing your own data sources or modifying existing ones.
#### Reducing download sizes
Sometimes you might only want part of a data collection. Perhaps you only want a few years from a long-term collection, or perhaps the data are provided in multiple formats and you only need one. If the data source uses the `bb_handler_rget` method, you can restrict what is downloaded by modifying the arguments passed through the data source's `method` parameter, particularly the `accept_follow`, `reject_follow`, `accept_download`, and `reject_download` options.
For example, the CERSAT SSM/I sea ice concentration data are arranged in yearly directories, and so it is fairly easy to restrict ourselves to, say, only the 2017 data:
```{r eval=FALSE}
mysrc <- sources("CERSAT SSM/I sea ice concentration")
## first make sure that the data source doesn't already have an accept_follow parameter defined
"accept_follow" %in% names(mysrc$method[[1]])
## nope, so we can safely go ahead and impose our own
mysrc$method[[1]]$accept_follow <- "/2017"
cf <- cf %>% bb_add(mysrc)
```
Alternatively, for data sources that are divided into subdirectories, one could replace the whole-data-source `source_url` with one or more that point to specific yearly (or other) subdirectories. For example, the default `source_url` for the CERSAT sea ice data above is `ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/` (which has yearly subdirectories). So e.g. for 2016 and 2017 data we could do:
```{r eval=FALSE}
mysrc <- sources("CERSAT SSM/I sea ice concentration")
mysrc$source_url[[1]] <- c(
"ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/2016/",
"ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/antarctic/daily/netcdf/2017/")
cf <- cf %>% bb_add(mysrc)
```
#### Defining new data sources
If the blueant data sources don't cover your needs, you can define your own using the `bb_source` function. See the bowerbird documentation.
## Data source summary
These are the data source definitions that are provided as part of the blueant package.
```{r sources_summary, echo = FALSE, message = FALSE, warning = FALSE, results = "asis"}
library(blueant) ##devtools::load_all("../bowerbird"); devtools::load_all()
##library(DT)
cf <- bb_config("/your/data/root/") %>% bb_add(sources())
##cfsum <- do.call(rbind, lapply(seq_len(nrow(cf$data_sources)), function(z) cf$data_sources[z, c("data_group", "name", "description", "authentication_note", "collection_size", "doc_url")]))
##cfsum <- setNames(cfsum[order(cfsum$data_group), ], c("Data group", "Data source name", "Description", "Authentication note", "Approximate size (GB)", "Documentation link"))
##datatable(cfsum, rownames = FALSE, options = list(
## autoWidth = TRUE, columnDefs = list(list(className = "dt-head-left", targets = "_all")))) %>% formatStyle(1:6, "vertical-align" = "top") %>% formatStyle(1:6, "text-align" = "left")
sf <- bb_summary(cf, format = "rmd", inc_license = FALSE, inc_access_function = FALSE, inc_path = FALSE)
stxt <- readLines(sf)
stxt <- stxt[(grep("Last updated:", stxt) + 1):length(stxt)]
stxt <- gsub("^#", "##", stxt) ## push each header level down one
stxt <- gsub("^\\-", "\n-", stxt)
for (k in stxt) cat(k, "\n")
```