From a6463bcaff496a8f634f12b94f97fb81c1087a8a Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Wed, 10 Apr 2024 09:03:05 -0700 Subject: [PATCH 1/6] Start to update how to find data in R --- how-tos/find-data/find-r.qmd | 108 ++++++++++++++++++++++++++++++++--- 1 file changed, 99 insertions(+), 9 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index 82e937de4..dc551b6d0 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -6,21 +6,111 @@ execute: Here are our recommended approaches for finding data with R, from the command line or a notebook. -To find data in R, we'll also use the `earthaccess` python package - we can do so from R using the [`reticulate`](https://rstudio.github.io/reticulate/) package ([cheatsheet](https://www.rstudio.com/resources/cheatsheets/)). Note below that we import the python library as an R object we name `earthaccess`, as well as the `earthaccess$` syntax for accessing functions from the `earthaccess` library. The `granules` object has a list of JSON dictionaries with some extra dictionaries. +## Authentication for NASA Earthdata + +An Earthdata Login account is required to access data from the NASA Earthdata system. Please visit to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up. + +The first step is to get the correct authentication that will allow us to get cloud-hosted data from NASA. This is all done through Earthdata Login. We can use the `earthdatalogin` R package +to help us manage this. + +The easiest and most portable method of access is using the netrc basic +authentication protocol for HTTP. Call `edl_netrc()` to set this up +given your username and password (passed as optional arguments or read +from the environmental variables. If neither provides credentials, +`earthdatalogin` will provide it’s own credentials, but you may +experience rate limits more readily). + +Once you have created an account on Earthdata, the easiest way to store your +credentials is by setting them as environment variables in your `~/.Renviron` file. +The usethis package has a handy function to find and open this file for editing: ```r +usethis::edit_r_environ() +``` + +Then add the following lines to the file, save the file, and restart R + +``` +EARTHDATA_USER="your_user_name" +EARTHDATA_PASSWORD="your_password" +``` + +Now, when you call `edl_netrc()`, it will consult these environment variables +to create a netrc file that will be used to authenticate with the NASA Earthdata +servers. + + +``` r +edl_netrc() +``` + +If `edl_netrc()` has been called, then most existing spatial data +packages in R can then seamlessly access NASA Earthdata over HTTP links. + +## Searching for data + + +The NASA cloud data is searchable programmatically via two methods - NASA's own search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find data in R, we'll mainly rely on the [rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us interact with the NASA [STAC](https://stacspec.org/en) API to search for our data, and at the same time learn about STAC more generally. This will be useful as it is a common standard for distributing spatial data. + +We will also search for data using the [NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), which is the service that powers the [Earthdata web search portal](https://search.earthdata.nasa.gov/search?fst0=atmosphere). + +For both of these services, the [earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will help us get set up and provide a few handy functions +to smooth some rough edges. + +## rstac + +All of the NASA STAC catalogues can be viewd here: https://radiantearth.github.io/stac-browser/#/external/cmr.earthdata.nasa.gov/stac/. + +We will use the rstac package to first browse the collections in the PO DAAC +catalogue (POCLOUD): + +```{r} ## In R ## load R libraries -library(tidyverse) # install.packages("tidyverse") -library(reticulate) # install.packages("reticulate") +# install.packages("pak") +# pak::pak(c("tidyverse", "rstac", "boettiger-lab/earthdatalogin")) + +library(rstac) +library(earthdatalogin) + +po_collections <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD/") |> + collections() |> + get_request() + +po_collections +``` + +This only gives us the first 10 collections in the catalogue; to get the rest +we can use the `collections_fetch()` function: + +```{r} +all_po_collections <- collections_fetch(po_collections) +all_po_collections + +length(all_po_collections$collections) + +head(all_po_collections$collections) +``` + +Once we have searched through the collections, we can choose the one we are +interested in and query it to get the items (granules) we are interested in: + +```{r} +items <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD") |> + stac_search(collections = "MUR-JPL-L4-GLOB-v4.1", + datetime = paste(start,end, sep = "/")) |> + post_request() |> + items_fetch() + +items +``` + + -## load python library -earthaccess <- reticulate::import("earthaccess") -## use earthaccess to access data # https://nsidc.github.io/earthaccess/tutorials/search-granules/ -granules <- earthaccess$search_data( - doi = "10.5067/SLREF-CDRV3", - temporal = reticulate::tuple("2017-01", "2017-02") # with an earthaccess update, this can be simply c() or list() +```{r} +granules <- edl_search( + doi = URLencode("10.5067/SLREF-CDRV3", reserved = TRUE) ) ## Granules found: 72 From 47d8b37931f47f1f599aa7c4f67133e5b0821369 Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Wed, 10 Apr 2024 12:11:55 -0700 Subject: [PATCH 2/6] tweak auth section --- how-tos/find-data/find-r.qmd | 67 +++++++++++++++++++++--------------- 1 file changed, 39 insertions(+), 28 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index dc551b6d0..128f437cc 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -8,58 +8,69 @@ Here are our recommended approaches for finding data with R, from the command li ## Authentication for NASA Earthdata -An Earthdata Login account is required to access data from the NASA Earthdata system. Please visit to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up. - -The first step is to get the correct authentication that will allow us to get cloud-hosted data from NASA. This is all done through Earthdata Login. We can use the `earthdatalogin` R package -to help us manage this. - -The easiest and most portable method of access is using the netrc basic -authentication protocol for HTTP. Call `edl_netrc()` to set this up -given your username and password (passed as optional arguments or read -from the environmental variables. If neither provides credentials, -`earthdatalogin` will provide it’s own credentials, but you may -experience rate limits more readily). - -Once you have created an account on Earthdata, the easiest way to store your -credentials is by setting them as environment variables in your `~/.Renviron` file. +An Earthdata Login account is required to access data from the NASA Earthdata +system. Please visit to register and manage +your Earthdata Login account. This account is free to create and only takes a +moment to set up. + +Once you have created your Earthdata Login account, you can use the +`earthdatalogin` R package to help you manage your authentication within R. + +The easiest and most portable method of access is using the `netrc` basic +authentication protocol for HTTP. Call `edl_netrc()` to set this up given your +username and password. The easiest way to store your credentials for use in +`edl_netrc()` is to set them as environment variables in your `~/.Renviron` file. The usethis package has a handy function to find and open this file for editing: ```r usethis::edit_r_environ() ``` -Then add the following lines to the file, save the file, and restart R +Then add the following lines to the file, save the file, and restart R. ``` EARTHDATA_USER="your_user_name" EARTHDATA_PASSWORD="your_password" ``` -Now, when you call `edl_netrc()`, it will consult these environment variables -to create a netrc file that will be used to authenticate with the NASA Earthdata -servers. - +Now, when you call `edl_netrc()`, it will consult these environment variables to +create a `.netrc` file that will be used to authenticate with the NASA Earthdata +services. If you don't have your credentials saved as environment variables, +`earthdatalogin` will provide it’s own credentials, but you may experience rate +limits more readily. You can also manually type in your credentials to the `username` +and `password` arguments to `edl_netrc()` but this is not recommended as it +is too easy to accidentally share these in your code. ``` r edl_netrc() ``` -If `edl_netrc()` has been called, then most existing spatial data -packages in R can then seamlessly access NASA Earthdata over HTTP links. +Once `edl_netrc()` has been called in your R session, then most existing spatial +data packages in R can seamlessly access NASA Earthdata over HTTP links. ## Searching for data +The NASA cloud data is searchable programmatically via two methods - NASA's own +search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find +data in R, we'll mainly rely on the +[rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us +interact with the NASA [STAC](https://stacspec.org/en) API to search for our +data, and at the same time learn about STAC more generally. This will be useful +as it is a common standard for distributing spatial data. -The NASA cloud data is searchable programmatically via two methods - NASA's own search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find data in R, we'll mainly rely on the [rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us interact with the NASA [STAC](https://stacspec.org/en) API to search for our data, and at the same time learn about STAC more generally. This will be useful as it is a common standard for distributing spatial data. - -We will also search for data using the [NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), which is the service that powers the [Earthdata web search portal](https://search.earthdata.nasa.gov/search?fst0=atmosphere). +We will also search for data using the +[NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), +which is the service that powers the +[Earthdata web search portal](https://search.earthdata.nasa.gov/search?fst0=atmosphere). -For both of these services, the [earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will help us get set up and provide a few handy functions -to smooth some rough edges. +For both of these services, the +[earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will +help us get set up and provide a few handy functions to smooth some rough edges. -## rstac +### rstac -All of the NASA STAC catalogues can be viewd here: https://radiantearth.github.io/stac-browser/#/external/cmr.earthdata.nasa.gov/stac/. +All of the NASA STAC catalogues can be viewed here: +. We will use the rstac package to first browse the collections in the PO DAAC catalogue (POCLOUD): From 132bfcbed54432d647001f23930540ebf8017dd9 Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Tue, 16 Apr 2024 10:19:05 -0700 Subject: [PATCH 3/6] Add info about dev edl, start work on subsetting --- how-tos/find-data/find-r.qmd | 42 +++++++++++++++++++++++------------- 1 file changed, 27 insertions(+), 15 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index 128f437cc..87fa8d634 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -14,7 +14,15 @@ your Earthdata Login account. This account is free to create and only takes a moment to set up. Once you have created your Earthdata Login account, you can use the -`earthdatalogin` R package to help you manage your authentication within R. +[earthdatalogin](https://github.com/boettiger-lab/earthdatalogin/) R package to help you manage your authentication within R. + +There is some functionality currently only available in the development version +of earthdatalogin, so we will install it from GitHub using the [pak](https://pak.r-lib.org/) package: + +``` r +install.packages("pak") +pak::pak("boettiger-lab/earthdatalogin") +``` The easiest and most portable method of access is using the `netrc` basic authentication protocol for HTTP. Call `edl_netrc()` to set this up given your @@ -23,6 +31,7 @@ username and password. The easiest way to store your credentials for use in The usethis package has a handy function to find and open this file for editing: ```r +# install.packages("usethis") usethis::edit_r_environ() ``` @@ -36,17 +45,19 @@ EARTHDATA_PASSWORD="your_password" Now, when you call `edl_netrc()`, it will consult these environment variables to create a `.netrc` file that will be used to authenticate with the NASA Earthdata services. If you don't have your credentials saved as environment variables, -`earthdatalogin` will provide it’s own credentials, but you may experience rate +`earthdatalogin` will provide its own credentials, but you may experience rate limits more readily. You can also manually type in your credentials to the `username` and `password` arguments to `edl_netrc()` but this is not recommended as it is too easy to accidentally share these in your code. -``` r +```r +library(earthdatalogin) + edl_netrc() ``` -Once `edl_netrc()` has been called in your R session, then most existing spatial -data packages in R can seamlessly access NASA Earthdata over HTTP links. +Once `edl_netrc()` has been called in your R session, then most spatial +packages in R can seamlessly access NASA Earthdata over HTTP links. ## Searching for data @@ -101,12 +112,19 @@ all_po_collections length(all_po_collections$collections) head(all_po_collections$collections) + +sapply(all_po_collections$collections, `[[`, "title") +grep("MUR-JPL", sapply(all_po_collections$collections, `[[`, "id"), value = TRUE) ``` Once we have searched through the collections, we can choose the one we are interested in and query it to get the items (granules) we are interested in: ```{r} + +start <- "2020-01-01" +end <- "2020-01-03" + items <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD") |> stac_search(collections = "MUR-JPL-L4-GLOB-v4.1", datetime = paste(start,end, sep = "/")) |> @@ -120,16 +138,10 @@ items ```{r} -granules <- edl_search( - doi = URLencode("10.5067/SLREF-CDRV3", reserved = TRUE) -) - -## Granules found: 72 - -## exploring -granules # this is the result of the get request. +library(terra) +urls <- edl_search(short_name = "MUR-JPL-L4-GLOB-v4.1", + temporal = c(start, end)) -class(granules) # "list" -## granules <- reticulate::py_to_r(granules) # Object to convert is not a Python object +terra::rast(paste0("/vsicurl/", urls[1])) ``` From d2f7d60d04633bb35700aee847282c22182b6d34 Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Tue, 16 Apr 2024 17:42:39 -0700 Subject: [PATCH 4/6] More materials using terra and stars --- how-tos/find-data/find-r.qmd | 142 ++++++++++++++++++++++++++++------- 1 file changed, 113 insertions(+), 29 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index 87fa8d634..d52f659b6 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -6,6 +6,29 @@ execute: Here are our recommended approaches for finding data with R, from the command line or a notebook. +### Using the web interface + +See [**How do I find data using Earthdata Search?**](earthdata_search.md) - in that tutorial, we found the dataset *ECCO Sea Surface Height - Monthly Mean 0.5 Degree (Version 4 Release 4)*, with the shortname `ECCO_L4_SSH_05DEG_MONTHLY_V4R4`. + +### Searching programmatically + +The NASA cloud data is searchable programmatically via two methods - NASA's own +search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find +data in R, we'll mainly rely on the +[rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us +interact with the NASA [STAC](https://stacspec.org/en) API to search for our +data, and at the same time learn about STAC more generally. This will be useful +as it is a common standard for distributing spatial data. + +We will also search for data using the +[NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), +which is the service that powers the +[Earthdata web search portal](https://search.earthdata.nasa.gov/search). + +For both of these services, the +[earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will +help us get set up and provide a few handy functions to smooth some rough edges. + ## Authentication for NASA Earthdata An Earthdata Login account is required to access data from the NASA Earthdata @@ -50,7 +73,7 @@ limits more readily. You can also manually type in your credentials to the `user and `password` arguments to `edl_netrc()` but this is not recommended as it is too easy to accidentally share these in your code. -```r +```{r} library(earthdatalogin) edl_netrc() @@ -59,26 +82,7 @@ edl_netrc() Once `edl_netrc()` has been called in your R session, then most spatial packages in R can seamlessly access NASA Earthdata over HTTP links. -## Searching for data - -The NASA cloud data is searchable programmatically via two methods - NASA's own -search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find -data in R, we'll mainly rely on the -[rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us -interact with the NASA [STAC](https://stacspec.org/en) API to search for our -data, and at the same time learn about STAC more generally. This will be useful -as it is a common standard for distributing spatial data. - -We will also search for data using the -[NASA Earthdata search API](https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html), -which is the service that powers the -[Earthdata web search portal](https://search.earthdata.nasa.gov/search?fst0=atmosphere). - -For both of these services, the -[earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will -help us get set up and provide a few handy functions to smooth some rough edges. - -### rstac +### Finding data in NASA STACs with rstac All of the NASA STAC catalogues can be viewed here: . @@ -106,6 +110,7 @@ This only gives us the first 10 collections in the catalogue; to get the rest we can use the `collections_fetch()` function: ```{r} +#| eval: false all_po_collections <- collections_fetch(po_collections) all_po_collections @@ -113,20 +118,22 @@ length(all_po_collections$collections) head(all_po_collections$collections) +# Just look at the titles: sapply(all_po_collections$collections, `[[`, "title") -grep("MUR-JPL", sapply(all_po_collections$collections, `[[`, "id"), value = TRUE) + +# Get shortnames from the 'id' field and search for a match: +grep("ECCO_L4_SSH", sapply(all_po_collections$collections, `[[`, "id"), value = TRUE) ``` Once we have searched through the collections, we can choose the one we are interested in and query it to get the items (granules) we are interested in: ```{r} - -start <- "2020-01-01" -end <- "2020-01-03" +start <- "2015-01-01" +end <- "2015-12-31" items <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD") |> - stac_search(collections = "MUR-JPL-L4-GLOB-v4.1", + stac_search(collections = "ECCO_L4_SSH_05DEG_MONTHLY_V4R4", datetime = paste(start,end, sep = "/")) |> post_request() |> items_fetch() @@ -134,14 +141,91 @@ items <- stac("https://cmr.earthdata.nasa.gov/stac/POCLOUD") |> items ``` +There are 13 'items' representing the same 13 granules [we found when we searched using EarthData Search](https://search.earthdata.nasa.gov/search/granules?p=C1990404799-POCLOUD&pg%5B0%5D%5Bv%5D=f&pg%5B0%5D%5Bgsk%5D=-start_date&q=ECCO%20monthly%20SSH&qt=2015-01-01T00:00:00.000Z,2015-12-31T23:59:59.999Z&ff=Available%20in%20Earthdata%20Cloud&tl=1713291266.503!3!!&long=0.0703125). + +We can see more details of the items (granules) by printing the `features` list +item: + +```{r} +items$features + +# And the urls: +edl_stac_urls(items) +``` +### Finding data using earthdatalogin +Once we know the shortname of a collection (usually by looking in the EarthData Search portal), we can supply it to `edl_search()` to get the metadata and file urls of the individual granules: + +```{r} +granules <- edl_search( + short_name = "ECCO_L4_SSH_05DEG_MONTHLY_V4R4", + temporal = c(start, end), + parse_results = FALSE +) + +granules + +granules[[1]] + +sapply(granules, `[`, "title") + +# Note these are the same urls obtained via the rstac workflow demonstrated above +granule_urls <- edl_extract_urls(granules) +granule_urls +``` + +#### Using the `{terra}` package + +We can read any of these urls using `terra::rast()`. We supply the `vsi = TRUE` +argument, which prepends `"/vsicurl/"` to the url, indicating that the file +should be opened as a "virtual" remote dataset. This allows random partial +reading of files without prior download of the entire file. This will vastly speed up +most operations as only the subset of the data required is ever actually downloaded. ```{r} library(terra) -urls <- edl_search(short_name = "MUR-JPL-L4-GLOB-v4.1", - temporal = c(start, end)) -terra::rast(paste0("/vsicurl/", urls[1])) +rast <- terra::rast(granule_urls[1], vsi = TRUE) + +# This does not come with an extent and CRS so we can supply it manually +granules[[1]]$boxes +ext(rast) <- c(-180, 180, -90, 90) +crs(rast) <- "EPSG:4326" + +plot(rast[["SSH"]]) + +# We notice that this plot is upside-down - it is likely that the NetCDF file +# does not conform properly to the conventions. But we can flip it: +rast <- flip(rast, direction = "vertical") +plot(rast) +``` + +If we want to crop the raster, we can define an extent and crop it to that area. +Because we previously called `edl_netrc()`, we are not only authenticated with +the server so we can access the data, but we have also set up `terra` and `stars` +to access the data in such a way that only the subset of the data +we have requested is actually downloaded. + +```{r} +crop_box <- rast(extent = c(-150, -120, 35, 60), crs = "EPSG:4326") + +rast_cropped <- crop(rast, crop_box) +plot(rast_cropped[["SSH"]]) ``` + +#### Using the `{stars}` package + +The `read_*` functions in stars do not have the `vsi` argument, but we can do +the same thing simply by prepending `"/vsicurl/"` to the url ourselves: + +```{r} +library(stars) +ssh_stars <- read_mdim(paste0("/vsicurl/", granule_urls[1])) + +plot(ssh_stars) +``` + +We can + From 542b6e35ed4146a3ec4758ec821baaf0eeba5db2 Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Wed, 17 Apr 2024 08:55:59 -0700 Subject: [PATCH 5/6] crop with stars --- how-tos/find-data/find-r.qmd | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index d52f659b6..58ff19b0f 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -188,7 +188,7 @@ library(terra) rast <- terra::rast(granule_urls[1], vsi = TRUE) -# This does not come with an extent and CRS so we can supply it manually +# This does not come with an extent and CRS embedded so we can supply it manually granules[[1]]$boxes ext(rast) <- c(-180, 180, -90, 90) crs(rast) <- "EPSG:4326" @@ -218,7 +218,8 @@ plot(rast_cropped[["SSH"]]) #### Using the `{stars}` package The `read_*` functions in stars do not have the `vsi` argument, but we can do -the same thing simply by prepending `"/vsicurl/"` to the url ourselves: +the same thing simply by prepending `"/vsicurl/"` to the url ourselves. Here +we will use the `read_mdim()` function. ```{r} library(stars) @@ -227,5 +228,11 @@ ssh_stars <- read_mdim(paste0("/vsicurl/", granule_urls[1])) plot(ssh_stars) ``` -We can +We can again crop this using the same bounding box as we did with terra: + +```{r} +st_crop(ssh_stars, st_bbox(c(xmin = -150, xmax = -120, ymin = 35, ymax = 60))) |> + plot() +``` + From 21dfc609b206978c5adf7340dbfd4ed34b4c5ad8 Mon Sep 17 00:00:00 2001 From: Andy Teucher Date: Wed, 17 Apr 2024 09:22:10 -0700 Subject: [PATCH 6/6] Tidy up --- how-tos/find-data/find-r.qmd | 27 ++++++++++++++++++--------- 1 file changed, 18 insertions(+), 9 deletions(-) diff --git a/how-tos/find-data/find-r.qmd b/how-tos/find-data/find-r.qmd index 58ff19b0f..19ad26ac7 100644 --- a/how-tos/find-data/find-r.qmd +++ b/how-tos/find-data/find-r.qmd @@ -4,6 +4,9 @@ execute: eval: false --- +*This is a work in progress, and may be split up into different modules/tutorials +as we continue to work on it.* + Here are our recommended approaches for finding data with R, from the command line or a notebook. ### Using the web interface @@ -14,7 +17,7 @@ See [**How do I find data using Earthdata Search?**](earthdata_search.md) - in t The NASA cloud data is searchable programmatically via two methods - NASA's own search service, and the NASA Spatio-Temporal Asset Catalog (STAC) API. To find -data in R, we'll mainly rely on the +data in R, we'll mainly rely on the [rstac](https://brazil-data-cube.github.io/rstac/) package. This enables us interact with the NASA [STAC](https://stacspec.org/en) API to search for our data, and at the same time learn about STAC more generally. This will be useful @@ -29,7 +32,7 @@ For both of these services, the [earthdatalogin](https://boettiger-lab.github.io/earthdatalogin/) package will help us get set up and provide a few handy functions to smooth some rough edges. -## Authentication for NASA Earthdata +### Authentication for NASA Earthdata An Earthdata Login account is required to access data from the NASA Earthdata system. Please visit to register and manage @@ -110,7 +113,6 @@ This only gives us the first 10 collections in the catalogue; to get the rest we can use the `collections_fetch()` function: ```{r} -#| eval: false all_po_collections <- collections_fetch(po_collections) all_po_collections @@ -153,7 +155,7 @@ items$features edl_stac_urls(items) ``` -### Finding data using earthdatalogin +### Finding data in NASA EarthData Search using earthdatalogin Once we know the shortname of a collection (usually by looking in the EarthData Search portal), we can supply it to `edl_search()` to get the metadata and file urls of the individual granules: @@ -168,6 +170,7 @@ granules granules[[1]] +# See the granule titles sapply(granules, `[`, "title") # Note these are the same urls obtained via the rstac workflow demonstrated above @@ -175,7 +178,7 @@ granule_urls <- edl_extract_urls(granules) granule_urls ``` -#### Using the `{terra}` package +#### Accessing data using the `{terra}` package We can read any of these urls using `terra::rast()`. We supply the `vsi = TRUE` argument, which prepends `"/vsicurl/"` to the url, indicating that the file @@ -204,8 +207,7 @@ plot(rast) If we want to crop the raster, we can define an extent and crop it to that area. Because we previously called `edl_netrc()`, we are not only authenticated with the server so we can access the data, but we have also set up `terra` and `stars` -to access the data in such a way that only the subset of the data -we have requested is actually downloaded. +to us the underlying `GDAL` library to access the data in such a way that only \the subset of the data we have requested is actually downloaded. ```{r} crop_box <- rast(extent = c(-150, -120, 35, 60), crs = "EPSG:4326") @@ -215,7 +217,7 @@ rast_cropped <- crop(rast, crop_box) plot(rast_cropped[["SSH"]]) ``` -#### Using the `{stars}` package +#### Accessing data using the `{stars}` package The `read_*` functions in stars do not have the `vsi` argument, but we can do the same thing simply by prepending `"/vsicurl/"` to the url ourselves. Here @@ -231,8 +233,15 @@ plot(ssh_stars) We can again crop this using the same bounding box as we did with terra: ```{r} -st_crop(ssh_stars, st_bbox(c(xmin = -150, xmax = -120, ymin = 35, ymax = 60))) |> +st_crop( + ssh_stars, + st_bbox(c(xmin = -150, xmax = -120, ymin = 35, ymax = 60)) +) |> plot() ``` + +### Accessing data using gdalcubes + +*Coming soon!*