
Initial switch to httr2 #738

Merged
merged 33 commits into from
Nov 25, 2024

Conversation

ldecicco-USGS
Collaborator

No description provided.

@ldecicco-USGS
Collaborator Author

@jesse-ross @ehinman @mikejohnson51 @dblodgett-usgs @mikemahoney218-usgs
This is a pretty big PR that switches dataRetrieval from httr to httr2. I know a lot of you have been using httr2 for a while. We're going to have some big changes to the USGS services in the next year or so, so I wanted to set things up so we could dive right in with the more modern package.

A fair number of tests were updated, but mostly because the arguments in the URL attributes got shifted around (so, same URL, different order).

If any of you have comments, feedback, or questions, let me know. Generally everything looks to be behaving the same, except for a few cases where we had non-200 results returning without error. At the moment, those cases now automatically error. Maybe that should be the behavior (I think a few of our old behaviors existed mostly because the service would come back with useful information even though it was a 400 or something like that). Anyway, if it's a problem, we can add a tryCatch, but I kind of think this is the more expected behavior anyway.

We're not planning an update to CRAN anytime soon. I expect this will stay in the "develop" branch for a while.


@jzemmels jzemmels left a comment


These httr2 changes look good. I walked through each of the changed files and ran at least some of the examples for each function with the interactive debugger. I also ran R CMD CHECK and didn't run into issues.

I've made some comments, but I don't think any are required to address in this PR. Approved.


I can't comment on the line, but the @return field could be more precise. As it reads now, it sounds like the returned object is a character string object. Maybe something like the following:
@return An HTTP request URL: an S3 list with class httr2_request.

#' This function accepts a url parameter, and returns the raw data.
#'
#' To add a custom user agent, create an environmental variable: CUSTOM_DR_UA
#'
#' @param obs_url character containing the url for the retrieval
#' @param \dots information to pass to header request
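For anyone following along, setting that environment variable looks something like this (a minimal sketch; the variable name comes from the roxygen block above, while the agent string itself is just an illustrative placeholder):

```r
# Set a custom user agent for dataRetrieval requests via the
# CUSTOM_DR_UA environment variable described in the docs above.
Sys.setenv(CUSTOM_DR_UA = "my-org-data-pipeline/1.0")
Sys.getenv("CUSTOM_DR_UA")
```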


Again, I can't comment on the @return field but perhaps the specific data type can be mentioned for getWebServiceData. Is it always going to be an xml_document?
@return xml_document object containing raw data from web services

Collaborator Author


As it is written now, it would be XML for WaterML calls and characters for everything else (double checking that now). I believe since the WQP stopped recommending (or allowing in the new services) zip, we aren't returning raw results either. ANYWHO, this is a pattern I do really want to reconsider as we move into the new APIs.

Side note...not sure why you can't comment on the specific lines....I'll double check permissions....


Sounds good. Regarding commenting, I don't think GitHub allows comments on unchanged lines (thread)

} else if (is.raw(input)) {
returnedDoc <- xml2::read_xml(input)
raw <- TRUE
}

response <- xml2::xml_name(returnedDoc)


Likely out of scope for this PR, but this line throws an error when input is a URL as in the doc example. Perhaps a character input could be cast as an httr2_request object earlier on? This seemed to work for me:

obs_url <- paste("https://cida.usgs.gov/ngwmn_cache/sos?request=GetObservation",
  "service=SOS", "version=2.0.0",
  "observedProperty=urn:ogc:def:property:OGC:GroundWaterLevel",
  "responseFormat=text/xml",
  "featureOfInterest=VW_GWDP_GEOSERVER.USGS.403836085374401",
  sep = "&"
)
importNGWMN(httr2::request(obs_url))

#'
#' timesereies <- importWaterML2(URL, asDateTime = TRUE, tz = "UTC")
#' timesereies <- importWaterML2(baseURL, asDateTime = TRUE, tz = "UTC")


Small typo, and baseURL needs to be converted to an httr2_request object

Suggested change
#' timesereies <- importWaterML2(baseURL, asDateTime = TRUE, tz = "UTC")
#' timeseries <- importWaterML2(httr2::request(baseURL), asDateTime = TRUE, tz = "UTC")


Can't comment on the line, but line 3 should have measured instead of meaured.

R/whatWQPsites.R Outdated


Line 11 has an incomplete sentence. Also, I'd change the @return tag on line 31 to something slightly more descriptive, although it'd be great to provide more information on the difference between samples vs metrics vs sites

@return data frame containing basic metadata on WQP sites

DESCRIPTION Outdated
@@ -39,14 +39,14 @@ Copyright: This software is in the public domain because it contains materials
Depends:
R (>= 3.5.0)


For what it's worth, httr2 (and the tidyverse) now require R >= 4.0, and in the spring will require R >= 4.1. If you bump this to 4.1 (which, depending on when you're planning on launching things, will be the minimum version to use the package anyway), then you can use the base pipe, which might make some of the query-building more fluid.
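To illustrate the point (a sketch only; the URL and parameters are placeholders, not necessarily dataRetrieval's actual query-building code), the base pipe lets the httr2 request builders read top to bottom:

```r
library(httr2)

# With R >= 4.1, the native |> pipe chains httr2's request builders
# without nested calls or intermediate variables.
req <- request("https://waterservices.usgs.gov/nwis/site/") |>
  req_url_query(format = "rdb",
                site_no = c("01491000", "01645000"),
                .multi = "comma") |>
  req_user_agent("dataRetrieval-example")

req
```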

Collaborator Author


🥳 I've been avoiding the 4.1 requirement, but if httr2 has it already... HELLO PIPES


You could also re-export the magrittr pipe from httr2 (either directly or by adding magrittr as a direct dependency) if this is going out before the spring. Here's how httr2 does it:
https://github.com/r-lib/httr2/blob/main/R/utils-pipe.R

Collaborator Author


When magrittr first came out, we very firmly came to the conclusion that we would never use it within a package, so I've very much avoided re-exporting it. However, my understanding is that the base R pipe does a better job of allowing a traceback when an error occurs. So, I'm happy to start using native pipes, but would not introduce magrittr at this stage of the game.

format = "rdb"
)
url <- httr2::req_url_query(baseURL,
site_no = siteNumbers,.multi = "comma")


Suggested change
site_no = siteNumbers,.multi = "comma")
site_no = siteNumbers, .multi = "comma")

Sorry, this is such a small nit but I saw it 😆

R/findNLDI.R Outdated
#' get_nldi(paste0(base, "comid/101"), type = "feature", use_sf = TRUE)
#' get_nldi(url = paste0(base, "nwissite/USGS-11120000"), type = "feature", use_sf = TRUE)
#' get_nldi(paste0(base, "nwissite/USGS-11120000"), type = "feature", use_sf = TRUE)
#' dataRetrieval:::get_nldi(paste0(base, "comid/101"), type = "feature", use_sf = FALSE)


I think CRAN gets mad at ::: even if it's never actually run -- I believe this gets caught by their automated checks. Maybe not an issue due to @noRd, but I'm not 100% sure.

If it is an issue, the workaround is to use getFromNamespace() to assign the non-exported object into the current environment, which defeats the static analysis checks. (If it's not an issue with CRAN, well, sorry for the distraction!)

@@ -132,7 +115,7 @@ check_non_200s <- function(returnedList){
default_ua <- function() {


have I mentioned how much I like this, by the way? This approach to tagging user agents is great

R/whatWQPdata.R Outdated
Comment on lines 46 to 49
sites <- values[["siteid"]]
sites <- paste0(sites, collapse = ";")
baseURL <- httr2::req_url_query(baseURL,
siteid = sites)


Suggested change
sites <- values[["siteid"]]
sites <- paste0(sites, collapse = ";")
baseURL <- httr2::req_url_query(baseURL,
siteid = sites)
baseURL <- httr2::req_url_query(baseURL,
siteid = values[["siteid"]],
.multi = function(x) paste0(x, collapse = ";"))

I don't know if that's actually any more clear, but this could be handled inside of the httr2 function.

Collaborator Author


Nice! Didn't notice we could add custom functions.

@mikemahoney218-usgs

This looks awesome 😄 Left a few minor comments (apologies everyone for the email spam -- I didn't actually intend to review the full PR tonight 😆) but this is great.

@dblodgett-usgs
Contributor

🤗

@mikemahoney218-usgs

mikemahoney218-usgs commented Nov 25, 2024

Not sure if you already know this, but just in case -- that qpdf error is an error in Homebrew, and (according to Gabor) will fix itself when the next version of the GHA runner image is deployed (but no ETA for when that will be finished)

@ldecicco-USGS
Collaborator Author

Unfortunately we're not allowed to call GitHub Actions by a version tag (like v2, which the r-lib folks really want us to use), so we need to occasionally include the actual commit hash in the GH action yaml file. It makes sense from a security standpoint: since the version tag is sliding, they could put anything in there and we wouldn't know.

At one point I had a script that would pull the latest commit hash from a version number, but I can't find it anymore. If anyone on this list remembers how to do that, it would save a bit of fiddling when these things need to be updated.
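One way to do that (a sketch, using r-lib/actions as an example repo; `git ls-remote` prints tab-separated `<sha>	<ref>` pairs) is:

```shell
# Print the commit SHA that a tag currently points to, e.g.:
#   resolve_action_sha r-lib/actions v2
resolve_action_sha() {
  git ls-remote "https://github.com/$1" "refs/tags/$2" | cut -f1
}

# Helper split out so it can be exercised offline:
# extract the SHA field from one ls-remote output line.
sha_from_line() {
  printf '%s\n' "$1" | cut -f1
}
```

Note that for an annotated tag, `git ls-remote` may list both the tag object and the dereferenced commit (the `refs/tags/v2^{}` line); for pinning a workflow, the `^{}` line is the commit you want.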

@ldecicco-USGS ldecicco-USGS merged commit 35a957f into DOI-USGS:develop Nov 25, 2024
5 checks passed