
Initial switch to httr2 #738

Merged
merged 33 commits into from
Nov 25, 2024

Conversation

ldecicco-USGS
Collaborator

No description provided.

@ldecicco-USGS
Collaborator Author

@jesse-ross @ehinman @mikejohnson51 @dblodgett-usgs @mikemahoney218-usgs
This is a pretty big PR that switches dataRetrieval from httr to httr2. I know a lot of you have been using httr2 for a while. We're going to have some big changes to the USGS services in the next year or so, so I wanted to set things up so we could dive right in with the more modern package.

A fair number of tests were updated, but mostly because the arguments in the URL attributes got shifted around (so, same URL, different order).

If any of you have comments, feedback, or questions, let me know. Generally everything looks to be behaving the same, except for a few cases where we had non-200 results returning without error. At the moment, those cases now automatically error. Maybe that should be the behavior (I think a few of our old behaviors existed mostly because the service would come back with useful information even though it was a 400 or something like that). Anyway, if it's a problem, we can add a tryCatch, but I kind of think this is the more expected behavior anyway.

We're not planning an update to CRAN anytime soon. I expect this will stay in the "develop" branch for a while.


@jzemmels jzemmels left a comment


These httr2 changes look good. I walked through each of the changed files and ran at least some of the examples for each function with the interactive debugger. I also ran R CMD CHECK and didn't run into issues.

I've made some comments, but I don't think any are required to address in this PR. Approved.


I can't comment on the line, but the @return field could be more precise. As it reads now, it sounds like the returned object is a character string object. Maybe something like the following:
@return An HTTP request URL: an S3 list with class httr2_request.

#' This function accepts a url parameter, and returns the raw data.
#'
#' To add a custom user agent, create an environmental variable: CUSTOM_DR_UA
#'
#' @param obs_url character containing the url for the retrieval
#' @param \dots information to pass to header request
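For anyone following along, setting that environment variable looks something like this (a minimal sketch; the variable name comes from the roxygen block above, while the agent string itself is just an illustrative placeholder):

```r
# Set a custom user agent for dataRetrieval requests via the
# CUSTOM_DR_UA environment variable described in the docs above.
Sys.setenv(CUSTOM_DR_UA = "my-org-data-pipeline/1.0")
Sys.getenv("CUSTOM_DR_UA")
```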


Again, I can't comment on the @return field but perhaps the specific data type can be mentioned for getWebServiceData. Is it always going to be an xml_document?
@return xml_document object containing raw data from web services

Collaborator Author


As it is written now, it would be XML for WaterML calls and characters for everything else (double checking that now). I believe since the WQP stopped recommending (or allowing in the new services) zip, we aren't returning raw results either. ANYWHO, this is a pattern I do really want to reconsider as we move into the new APIs.

Side note...not sure why you can't comment on the specific lines....I'll double check permissions....


Sounds good. Regarding commenting, I don't think GitHub allows comments on unchanged lines (thread)

} else if (is.raw(input)) {
returnedDoc <- xml2::read_xml(input)
raw <- TRUE
}

response <- xml2::xml_name(returnedDoc)


Likely out of scope for this PR, but this line throws an error when input is a URL as in the doc example. Perhaps a character input could be cast as an httr2_request object earlier on? This seemed to work for me:

obs_url <- paste("https://cida.usgs.gov/ngwmn_cache/sos?request=GetObservation",
  "service=SOS", "version=2.0.0",
  "observedProperty=urn:ogc:def:property:OGC:GroundWaterLevel",
  "responseFormat=text/xml",
  "featureOfInterest=VW_GWDP_GEOSERVER.USGS.403836085374401",
  sep = "&"
)
importNGWMN(httr2::request(obs_url))

#'
#' timesereies <- importWaterML2(URL, asDateTime = TRUE, tz = "UTC")
#' timesereies <- importWaterML2(baseURL, asDateTime = TRUE, tz = "UTC")


Small typo, and baseURL needs to be converted to an httr2_request object

Suggested change
#' timesereies <- importWaterML2(baseURL, asDateTime = TRUE, tz = "UTC")
#' timeseries <- importWaterML2(httr2::request(baseURL), asDateTime = TRUE, tz = "UTC")


Can't comment on the line, but line 3 should have measured instead of meaured.

R/whatWQPsites.R Outdated


Line 11 has an incomplete sentence. Also, I'd change the @return tag on line 31 to something slightly more descriptive, although it'd be great to provide more information on the difference between samples vs metrics vs sites

@return data frame containing basic metadata on WQP sites

DESCRIPTION Outdated
@@ -39,14 +39,14 @@ Copyright: This software is in the public domain because it contains materials
Depends:
R (>= 3.5.0)


For what it's worth, httr2 (and the tidyverse) now require R >= 4.0, and in the spring will require R >= 4.1. If you bump this to 4.1 (which, depending on when you're planning on launching things, will be the minimum version to use the package anyway), then you can use the base pipe, which might make some of the query-building more fluid.
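To illustrate the point (a sketch only; the URL and parameters are placeholders, not necessarily dataRetrieval's actual query-building code), the base pipe lets the httr2 request builders read top to bottom:

```r
library(httr2)

# With R >= 4.1, the native |> pipe chains httr2's request builders
# without nested calls or intermediate variables.
req <- request("https://waterservices.usgs.gov/nwis/site/") |>
  req_url_query(format = "rdb",
                site_no = c("01491000", "01645000"),
                .multi = "comma") |>
  req_user_agent("dataRetrieval-example")

req
```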

Collaborator Author


🥳 I've been avoiding the 4.1 requirement, but if httr2 has it already... HELLO PIPES


You could also re-export the magrittr pipe from httr2 (either directly or by adding magrittr as a direct dependency) if this is going out before the spring. Here's how httr2 does it:
https://github.com/r-lib/httr2/blob/main/R/utils-pipe.R

Collaborator Author


When magrittr first came out, we very firmly came to the conclusion that we would never use it within a package, so I've very much avoided re-exporting it. However, my understanding is that the base R pipe does a better job of allowing a traceback when an error occurs. So, I'm happy to start using native pipes, but would not introduce magrittr at this stage of the game.

format = "rdb"
)
url <- httr2::req_url_query(baseURL,
site_no = siteNumbers,.multi = "comma")


Suggested change
site_no = siteNumbers,.multi = "comma")
site_no = siteNumbers, .multi = "comma")

Sorry, this is such a small nit but I saw it 😆

R/findNLDI.R Outdated
#' get_nldi(paste0(base, "comid/101"), type = "feature", use_sf = TRUE)
#' get_nldi(url = paste0(base, "nwissite/USGS-11120000"), type = "feature", use_sf = TRUE)
#' get_nldi(paste0(base, "nwissite/USGS-11120000"), type = "feature", use_sf = TRUE)
#' dataRetrieval:::get_nldi(paste0(base, "comid/101"), type = "feature", use_sf = FALSE)


I think CRAN gets mad at ::: even if it's never actually run -- I believe this gets caught by their automated checks. Maybe not an issue due to @noRd, but I'm not 100% sure.

If it is an issue, the workaround is to use getFromNamespace() to assign the non-exported object into the current environment, which defeats the static analysis checks. (If it's not an issue with CRAN, well, sorry for the distraction!)

@@ -132,7 +115,7 @@ check_non_200s <- function(returnedList){
default_ua <- function() {


have I mentioned how much I like this, by the way? This approach to tagging user agents is great

R/whatWQPdata.R Outdated
Comment on lines 46 to 49
sites <- values[["siteid"]]
sites <- paste0(sites, collapse = ";")
baseURL <- httr2::req_url_query(baseURL,
siteid = sites)


Suggested change
sites <- values[["siteid"]]
sites <- paste0(sites, collapse = ";")
baseURL <- httr2::req_url_query(baseURL,
siteid = sites)
baseURL <- httr2::req_url_query(baseURL,
siteid = values[["siteid"]],
.multi = function(x) paste0(x, collapse = ";"))

I don't know if that's actually any more clear, but this could be handled inside of the httr2 function.

Collaborator Author


Nice! Didn't notice we could add custom functions.

@mikemahoney218-usgs

This looks awesome 😄 Left a few minor comments (apologies everyone for the email spam -- I didn't actually intend to review the full PR tonight 😆) but this is great.

@dblodgett-usgs
Contributor

🤗

@mikemahoney218-usgs

mikemahoney218-usgs commented Nov 25, 2024

Not sure if you already know this, but just in case -- that qpdf error is an error in Homebrew, and (according to Gabor) will fix itself when the next version of the GHA runner image is deployed (but no ETA for when that will be finished)

@ldecicco-USGS
Collaborator Author

Unfortunately we're not allowed to call GitHub Actions by a version tag (like v2, which the r-lib folks really want us to use), so we need to occasionally include the actual commit hash in the GH action yaml file. It makes sense from a security standpoint: since the version tag is sliding, they could put anything in there and we wouldn't know.

At one point I had a script that would pull the latest commit hash from a version number, but I can't find it anymore. If anyone on this list remembers how to do that, it would save a bit of fiddling when these things need to be updated.
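One way to do that (a sketch, using r-lib/actions as an example repo; `git ls-remote` prints tab-separated `<sha>	<ref>` pairs) is:

```shell
# Print the commit SHA that a tag currently points to, e.g.:
#   resolve_action_sha r-lib/actions v2
resolve_action_sha() {
  git ls-remote "https://github.com/$1" "refs/tags/$2" | cut -f1
}

# Helper split out so it can be exercised offline:
# extract the SHA field from one ls-remote output line.
sha_from_line() {
  printf '%s\n' "$1" | cut -f1
}
```

Note that for an annotated tag, `git ls-remote` may list both the tag object and the dereferenced commit (the `refs/tags/v2^{}` line); for pinning a workflow, the `^{}` line is the commit you want.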

@ldecicco-USGS ldecicco-USGS merged commit 35a957f into DOI-USGS:develop Nov 25, 2024
5 checks passed