Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to migrate package to {httr2} (#13) and expand implementation of metadata API (#12) #14

Merged
merged 173 commits into from
Sep 16, 2023

Conversation

elipousson
Copy link
Collaborator

@elipousson elipousson commented Apr 23, 2023

This is the promised pull request to migrate the package to {httr2} for #13 but also includes some expanded implementation of the metadata API for #12. In retrospect, I should have broken this up into multiple smaller pull requests and I totally can understand if it take a while to work through the review.

Here are the highlights of the major changes:

I've removed all calls to {httr} and a series of internal helpers (prefixed with "req" with some helpers) use httr2 to build and perform the API calls. Some of these helpers support a single user-facing function (e.g. req_delete_record()), some supporting multiple functions (e.g. req_perform_offset()), and some that are used in every request (e.g. req_auth_airtable()). There are three of these functions worth noting in particular:

  • airtable_request() creates an initial request object with httr2::request() (using the default values for api_url and api_version).
  • req_airtable_auth() sets up throttling, error handling, and sets a user agent for the package.
  • req_airtable_query() handles building the query (with httr2::req_template() or httr2::req_url_query()) and optionally setting the method with httr2::req_method() or adding a body with httr2::req_body_json(). This function calls airtable_request() if a request isn't provided and calls req_airtable_auth() at the end so it is used by many other functions.

I tried to be consistent in the naming and structure for these functions but please call out anything that seems confusing.

Any existing code built on top of rairtable functions should still work. However, for most user-facing functions, the airtable class object is now an optional input parameter. Instead, users can provide an airtable class object, a base id and table id or name, or an Airtable URL. If a user provides a URL, it is passed to the new parse_airtable_url() that breaks it apart into a base ID, table ID and view ID.

There is no longer an option to turn off the batch requests and the user-facing functions switch between the single and multiple record API options without informing the user. The progress bars are created by cli::cli_progress_along() so they are always on but can be turned off (or turned on more quickly) by modifying the cli package options. There is good documentation for this function here: https://cli.r-lib.org/articles/progress.html#progress-bars-for-mapping-functions-cli_progress_along

There is new vignette with documentation for all of the configuration options which is how I handled the default values that I didn't seem essential to expose to users.

The one potential breaking change is that the airtable class object is now a list created using vctrs::new_vctr(). I'm not set on this if you had a strong preference for your existing implementation but keeping everything in a list made it easier to figure out what and how to access the different elements. This also will ease the implementation of a format method for the airtable class which should allow some further simplification of the print and str methods.

The parallel parameter isn't implemented but I thought it might be helpful to do a speed comparison between the old version and this new version to see what difference it makes.

Lastly, I started on an implementation of a create_table() function that might make sense to include in this same pull request but isn't essential. Let me know if you have any immediate questions but I'll come back and keep working on the outstanding items later today (or later this week).

- refactor: major changes to migrate from from httr to httr2
- test: add tests with testthat + add withr with Suggests
- docs: switch to markdown formatted documentation + add package documentation
- refactor: incorporate standalone-types-check script from rlang
- refactor: consolidate airtable functions in airtable.R
- refactor: rename req_airtable_url to airtable_request
- refactor: add lifecycle to Imports
- docs: revise airtable documentation
- feat: remove str method for airtable objects
- docs: expand airtable_request documentation
- feat: add support for Airtable URLs to airtable_request
- refactor: add data parameter to req_airtable_query
- refactor: move req_airtable_records
- refactor: rename utils_httr2.R to airtable_request.R
- docs: expand function documentation
- refactor: restore airtable_id_col parameter
- refactor: make sure batch_size is fully parameterized internally based on "rairtable.batch_size" option
- refactor: rename get_ids helper function and move to  get_records.R w/ get_record_id_col + select_id_col + id_col_names
- docs: add environment variables definitions
- docs: add Eli Pousson to contributors
- feat: add list_airtable_bases
- feat: add as_tibble parameter to airtable_base
- refactor: rename req_airtable_query to req_query_airtable for consistency in httr2 wrapper function names
- feat: add read_airtable_records and read_airtable_record to read_airtable
- refactor: modify safety_check, remove adorn_text + stop_quietly, + add cli_ask + list_rbind + string_extract utility functions
@elipousson
Copy link
Collaborator Author

Just a quick heads up that I've implemented a few more features based on the potential of using a table model for validation including:

  • adding the new copy_table_config() and get_field_config() functions
  • field and sort parameter validation for read_records() and list_records()
  • reordering columns to match the table model for read_records(), list_records(), and get_record()

I also added a .name_repair parameter to handle the possibility of a name conflict between the airtable ID column name and the existing columns within an Airtable base.

I also replaced the return_json parameter with the more intuitive (IMHO) return_data or return_records parameters and exposed the simplifyVector parameter (and re-working a few things so they'll handle either data frame or list inputs).

I've expanded test coverage and improved input checks to make sure this additional capacity doesn't create any difficult to trouble-shoot errors for users but also understand that most folks may not need this level of complexity. I figure it would be helpful to add a vignette showing how to use the schema and model parameters (and why someone may want to).

I hope it isn't a pain that this pull request has been a moving target! I figured if you wouldn't have time to review until later this week that I'd push to finish up a few more things that I think should be useful for my workflow (and others). I've been learning a lot as I'm building this out and really looking forward to seeing it up on CRAN!

@elipousson
Copy link
Collaborator Author

And I just added a create_base() function that can use the list of tables from a base schema to create a new table array (effectively copying an existing base with the exception of the record link columns and a few others types that are difficult to copy over). A new metadata API vignette should be finished by the end of the week!

@matthewjrogers
Copy link
Owner

Thanks for all of this, this is a tremendous amount of work. I will dig in over the next couple of days.

My suggestion, for what it's worth, is that we may want to pin the current state as the next release (pending any bugs/documentation changes) and switch focus to preparing a CRAN release. We can always think about another minor version change in the next 6 months or so.

@elipousson
Copy link
Collaborator Author

@matthewjrogers That makes a lot of sense. I'll put a pause on any new features and look out for feedback! Your prior comments were helpful and well-organized so feel free to take that same approach or edit the pull request directly.

There are a couple things I'd suggest looking out for (based on the issues I've caught so far):

  • Unexpected or undocumented behavior using the airtable_request() and get ID functions (e.g. is it clear that passing a field URL to get_record() can set the record ID but won't override a record ID if it is supplied?)
  • Consistent handling of table names (there are some places a table name works and some places the functions only handle a table ID and I'm not certain which are which)
  • Predictable/consistent type outputs (once I started adding in the option to pass the schema and model as a parameter, I realized it would be handy to be able to get API responses as lists rather than always coercing to a data frame - but now the documentation is potentially out of date and I want to make sure it is clear what functions return what types of data). For example, I think get_table_model() may need a stricter limit on the structure of the permitted output

There are also a few places where I think {vctrs} may be a useful substitute for the existing helper functions – especially since it has functions that can handle data frames and lists agnostically. But, that can probably wait for a minor version update so I wouldn't suggest it as a priority.

@matthewjrogers
Copy link
Owner

@elipousson -- wanted to give you a heads up that, unfortunately, it might be a bit before I get back to this. My hometown in Vermont got hit pretty hard in the recent flooding -- going to try and carve out the time for this, but my timeline for review has dilated. Wanted to make sure you know this is not forgotten

@elipousson
Copy link
Collaborator Author

Really sorry to hear that you and your neighbors are so hard hit. I appreciate the heads up and hope you and your town get the support you need to rebuild and recover. Open source will wait!

@matthewjrogers
Copy link
Owner

@elipousson finally found some time for this, and (anticlimactic, perhaps) I think this looks excellent. I am going to merge this pull request and start the checks needed to get this to CRAN. You may see more from me over the course of the next week or so if anything needs to be done to satisfy CRAN requirements related to your code. I am optimistic that this will go smoothly. I'll leave the PR open until we get that buttoned up.

Thank you, once again, for your extensive and thoughtful work on this.

@matthewjrogers matthewjrogers merged commit 0604894 into matthewjrogers:dev Sep 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants