add pagination and base_url to JSON query_runner #6499

Ken-Michalak · 2023-10-04T18:50:51Z

What type of PR is this?

Description

Add the ability to paginate the JSON Query Runner and configure a base URL.
Support Google's pagination design pattern, with pagination: [next_page_token, page_token].
Support a next URL, which is sometimes used in HATEOAS, with pagination: _links.next.href.

How is this tested?

Unit tests (pytest, jest)
E2E Tests (Cypress)
Manually
N/A

Related Tickets & Documents

Original PR: #5067

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Ken-Michalak · 2023-10-04T19:11:29Z

@justinclift Here's the new pr.

I'm not sure if this is the best way to configure it, especially if someone wants to support other patterns later, like page numbers or offsets without anything in response for what to pass next. I was wondering if there were other ideas there or if there's something like that on another data source.

justinclift · 2023-10-05T07:12:31Z

@guidopetri @eradman Any interest in reviewing this? 😄

justinclift · 2023-10-05T07:13:16Z

Ahhh, looks like this has a formatting issue that needs fixing:

Run ruff check .
redash/query_runner/json_ds.py:1:1: I001 [*] Import block is un-sorted or un-formatted

guidopetri · 2023-10-06T00:22:23Z

Could we also add a test to make it clearer what is going on, and what the expected behavior is?

Ken-Michalak · 2023-10-09T19:21:24Z

@guidopetri I restructured the configuration, which hopefully makes a bit more sense.

For a config like this:

pagination:
  type: url
  #path: _links.next.href

As long as the API response has something in _links.next.href, it will follow it, do another request, and include those results.

For the other pattern:

pagination:
    type: token
    #fields: [next_page_token, page_token]

It does the same kind of thing, where as long as next_page_token isn't "" or null, it continues getting the next page of results by putting that token in the page_token parameter.

I looked into adding a test, but it doesn't look like there are any tests for the json data source at all yet, and this would be pretty complicated to mock. I was just testing it manually by querying a Chargebee API and a Vimeo VHX API.

guidopetri

I very much don't feel comfortable merging this in without a test. To be honest I'm not sold on the idea of implementing pagination here in the first place. As mentioned in the comments below, I'm not sure there's a standard for pagination, and if we were to implement every pagination ruleset out there we'd have a very large codebase 😅 but if we do want to merge this in I would very kindly request that we somehow add tests, even if we have to write some from scratch for the JSON query runner to start out with.

guidopetri · 2023-10-10T00:08:15Z

redash/query_runner/json_ds.py

+            has_more = False
+
+            result = _normalize_json(response, result_path)
+            if result:


Having this many nested if statements is kind of a code smell. Maybe we should split this up into multiple functions?

We could, but there are kind of a lot of variables to pass around to do so. I personally would just pass query around instead of breaking it up on line 168 and then passing that around everywhere. I was just trying to change as little as possible. Should I do that switch?

In my opinion, yes - definitely worth it if it makes the code easier to follow.

guidopetri · 2023-10-10T00:10:03Z

redash/query_runner/json_ds.py

+            if result:
+                results.extend(result)
+
+                if pagination["type"] == "url":


I've never seen either of these two pagination patterns. I don't think there's a "standard" either, is there? I imagine every API has its own set of pagination rules that it follows. If that's the case, I'm not sure it makes sense to write in pagination rules, and rather we should emphasize to the end users that querying an API for JSONs directly like this is meant for testing and not large-scale analysis.

Especially since if too much data is returned, it's possible that the worker goes OOM. 😅

Some variant of these 2 seem to be the most common, because they don't require specifics of implementation, like having to add to the page number or offset yourself. (The type could be named something different if you have a better idea though.) That's why Google outlines the next_page_token/page_token as a best practice. It's just that sometimes the names are different, like Chargebee uses an offset parameter, but rather than having to calculate it, it returns the next offset in the response. Google Analytics, AWS, and K8s also follow that pattern.

Shopify, SalesForce, Facebook's Graph API, and Trino all pass a full URL for the next page in some field of the response, which would be covered by the url type. HATEOAS is a standard and does define that the next page would be a link somewhere in the response. Although, at some point, it might be good to use a library instead of _apply_path_search to handle more complex cases.

Other types could be added later for pageNumber or something, but I don't really have a need for them right now, and that is where it could be pretty difficult to cover different APIs.

I was wondering about the memory thing, but it doesn't look like there's any other way that large result sets are handled elsewhere. The Trino data source does a fetchall and then processes all of the results at once. This at least does the _normalize_json per page, which could trim things down.

Fair points; thanks for elaborating.

redash/query_runner/json_ds.py

codecov · 2023-10-10T15:48:11Z

Codecov Report

Merging #6499 (6902fd4) into master (69d1e03) will increase coverage by 0.35%.
Report is 3 commits behind head on master.
The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master    #6499      +/-   ##
==========================================
+ Coverage   61.53%   61.88%   +0.35%     
==========================================
  Files         158      158              
  Lines       12898    12965      +67     
  Branches     1756     1770      +14     
==========================================
+ Hits         7937     8024      +87     
+ Misses       4704     4666      -38     
- Partials      257      275      +18

Files	Coverage Δ
redash/query_runner/json_ds.py	`64.20% <66.66%> (+40.35%)`	⬆️

Ken-Michalak · 2023-10-10T15:54:37Z

I very much don't feel comfortable merging this in without a test. To be honest I'm not sold on the idea of implementing pagination here in the first place. As mentioned in the comments below, I'm not sure there's a standard for pagination, and if we were to implement every pagination ruleset out there we'd have a very large codebase 😅 but if we do want to merge this in I would very kindly request that we somehow add tests, even if we have to write some from scratch for the JSON query runner to start out with.

I can work on adding some tests.

Ken-Michalak · 2023-10-13T16:27:49Z

Odd, I didn't see any linting errors, but yeah, black still changed some things.

guidopetri · 2023-10-17T02:00:40Z

tests/query_runner/test_json_ds.py

+    return data, error
+
+
+class TestJSON(TestCase):


Ah, sorry, I should have specified. We've been aiming for pytest tests instead of unittest; if it's not too much trouble, would you mind converting to those? I won't hold it against you if you're not feeling up to it though. I appreciate the work you've put into this PR and already recognize you've gone above and beyond :)

I just used the other tests as an example. It could always be refactored later. It seems like some of that should be added to the contributing guide, like the testing, ruff, and rebasing process. Like are you looking for a single commit per pr, or just going for a linear history?

That's a great point. I don't think we have a good handle on what our contribution guidelines are; I'll try and add some documentation for that to the wiki soon.

guidopetri

Approving, but we'll need the PR to be updated with the latest master to merge. I'll enable automerge whenever you push a merge commit updating the branch, so it should merge automatically once the tests pass.

Thanks again for all your work @Ken-Michalak !

justinclift · 2023-10-18T05:53:48Z

@Ken-Michalak We have an option turned on for this repo, which requires PR's to be up-to-date with the master branch before they can be merged.

Normally PRs have a button to "Update to the master branch" (or similar wording), that maintainers can click to automate that.

It seems to be missing from this PR though. My guess is the PR was created without the "Allow maintainers to update this PR?" setting enabled.

So... I think that means you'll need to update this PR yourself. That might let it all just get nicely merged automatically after the tests pass. 😄

guidopetri · 2023-10-18T13:09:21Z

I bypassed it - we didn't touch the poetry files so it should be ok.

Thanks for your contribution @Ken-Michalak !

Ken-Michalak · 2023-10-18T14:58:47Z

@Ken-Michalak We have an option turned on for this repo, which requires PR's to be uo-to-date with the master branch before they can be merged.

Normally PRs have a button to "Update to the master branch" (or similar wording), that maintainers can click to automate that.

It seems to be missing from this PR though. My guess is the PR was created without the "Allow maintainers to update this PR?" setting enabled.

So... I think that means you'll need to update this PR yourself. That might let it all just get nicely merged automatically after the tests pass. 😄

@justinclift, I actually did that in that last force-push yesterday, but it looks like before @guidopetri approved it again, there were a few other commits added. (Bump @babel/traverse from 7.22.8 to 7.23.2) 🤷🏻 Thanks though.

justinclift · 2023-10-18T15:22:58Z

No worries, it all worked out. Thanks heaps for getting this done @Ken-Michalak, it seems like this will be useful for people. 😄

justinclift added python Pull requests that update Python code Team Review Meets PR criteria, ready for full review labels Oct 5, 2023

Ken-Michalak force-pushed the add-json-query-runner-pagination branch from 90bed4e to 14b10ba Compare October 9, 2023 18:55

guidopetri requested changes Oct 10, 2023

View reviewed changes

Ken-Michalak force-pushed the add-json-query-runner-pagination branch from 14b10ba to 5cfb03d Compare October 10, 2023 20:44

Ken-Michalak requested a review from guidopetri October 10, 2023 21:14

Ken-Michalak force-pushed the add-json-query-runner-pagination branch from 5cfb03d to 2fc5633 Compare October 13, 2023 16:25

guidopetri reviewed Oct 17, 2023

View reviewed changes

guidopetri approved these changes Oct 17, 2023

View reviewed changes

add pagination and base_url to JSON query_runner

6902fd4

Ken-Michalak force-pushed the add-json-query-runner-pagination branch from 2fc5633 to 6902fd4 Compare October 17, 2023 14:35

guidopetri approved these changes Oct 18, 2023

View reviewed changes

guidopetri enabled auto-merge (squash) October 18, 2023 02:55

guidopetri disabled auto-merge October 18, 2023 13:08

guidopetri merged commit 7b03e60 into getredash:master Oct 18, 2023
13 checks passed

Ken-Michalak deleted the add-json-query-runner-pagination branch October 18, 2023 14:59

harveyrendell pushed a commit to pushpay/redash that referenced this pull request Jan 8, 2025

add pagination and base_url to JSON query_runner (getredash#6499)

83ae613

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add pagination and base_url to JSON query_runner #6499

add pagination and base_url to JSON query_runner #6499

Ken-Michalak commented Oct 4, 2023 •

edited

Loading

Ken-Michalak commented Oct 4, 2023

justinclift commented Oct 5, 2023

justinclift commented Oct 5, 2023

guidopetri commented Oct 6, 2023

Ken-Michalak commented Oct 9, 2023

guidopetri left a comment

guidopetri Oct 10, 2023

Ken-Michalak Oct 10, 2023

guidopetri Oct 17, 2023

guidopetri Oct 10, 2023

Ken-Michalak Oct 10, 2023

guidopetri Oct 17, 2023

codecov bot commented Oct 10, 2023 •

edited

Loading

Ken-Michalak commented Oct 10, 2023

Ken-Michalak commented Oct 13, 2023

guidopetri Oct 17, 2023

Ken-Michalak Oct 17, 2023

guidopetri Oct 18, 2023

guidopetri left a comment •

edited

Loading

justinclift commented Oct 18, 2023 •

edited

Loading

guidopetri commented Oct 18, 2023

Ken-Michalak commented Oct 18, 2023

justinclift commented Oct 18, 2023

add pagination and base_url to JSON query_runner #6499

add pagination and base_url to JSON query_runner #6499

Conversation

Ken-Michalak commented Oct 4, 2023 • edited Loading

What type of PR is this?

Description

How is this tested?

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Ken-Michalak commented Oct 4, 2023

justinclift commented Oct 5, 2023

justinclift commented Oct 5, 2023

guidopetri commented Oct 6, 2023

Ken-Michalak commented Oct 9, 2023

guidopetri left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Oct 10, 2023 • edited Loading

Codecov Report

Ken-Michalak commented Oct 10, 2023

Ken-Michalak commented Oct 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guidopetri left a comment • edited Loading

Choose a reason for hiding this comment

justinclift commented Oct 18, 2023 • edited Loading

guidopetri commented Oct 18, 2023

Ken-Michalak commented Oct 18, 2023

justinclift commented Oct 18, 2023

Ken-Michalak commented Oct 4, 2023 •

edited

Loading

codecov bot commented Oct 10, 2023 •

edited

Loading

guidopetri left a comment •

edited

Loading

justinclift commented Oct 18, 2023 •

edited

Loading