
Should the tap tests mock github's API responses? #14

Open
laurentS opened this issue Sep 12, 2021 · 9 comments

Comments

@laurentS
Contributor

While developing I've run into github's API rate limits a few times (with my auth_token set!), which gets pretty annoying, as the tap does not work at all then.
This might become an issue for CI as well as the number of streams increases (the limit without an auth token is 60 requests/hour per IP).
So I am wondering if we should set up some sort of mocking of API calls, for instance using pytest's monkeypatch fixture, or some other mechanism.

Pros:

  • faster and more reliable test suite (so probably more tests written)
  • better development experience (running tests would be almost instant)

Cons:

  • it takes a little time to build the mock responses
  • if the actual API changes its format, the mocks need to be updated. Tools like VCR.py could be helpful with this.
  • breaking changes in the actual API would not break the tap's tests, but something downstream would probably fail. Unless we run a periodic CI job to compare mocks to actual API output.
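As a minimal sketch of the mocking idea: the stub below uses stdlib `unittest.mock`, which does the same attribute patching as pytest's `monkeypatch` fixture. The `fetch_issues` helper and the endpoint are illustrative assumptions, not the tap's real code.

```python
# Hypothetical sketch: stub the HTTP layer so tests never hit GitHub's
# rate limits. fetch_issues is an illustrative helper, not the tap's
# real code; pytest's monkeypatch fixture can do the same patching.
import json
from unittest import mock
from urllib.request import urlopen

def fetch_issues(repo):
    """Illustrative tap helper: GET a repo's issues from the API."""
    with urlopen(f"https://api.github.com/repos/{repo}/issues") as resp:
        return json.loads(resp.read())

# A canned response, playing the role of a recorded VCR.py "cassette".
CANNED = [{"number": 14, "title": "Should the tap tests mock github's API responses?"}]

def test_fetch_issues():
    fake = mock.MagicMock()
    fake.__enter__.return_value.read.return_value = json.dumps(CANNED).encode()
    # Patch urlopen in this module so no real request is ever made.
    with mock.patch(f"{__name__}.urlopen", return_value=fake):
        issues = fetch_issues("MeltanoLabs/tap-github")
    assert issues[0]["number"] == 14
```

With this pattern the test suite runs offline and instantly; the trade-off, as listed above, is keeping the canned payloads in sync with the real API.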
@aaronsteers
Contributor

@laurentS - I love this idea and I'm glad you referenced the VCR.py library. Do you mind adding some thoughts here: https://gitlab.com/meltano/sdk/-/issues/30#note_677520346

  1. Yes - I think this kind of a mock capability would be fantastic.
  2. If we can find a way to build the capability generically into the SDK, all the better. (Even if that is done in a future phase 2.)

@aaronsteers
Contributor

> While developing I've run a few times into github's api rate limits (with my auth_token set!), which gets pretty annoying, as the tap does not work at all then.

@laurentS - A less exciting solution, but as a stop-gap we could also limit the CI pipeline to run full tests only on Python 3.9. Currently we run full tests on all 4 supported Python versions; we could set the other three to run partial tests only. Again, not ideal.

@laurentS
Contributor Author

> While developing I've run a few times into github's api rate limits (with my auth_token set!), which gets pretty annoying, as the tap does not work at all then.
>
> @laurentS - Less exciting of a solution, but as a stop-gap, we could also limit the CI pipeline to only run full tests on python 3.9. Currently we're running full tests on all 4 supported python versions. We could set the other three to run partial tests only. Again, not ideal.

True. I think we'll hit the 60 requests/hour limit randomly anyway, but it's worth trying. As another option, we could fetch a github auth token from an env var, set up a dedicated github account, and use its token for CI, which would raise the limit to 5000/hour. Not ideal either, particularly when doing local dev.
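A minimal sketch of the env-var idea (the variable name `GITHUB_TOKEN` and the header shape are assumptions; the tap's real config mechanism may differ):

```python
# Sketch: read an auth token from the environment so CI (using a
# dedicated account's token) gets 5000 req/hour instead of 60.
import os

def build_headers():
    headers = {"Accept": "application/vnd.github.v3+json"}
    token = os.environ.get("GITHUB_TOKEN")  # assumed variable name
    if token:
        # GitHub's token auth scheme for the REST API
        headers["Authorization"] = f"token {token}"
    return headers
```

Local dev without the variable set simply falls back to unauthenticated requests.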

@laurentS
Contributor Author

I'm thinking the mock+recording logic would really be helpful combined with a set of helper functions designed as a target-test-tap or something, so that all taps can simply be run the way they're meant to be. That testing target class could be instantiated with expectations on its input, like "expect 6 records", "expect 2 streams with names aaa and bbb", "all records should have a name field", etc. With mocks, the input is known, so it would be easy to have strong assertions on the receiving end.

Someone wrote on gitlab that using the sdk guarantees valid singer output, but I feel that's only true if there is no bug in the sdk, and if I don't add my own bugs as the tap developer (I did generate some invalid output just by leaving a print("hello") in the tap). So I would argue that an end-to-end test would be very useful.
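The "testing target with expectations" idea could look something like this minimal sketch (the message shapes follow the Singer spec; the function name and expectation API are hypothetical):

```python
# Hypothetical "testing target": consume a tap's stdout lines and assert
# expectations about them. A stray print("hello") in the tap makes
# json.loads raise here, which is exactly the end-to-end check we want.
import json

def check_tap_output(lines, expect_streams, expect_records):
    streams, records = set(), 0
    for line in lines:
        msg = json.loads(line)  # non-JSON output fails immediately
        if msg["type"] == "SCHEMA":
            streams.add(msg["stream"])
        elif msg["type"] == "RECORD":
            records += 1
            assert "name" in msg["record"], "every record should have a name"
    assert streams == set(expect_streams), f"unexpected streams: {streams}"
    assert records == expect_records, f"expected {expect_records} records, got {records}"
```

Combined with mocked API responses the expected counts are deterministic, so the assertions can be exact rather than fuzzy.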

I'm AFK for a couple of weeks, but happy to give this a shot when I'm back if nobody has started before.

@ericboucher
Contributor

ericboucher commented Oct 26, 2021

Quick update: @aaronsteers has added the use of a GITHUB_TOKEN for CI tests.

But we are already hitting rate limiting on some PRs. As a temporary solution, we could remove the Python 3.6 tests. See #30

Borrowing from your ideas above, a more long-term solution, potentially at the SDK level, would be to have full tests run on the latest version of Python (3.9 here) and record all the API calls, then run subsequent tests using the recorded API responses. What do you think?
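The record-then-replay flow described above can be written out by hand as a small sketch (this is the idea behind VCR.py, which does it at the HTTP level; the cassette path and `fetch` signature here are illustrative assumptions):

```python
# Sketch of "record once, replay later": the first run makes real API
# calls and saves them to a JSON "cassette"; later runs replay from it.
import json
import os

def replayable(cassette_path, fetch):
    """Wrap fetch(url) so responses are recorded to a JSON file on the
    first run and replayed from that file afterwards."""
    def wrapper(url):
        cache = {}
        if os.path.exists(cassette_path):
            with open(cassette_path) as f:
                cache = json.load(f)
        if url not in cache:
            cache[url] = fetch(url)  # real API call, first run only
            with open(cassette_path, "w") as f:
                json.dump(cache, f)
        return cache[url]
    return wrapper
```

In CI terms: the full-test job on 3.9 would populate the cassette, and the other jobs would run entirely from it, never touching the rate limit.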

cc @laurentS

@ericboucher
Contributor

If we are improving the tests, I think it would also be worth separating the linters into their own workflow; they probably don't need to run for all python versions. Plus we would be able to see at a glance whether a run failed because of the linter or the actual tests.

@aaronsteers
Contributor

aaronsteers commented Nov 12, 2021

Agreed. I always prefer linting to be a separate job from pytest so that we can see distinct pass/fail on style vs execution. I also agree linting should only run on a primary python version, since linters can occasionally produce mutually exclusive success conditions when run on different python versions.

This is the direction we're going for the new cookiecutter template: https://gitlab.com/meltano/sdk/-/merge_requests/202

We noted in that MR that we might also leverage tox, for the lint operation at least. Personally I still like the explicit pytest execution in the CI job, but since there are many lint operations to run, a single tox lint env can be a good local pattern as well.
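A hypothetical tox.ini fragment of that pattern (the env names and lint tools below are assumptions for illustration, not necessarily what the cookiecutter template uses):

```ini
# Hypothetical tox.ini sketch: a dedicated lint env so style failures
# show up separately from pytest failures.
[tox]
envlist = py39, lint
isolated_build = true

[testenv]
commands = pytest

[testenv:lint]
commands =
    black --check tap_github tests
    flake8 tap_github tests
```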

@Ry-DS
Contributor

Ry-DS commented Mar 21, 2022

I feel this issue could be closed now that #60 is merged. Or we can rename this issue to something related to improving the CI flow with linting. Thoughts?

@laurentS
Contributor Author

Or, for consistency across taps, we could adapt the CI script here to match the meltano template from https://github.com/MeltanoLabs/tap-gitlab/blob/main/.github/workflows/ci_workflow.yml

I'm not sure why tox is used on top of poetry; it looks like it creates its own venv on top of poetry's venv, so it seems slower than it needs to be, but that's another question :)
