
RFC: Drop Verification from Benchmarking #3

Open
msmith-techempower opened this issue Jul 29, 2020 · 0 comments

Summary

  • Remove the verification step during benchmarking

Motivation

Verification is an important step in determining whether a test implementation is valid, but it can be very time-consuming in practice. The new Verifier implementation aims to make the verification logic easier to understand and new verifications easier to add. Assuming that verifications will be expanded more easily and new verifications added (see new verifications for reference), the time it takes to run verification is expected to grow over time.

At the time of this writing, the update verification takes 31 seconds; json takes 4 seconds, as does plaintext. There are 650 test implementations, each of which will end up verifying one or more of these test types. In the best-case (and unrealistic) scenario, where every implementation verifies a single 4-second test type, 2,600 seconds (43 minutes) are spent verifying. In a case where 3 of 5 test types take 30 seconds, it looks more like 50,960 seconds (14 hours). A continuous benchmark run can be shortened by several hours via the following:

  1. The benchmark process will not run verification
  2. Assume that any test implementation not tagged "broken" has passed verification
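The numbers and the proposed flow above can be sketched as follows. This is a back-of-the-envelope illustration, not the project's actual code: the helper names, the tag layout, and the per-type timings are assumptions drawn from the figures quoted in this RFC.

```python
# Illustrative numbers from the RFC: 650 implementations, json/plaintext
# verification at ~4 s each. Helper names here are hypothetical.
IMPLEMENTATIONS = 650

def total_verification_seconds(implementations: int, seconds_per_impl: float) -> float:
    """Best-case estimate: every implementation verifies one cheap test type."""
    return implementations * seconds_per_impl

# 650 implementations x 4 s = 2,600 s, about 43 minutes of pure verification.
best_case = total_verification_seconds(IMPLEMENTATIONS, 4)

def benchmarkable(implementations: list[dict]) -> list[dict]:
    """Step 2 of the proposal: skip verification entirely and assume any
    implementation not tagged "broken" has already passed verification."""
    return [impl for impl in implementations if "broken" not in impl["tags"]]

tests = [
    {"name": "framework-a", "tags": []},
    {"name": "framework-b", "tags": ["broken"]},
]
# Only framework-a would be benchmarked; framework-b is skipped outright.
```

The heavier 50,960-second figure quoted above presumably assumes several test types per implementation, with some slow verifications such as update in the mix.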

History

Currently (and also in the legacy implementation), running a benchmark of a given test implementation incurs the cost of verification. This is done to ensure that time is not spent benchmarking a test which does not respond correctly at the endpoint being measured (e.g. if fortune returns a 500 instead of a 200, it should not be measured).

The legacy implementation imposed this rule because verification came as an afterthought to the benchmarking process. Originally, the legacy implementation would benchmark test implementations that returned a 500 response, for example. Eventually, the verification step was added to ensure that tests were implemented correctly, and patches were made retroactively to get existing tests to pass verification. Verification has been the standard for several years now, and we seem to be past the point where test implementations that do not pass verification get merged.

Drawbacks

  • A clever malicious contributor could open a pull request with a sophisticated black-box framework implementation (as a linked library, rather than source code) that passes verification to get merged in, but returns empty 200s for all the tests (or a similar attack)
  • Unreliable failures, such as a remote dependency not being available at the time, would result in benchmarking incorrect implementations. This may be addressed by RFC: Publish Tagged Test Implementations for Benchmarking #4

Supplemental Considerations

  • Implement a "light" verification step tailored to running a benchmark, which only checks that the service is available and returning a 200 response. This would somewhat alleviate the unreliable-failures drawback mentioned above.
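As a sketch, such a light check could be as small as a single request that succeeds only on an HTTP 200. The function name and the use of Python's standard library here are assumptions for illustration, not the project's actual verifier code.

```python
import urllib.request
import urllib.error

def is_available(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical "light" verification: succeed only if the endpoint
    answers with an HTTP 200 within the timeout; any connection error,
    non-2xx status, or timeout counts as unavailable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Because it never inspects response bodies, a check like this stays fast regardless of how many content verifications exist, while still catching services that are down or erroring before benchmark time is spent on them.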

Alternatives

  • Leave verification as a first-step to running a benchmark