
RFC: Drop Verification from Benchmarking #3

Open
msmith-techempower opened this issue Jul 29, 2020 · 0 comments

Summary

  • Remove the verification step during benchmarking

Motivation

Verification is an important step in determining whether a test implementation is valid, but it can be very time-consuming in practice. The new Verifier implementation aims to make the verification logic easier to understand and new verifications easier to add. Assuming that verifications will be expanded more easily and new verifications added (see new verifications for reference), the time it takes to run verification is expected to grow over time.

At the time of this writing, the update verification takes 31 seconds; json takes 4 seconds, as does plaintext. There are 650 test implementations, each of which will end up verifying one or more of these test types. In the best-case (and unrealistic) scenario, where every implementation verifies a single 4-second test type, 2,600 seconds (43 minutes) are spent verifying. In a case where 3 of 5 test types take 30 seconds, it looks more like 50,960 seconds (14 hours). A continuous benchmark run can be shortened by several hours via the following:

  1. The benchmark process will not run verification
  2. Assume that any test implementation not tagged "broken" has passed verification
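The numbers and the proposed flow above can be sketched as follows. This is a back-of-the-envelope illustration, not the project's actual code: the helper names, the tag layout, and the per-type timings are assumptions drawn from the figures quoted in this RFC.

```python
# Illustrative numbers from the RFC: 650 implementations, json/plaintext
# verification at ~4 s each. Helper names here are hypothetical.
IMPLEMENTATIONS = 650

def total_verification_seconds(implementations: int, seconds_per_impl: float) -> float:
    """Best-case estimate: every implementation verifies one cheap test type."""
    return implementations * seconds_per_impl

# 650 implementations x 4 s = 2,600 s, about 43 minutes of pure verification.
best_case = total_verification_seconds(IMPLEMENTATIONS, 4)

def benchmarkable(implementations: list[dict]) -> list[dict]:
    """Step 2 of the proposal: skip verification entirely and assume any
    implementation not tagged "broken" has already passed verification."""
    return [impl for impl in implementations if "broken" not in impl["tags"]]

tests = [
    {"name": "framework-a", "tags": []},
    {"name": "framework-b", "tags": ["broken"]},
]
# Only framework-a would be benchmarked; framework-b is skipped outright.
```

The heavier 50,960-second figure quoted above presumably assumes several test types per implementation, with some slow verifications such as update in the mix.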

History

Currently (and also in the legacy implementation), running a benchmark of a given test implementation incurs the cost of verification. This is done to ensure that time is not spent benchmarking a test which does not respond correctly at the endpoint being measured (e.g. if fortune returns a 500 instead of a 200, it should not be measured).

The legacy implementation imposed this rule because verification came as an afterthought to the benchmarking process. Originally, the legacy implementation would benchmark test implementations that returned a 500 response, for example. Eventually, the verification step was added to ensure that tests were implemented correctly, and patches were made retroactively to get existing tests to pass verification. Verification has been the standard for several years now, and we seem to be past the point where test implementations that do not pass verification get merged.

Drawbacks

  • A clever malicious contributor could open a pull request with a sophisticated black-box framework implementation (as a linked library, rather than source code) that passes verification to get merged in, but returns empty 200s for all the tests (or a similar attack)
  • Unreliable failures, such as a remote dependency not being available at the time, would result in benchmarking incorrect implementations. This may be addressed by RFC: Publish Tagged Test Implementations for Benchmarking #4

Supplemental Considerations

  • Implement a "light" verification step tailored to running a benchmark, which only checks that the service is available and returning a 200 response. This would somewhat alleviate the unreliable-failures drawback mentioned above.
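As a sketch, such a light check could be as small as a single request that succeeds only on an HTTP 200. The function name and the use of Python's standard library here are assumptions for illustration, not the project's actual verifier code.

```python
import urllib.request
import urllib.error

def is_available(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical "light" verification: succeed only if the endpoint
    answers with an HTTP 200 within the timeout; any connection error,
    non-2xx status, or timeout counts as unavailable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

Because it never inspects response bodies, a check like this stays fast regardless of how many content verifications exist, while still catching services that are down or erroring before benchmark time is spent on them.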

Alternatives

  • Leave verification as a first-step to running a benchmark