Operation testing suite #235

FL33TW00D · 2024-07-05T18:55:35Z

As more and more browsers ship WebGPU, there may be minor discrepancies between implementations.
This may cause us significant delays and issues if not addressed.

So, what we need is a test suite like no other. It must fuzz all functionality in all possible deployment settings.

Browser: Chrome, Safari, Firefox
OS: Windows, macOS, Linux

This gives us 7 combinations we need to fuzz all functionality on.
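One way to arrive at 7 rather than 3 × 3 = 9 is to assume Safari only runs on macOS. The sketch below enumerates the matrix under that assumption; the enum and function names are hypothetical and not part of the project.

```rust
// Hypothetical names; the issue only lists the browsers and operating systems.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Browser {
    Chrome,
    Safari,
    Firefox,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Os {
    Windows,
    MacOs,
    Linux,
}

/// Assumption: Safari is only available on macOS; Chrome and Firefox run on all three.
fn is_supported(browser: Browser, os: Os) -> bool {
    !(browser == Browser::Safari && os != Os::MacOs)
}

/// Enumerate every browser/OS pair that can actually be tested.
fn test_matrix() -> Vec<(Browser, Os)> {
    let browsers = [Browser::Chrome, Browser::Safari, Browser::Firefox];
    let oses = [Os::Windows, Os::MacOs, Os::Linux];
    let mut out = Vec::new();
    for &b in &browsers {
        for &o in &oses {
            if is_supported(b, o) {
                out.push((b, o));
            }
        }
    }
    out
}

fn main() {
    for (b, o) in test_matrix() {
        println!("{:?} on {:?}", b, o);
    }
}
```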

We do not currently run operation tests in the browser because they rely on PyTorch for ground truth. This must be resolved by using pre-generated ground truth data (or some other great idea).

This will be done in conjunction with our property-based testing, which runs locally and is ground-truthed against PyTorch.

FL33TW00D · 2024-07-05T21:23:38Z

@philpax how would you get ground truth in the browser? any good ideas?

sigma-andex · 2024-07-06T11:40:09Z

Unpopular opinion: Have the tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, then compare to PyTorch.

philpax · 2024-07-06T21:24:14Z

> This gives us 7 combinations we need to fuzz all functionality on.

You may also need to consider AMD/NVIDIA/Intel graphics cards for Windows/Linux, x86 vs Apple Silicon for macOS, and mobile support. Yeah, this gets to be pretty painful pretty quickly 😭

> @philpax how would you get ground truth in the browser? any good ideas?

Hmm... yeah, I think you'd want to capture ground truth data with PyTorch on the "host" and then check against that. It'll be pretty annoying because of the sheer amount of data, but you could generate that on the fly or just compare the outputs.

> Unpopular opinion: Have tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, compare to PyTorch

This also sounds pretty reasonable to me. You could also do the same thing from Rust, but it might be easier to drive them from Python because you could use PyTorch directly. (I think you're already doing some kind of PyTorch orchestration from Rust for your existing tests, though?)

FL33TW00D · 2024-07-10T14:46:51Z

Proposal

Proposing a new testing suite that allows operation tests to run in the browser and ensures valid results across the following degrees of freedom (DOF):

Operation
OS
GPU Vendor
Tolerance
DType

E.g. Add, macOS, Intel, 1e-3, Q8_0
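As a rough sketch, one point in this space could be carried around as a plain struct and turned into a label for reporting results. The struct, its fields, and the `label` method are all hypothetical names chosen to mirror the list above; nothing here is taken from the project's code.

```rust
// Hypothetical type mirroring the degrees of freedom listed above.
#[derive(Debug, Clone, PartialEq)]
struct TestConfig {
    op: &'static str,
    os: &'static str,
    gpu_vendor: &'static str,
    tolerance: f32,
    dtype: &'static str,
}

impl TestConfig {
    /// Label built from the discrete axes, e.g. for grouping results in a report.
    /// The tolerance is left out because it is a threshold, not an identity.
    fn label(&self) -> String {
        format!("{}/{}/{}/{}", self.op, self.os, self.gpu_vendor, self.dtype)
    }
}

fn main() {
    let cfg = TestConfig {
        op: "Add",
        os: "macOS",
        gpu_vendor: "Intel",
        tolerance: 1e-3,
        dtype: "Q8_0",
    };
    println!("{} (tolerance {})", cfg.label(), cfg.tolerance);
}
```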

Invoke: TestGen::generate_unary(op, tol, dt)
Result:

"Add": {
    "inputs": [{
        "value": [0.1, 0.2, 0.3],
        "dt": "Q8"
    }],
    "outputs": [{
        "value": [0.2, 0.3, 0.4],
        "dt": "Q8"
    }],
    "atol": 1e-3,
    "rtol": 1e-3
}

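#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
pub fn test_add() {
    // The pre-generated ground truth is embedded at compile time,
    // so PyTorch is not needed in the browser.
    let test_case: WebTest = serde_json::from_slice(include_bytes!("add.json")).unwrap();
    ...
}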
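The `atol` and `rtol` fields in the JSON suggest an element-wise tolerance check against the stored outputs. A minimal sketch of such a check follows, assuming the same convention as `torch.allclose` (`|actual - expected| <= atol + rtol * |expected|`); the function name `all_close` is hypothetical, and the proposal does not say which formula the suite would use.

```rust
/// Element-wise closeness check using the `torch.allclose` convention:
/// |actual - expected| <= atol + rtol * |expected|.
/// Returns false on a length mismatch or if any comparison involves NaN.
fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}

fn main() {
    // Expected values taken from the "outputs" field of the example above.
    let expected = [0.2_f32, 0.3, 0.4];
    let close = [0.2004_f32, 0.2996, 0.4003];
    let off = [0.2_f32, 0.3, 0.41];

    assert!(all_close(&close, &expected, 1e-3, 1e-3));
    assert!(!all_close(&off, &expected, 1e-3, 1e-3));
}
```

Keeping the comparison in plain Rust means the same check can run natively and under `wasm32` without any dependency on PyTorch at test time.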

FL33TW00D added the "help wanted" label Jul 5, 2024

FL33TW00D added this to the 0.5.0 milestone Jul 23, 2024
