Operation testing suite #235

Open
FL33TW00D opened this issue Jul 5, 2024 · 4 comments
Comments

@FL33TW00D
Collaborator

FL33TW00D commented Jul 5, 2024

As more and more browsers ship WebGPU, there may be minor discrepancies between implementations.
This may cause us significant delays and issues if not addressed.

So, what we need is a test suite like no other. It must fuzz all functionality in all possible deployment settings.

Browser: Chrome, Safari, Firefox
OS: Windows, macOS, Linux

This gives us 7 combinations we need to fuzz all functionality on.
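
The count of 7 follows if Safari is assumed to run only on macOS while Chrome and Firefox run on all three systems. A minimal sketch of enumerating that matrix is below; the `Browser` and `Os` enums and both functions are hypothetical names for illustration, not part of the project:

```rust
/// Hypothetical enums for illustration; not part of the project's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Browser {
    Chrome,
    Safari,
    Firefox,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Os {
    Windows,
    MacOs,
    Linux,
}

/// Assumption: Safari is only available on macOS; Chrome and Firefox run everywhere.
pub fn supported(browser: Browser, os: Os) -> bool {
    browser != Browser::Safari || os == Os::MacOs
}

/// Enumerate every (browser, OS) pair that passes `supported`.
pub fn deployment_matrix() -> Vec<(Browser, Os)> {
    let browsers = [Browser::Chrome, Browser::Safari, Browser::Firefox];
    let systems = [Os::Windows, Os::MacOs, Os::Linux];
    let mut matrix = Vec::new();
    for &browser in &browsers {
        for &os in &systems {
            if supported(browser, os) {
                matrix.push((browser, os));
            }
        }
    }
    matrix
}
```

Calling `deployment_matrix()` yields the seven pairs; adding a GPU vendor or dtype axis would be one more nested loop with its own filter.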

We do not currently run operation tests in the browser because they rely on PyTorch for ground truth. This must be resolved by using pre-generated ground truth data (or some other great idea).

This will be done in conjunction with our property-based testing, which runs locally and is ground-truthed against PyTorch.

FL33TW00D added the "help wanted" label Jul 5, 2024
@FL33TW00D
Collaborator Author

@philpax how would you get ground truth in the browser? any good ideas?

@sigma-andex
Collaborator

Unpopular opinion: Have tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, compare to PyTorch

@philpax
Contributor

philpax commented Jul 6, 2024

> This gives us 7 combinations we need to fuzz all functionality on.

You may also need to consider AMD/NVIDIA/Intel graphics cards for Windows/Linux, x86 vs Apple Silicon for macOS, and mobile support. Yeah, this gets to be pretty painful pretty quickly 😭

> @philpax how would you get ground truth in the browser? any good ideas?

Hmm... yeah, I think you'd want to capture ground truth data with PyTorch on the "host" and then check against that. It'll be pretty annoying because of the sheer amount of data, but you could generate that on the fly or just compare the outputs.
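
One possible reading of "generate that on the fly" is to derive the inputs from a seed, so that only the seed and the expected outputs need to be stored. A minimal sketch under that assumption is below; the SplitMix64 generator and the `generate_inputs` helper are illustrative choices, not something the project uses, and the host side would have to produce the same inputs before running PyTorch on them:

```rust
/// Hypothetical deterministic generator (SplitMix64); not taken from the project.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    /// Advance the state and return the next 64-bit value.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits of the next value.
    pub fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Regenerate the same input tensor on any platform from a seed and a length.
pub fn generate_inputs(seed: u64, len: usize) -> Vec<f32> {
    let mut rng = SplitMix64::new(seed);
    (0..len).map(|_| rng.next_f32()).collect()
}
```

Because the generator uses only wrapping integer arithmetic, `generate_inputs(42, 16)` should give identical values natively and under wasm32, which is what makes the seed a usable stand-in for the stored inputs.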

> Unpopular opinion: Have tests in Python, use https://github.com/microsoft/playwright-python to call the JS/WASM code and get results, compare to PyTorch

This also sounds pretty reasonable to me. You could also do the same thing from Rust, but it might be easier to drive them from Python because you could use PyTorch directly. (I think you're already doing some kind of PyTorch orchestration from Rust for your existing tests, though?)

@FL33TW00D
Collaborator Author

FL33TW00D commented Jul 10, 2024

Proposal

I propose a new testing suite that runs operation tests in the browser and checks for valid results across the following degrees of freedom (DOF):

  1. Operation
  2. OS
  3. GPU Vendor
  4. Tolerance
  5. DType

E.g. Add, macOS, Intel, 1e-3, Q8_0

Invoke: TestGen::generate_unary(op, tol, dt)
Result:

"Add": {
    "inputs": [{
        "value": [0.1, 0.2, 0.3],
        "dt": "Q8"
    }],
    "outputs": [{
        "value": [0.2, 0.3, 0.4],
        "dt": "Q8"
    }],
    "atol": 1e-3,
    "rtol": 1e-3
}

// Assumes the fixture is parsed with the serde_json crate and that WebTest implements serde::Deserialize.
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
pub fn test_add() {
    let test_case: WebTest = serde_json::from_slice(include_bytes!("add.json")).unwrap();
    ...
}
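
The `atol` and `rtol` fields suggest an elementwise closeness check against the stored outputs. A minimal sketch of what that comparison could look like is below, assuming the same convention as `torch.allclose` (`|actual - expected| <= atol + rtol * |expected|`); the `Tensor` and `WebTest` layouts and the `all_close` and `check_outputs` helpers are guesses from the JSON above, not the project's actual types:

```rust
/// Hypothetical in-memory shape of one tensor entry, mirroring the JSON above.
pub struct Tensor {
    pub value: Vec<f32>,
    pub dt: String,
}

/// Hypothetical shape of a full test case; the real WebTest type may differ.
pub struct WebTest {
    pub inputs: Vec<Tensor>,
    pub outputs: Vec<Tensor>,
    pub atol: f32,
    pub rtol: f32,
}

/// Elementwise check: |actual - expected| <= atol + rtol * |expected|.
/// Length mismatches and NaN values fail the check.
pub fn all_close(actual: &[f32], expected: &[f32], atol: f32, rtol: f32) -> bool {
    actual.len() == expected.len()
        && actual
            .iter()
            .zip(expected)
            .all(|(a, e)| (a - e).abs() <= atol + rtol * e.abs())
}

/// Compare every computed output tensor against the stored ground truth.
pub fn check_outputs(case: &WebTest, actual: &[Vec<f32>]) -> bool {
    case.outputs.len() == actual.len()
        && case
            .outputs
            .iter()
            .zip(actual)
            .all(|(want, got)| all_close(got, &want.value, case.atol, case.rtol))
}
```

Inside `test_add`, the elided body could run the operation on `test_case.inputs` and assert `check_outputs(&test_case, &results)`. Quantized dtypes such as Q8 would likely need dequantizing to `f32` first, which this sketch leaves out.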

FL33TW00D added this to the 0.5.0 milestone Jul 23, 2024