[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations #10867

SageMoore · 2024-12-03T16:49:39Z

Credit to @LucasWilkinson for the kernel.

This pass currently supports only static per-tensor quantization. Other quantization schemes will be added in subsequent PRs.
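For readers unfamiliar with the two ops being fused, here is a minimal pure-Python sketch of the reference semantics, not the kernel from this PR. It assumes `silu_and_mul` splits the last dimension in half and computes `silu(x[:d]) * x[d:]`, and that static per-tensor quantization divides by a single precomputed scale and clamps to the `float8_e4m3fn` range. The helper names `static_scaled_quant`, `unfused`, and `fused` are hypothetical, and the final rounding to an actual fp8 dtype is omitted.

```python
import math

# Largest finite magnitude representable in float8_e4m3fn.
FP8_E4M3_MAX = 448.0


def silu(v: float) -> float:
    # SiLU (swish): v * sigmoid(v).
    return v / (1.0 + math.exp(-v))


def silu_and_mul(row):
    # Assumed semantics: first half is gated by SiLU, second half multiplies it.
    d = len(row) // 2
    return [silu(row[i]) * row[i + d] for i in range(d)]


def static_scaled_quant(row, scale):
    # Hypothetical stand-in for static per-tensor quantization:
    # one scale for the whole tensor, then clamp to the fp8 range.
    # The cast to a real fp8 dtype is intentionally left out.
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in row]


def unfused(row, scale):
    # Two passes: the intermediate activation is materialized in between.
    return static_scaled_quant(silu_and_mul(row), scale)


def fused(row, scale):
    # One pass: activation, scaling, and clamping per output element,
    # with no intermediate list.
    d = len(row) // 2
    out = []
    for i in range(d):
        v = silu(row[i]) * row[i + d] / scale
        out.append(max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v)))
    return out
```

In this sketch the fused version produces the same values as the two-step version while skipping the intermediate buffer, which is the kind of saving a fusion pass is after.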

github-actions · 2024-12-03T16:49:52Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add the ready label to the PR.
Enable auto-merge.

🚀

init

3190440

mergify bot added the ci/build label Dec 3, 2024

SageMoore added 4 commits December 3, 2024 16:53

remove backend format changes

4f8c864

format

127f09d

move activation_quant_kernels to the quantization dir

a337a64

added replacement unit test

23e6fcb
