…anch safe_expose_semi_structured_sparse_tensor
Semi-structured 2:4 sparsity via SparseSemiStructuredTensor
…re (eager_force=False)
…size by running multiple parallel problems of size 64. (2) Refactor the workspace to be dynamic per layer
…d issues with tensor parallel runs)
cleanup to undo autoformatting
cleanup formatting
int block_size,
int max_context_len,
const c10::optional<torch::Tensor>& alibi_slopes);
void paged_attention_v1(torch::Tensor &out, torch::Tensor &query,
We should probably avoid reformatting this file; it will cause headaches later when syncing with the main vLLM repo.
@@ -1,29 +1,35 @@
import torch

from magic_wand import SparseTensor, SparseBitmaskStorageFormat
from typing import Type
Why is this file changing? This seems unrelated to Marlin.
#endif
ops.def("gptq_gemm", &gptq_gemm, "Quantized GEMM for GPTQ");
ops.def("gptq_shuffle", &gptq_shuffle, "Post processing for GPTQ");
ops.def("squeezellm_gemm", &squeezellm_gemm, "Quantized GEMM for SqueezeLLM");
nit: remove unnecessary format change
@@ -148,9 +148,9 @@ def _verify_tokenizer_mode(self) -> None:
         self.tokenizer_mode = tokenizer_mode

     def _verify_sparsity(self) -> None:
-        supported_sparsity = ["sparse_w16a16"]
+        supported_sparsity = ["sparse_w16a16", "semi_structured_sparse_w16a16"]
Please rebase or merge with our main properly; it looks like this diff has picked up some recent changes from main.
Closing in favor of #26
No description provided.