This repository has been archived by the owner on Oct 11, 2024. It is now read-only.

Marlin downstream PR #13

Closed
wants to merge 42 commits into from


alexm-neuralmagic
Collaborator

No description provided.

afeldman-nm and others added 30 commits February 1, 2024 23:41
…anch safe_expose_semi_structured_sparse_tensor
Semi-structured 2:4 sparsity via SparseSemiStructuredTensor
…size by running multiple parallel problems of size 64. (2) Refactor the workspace to be dynamic per layer
int block_size,
int max_context_len,
const c10::optional<torch::Tensor>& alibi_slopes);
void paged_attention_v1(torch::Tensor &out, torch::Tensor &query,
Collaborator

We should probably avoid reformatting this file; it will cause headaches later when syncing with the main vLLM repo.

@@ -1,29 +1,35 @@
import torch

from magic_wand import SparseTensor, SparseBitmaskStorageFormat
from typing import Type
Collaborator

Why is this file changing? This seems unrelated to Marlin.

#endif
ops.def("gptq_gemm", &gptq_gemm, "Quantized GEMM for GPTQ");
ops.def("gptq_shuffle", &gptq_shuffle, "Post processing for GPTQ");
ops.def("squeezellm_gemm", &squeezellm_gemm, "Quantized GEMM for SqueezeLLM");

Collaborator

nit: remove unnecessary format change

@@ -148,9 +148,9 @@ def _verify_tokenizer_mode(self) -> None:
         self.tokenizer_mode = tokenizer_mode

     def _verify_sparsity(self) -> None:
-        supported_sparsity = ["sparse_w16a16"]
+        supported_sparsity = ["sparse_w16a16", "semi_structured_sparse_w16a16"]
Member

Please rebase or merge with our main properly. It looks like some recent changes from main have been picked up into this diff.

@robertgshaw2-neuralmagic
Collaborator

Closing in favor of #26

5 participants