Why must the linear input for Layer.pack be of type torch.half? #28
Comments
Hi, Marlin currently does not support BF16 inputs (though in many cases you can just convert your BF16 model to FP16). BF16 requires slightly different GPU instructions as well as a slightly different dequantization process (since BF16 has only 7 mantissa bits, versus FP16's 10). This is also why ...
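Since the mantissa width is the crux here, a small, hypothetical PyTorch illustration (not Marlin's actual kernel code) of the magic-number int4 dequantization trick shows why the constants differ between FP16 and BF16:

```python
import torch

# FP16 has 10 mantissa bits: OR-ing a 4-bit value q into the low mantissa bits
# of 1024.0 (bit pattern 0x6400) yields exactly 1024 + q, so subtracting 1024
# recovers q. BF16 has only 7 mantissa bits, so the magic constant becomes
# 128.0 (bit pattern 0x4300) and the correction becomes 128.
q = torch.randint(0, 16, (8,), dtype=torch.int16)   # unpacked 4-bit values

# FP16 path
fp16 = (q | 0x6400).view(torch.float16) - 1024.0
assert torch.equal(fp16, q.to(torch.float16))

# BF16 path: different magic constant because of the shorter mantissa
bf16 = (q | 0x4300).view(torch.bfloat16) - 128.0
assert torch.equal(bf16, q.to(torch.bfloat16))
```

The real kernels do the same thing on packed 16-bit lanes in registers, but the constant change above is the gist of why BF16 needs a separate dequantization path.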
Hi, I notice that the core mma function for BF16 is already supported in vLLM's GPTQ Marlin kernel, and it seems that only a few changes are needed to bring this over: https://github.com/vllm-project/vllm/blob/main/csrc/quantization/gptq_marlin/gptq_marlin.cu#L89 I really need BF16 input (most models are released in BF16 now), so if it doesn't take much time, could you merge that feature into the Marlin repo? If not, I can do it later.
@Azure-Tang
Didn't get it done yet, maybe next week?
It seems that if we remove the assert in Layer.pack, then we can pack a BF16 linear? By the way, will Marlin support "int4 × bf16" as input?
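If the goal is just to get a BF16 checkpoint through Layer.pack today, the FP16-conversion route mentioned above would look roughly like this; the marlin.Layer constructor arguments and pack() signature below are assumptions and may not match the repo exactly:

```python
import torch
import marlin  # the Marlin repo's Python package

# Assumed workaround: convert the BF16 linear to FP16 before packing, instead
# of removing the dtype assert. Shapes, group size, and scales are placeholders.
linear = torch.nn.Linear(256, 256, bias=False, dtype=torch.bfloat16)
linear = linear.to(torch.float16)                 # now satisfies the torch.half check

layer = marlin.Layer(256, 256, groupsize=128)     # assumed constructor signature
scales = torch.ones(256 // 128, 256, dtype=torch.float16)  # placeholder per-group scales
layer.pack(linear, scales)
```

At runtime the activations would still need to be cast to FP16 before calling the packed layer, since the kernel itself computes in FP16.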