Quantization #6

jeaneigsi · 2024-11-16T14:54:05Z

Is it possible to load the model in a quantized version with bitsandbytes (int4, int8, or others)? Also, how can I try inputs of 100,000 tokens in length? I get this message: Token indices sequence length is longer than the specified maximum sequence length for this model (798 > 512). Running this sequence through the model will result in indexing errors

Also, pretty good work. I share the same philosophy; in particular, I think the T5 architecture is better for seq-to-seq tasks than decoder-only models. LLMs overgenerate, and their loss curves struggle to converge.

Ingvarstep · 2024-12-11T14:19:58Z

@jeaneigsi, thank you. I am also a big believer in encoder-decoder architectures; in one of my projects, a translator between different chemical formats, it made a lot of sense.

Regarding your questions: unfortunately, our flash-attention implementation doesn't support int4 or int8.

jeaneigsi · 2024-12-16T21:49:54Z

OK, thanks a lot. I'll stay tuned for upcoming updates. Keep building great things!
