Add support for Deepseek #2692

@LaurentMazare the DeepSeek V3/R1 model is very large - do you think it would be more appropriate to implement it similar to the llama multiprocess example?

super-fun-surf · 2025-01-27T20:12:49Z

rad!. the quantized / distilled versions of R1 should work on smaller gpus.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for Deepseek #2692

Add support for Deepseek #2692

franklucky001 commented Dec 31, 2024

bnusunny commented Jan 24, 2025

phdbrianlee commented Jan 24, 2025

super-fun-surf commented Jan 25, 2025

EricLBuehler commented Jan 25, 2025

super-fun-surf commented Jan 25, 2025

franklucky001 commented Jan 26, 2025

EricLBuehler commented Jan 27, 2025 •

edited

Loading

super-fun-surf commented Jan 27, 2025

Add support for Deepseek #2692

Add support for Deepseek #2692

Comments

franklucky001 commented Dec 31, 2024

bnusunny commented Jan 24, 2025

phdbrianlee commented Jan 24, 2025

super-fun-surf commented Jan 25, 2025

EricLBuehler commented Jan 25, 2025

super-fun-surf commented Jan 25, 2025

franklucky001 commented Jan 26, 2025

EricLBuehler commented Jan 27, 2025 • edited Loading

super-fun-surf commented Jan 27, 2025

EricLBuehler commented Jan 27, 2025 •

edited

Loading