In the FAQs on LLMs there is no comment on prefix caching. However, the Llama3-405B benchmark intends to use vLLM for its reference implementation, and vLLM supports prefix caching. Is this optimization going to be allowed? If so, will it conflict with any of the existing rules on caching?
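For concreteness, this is a minimal sketch of how the optimization in question is typically switched on in vLLM's offline API in recent versions; the model ID and prompts below are illustrative placeholders, not the official reference settings:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID for illustration; the reference implementation
# may pin a specific checkpoint and engine configuration.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",
    enable_prefix_caching=True,  # the optimization being asked about
)

# Requests that share a common prompt prefix (e.g., a fixed system
# prompt) can reuse cached KV blocks instead of recomputing them.
shared_prefix = "You are a helpful assistant. "
prompts = [shared_prefix + q for q in ("What is MLPerf?", "What is vLLM?")]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
```

Because the KV reuse happens across queries, it seems to touch the same ground as the existing caching rules, which is why clarification would help.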