In the FAQs on LLMs there is no comment on prefix caching. However, the Llama3-405B benchmark intends to use vLLM for its reference implementation, and vLLM supports prefix caching. Is this optimization going to be allowed? If so, will it conflict with any of the existing rules on caching?
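For concreteness, this is a minimal sketch of how the optimization in question is typically switched on in vLLM's offline API in recent versions; the model ID and prompts below are illustrative placeholders, not the official reference settings:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID for illustration; the reference implementation
# may pin a specific checkpoint and engine configuration.
llm = LLM(
    model="meta-llama/Llama-3.1-405B-Instruct",
    enable_prefix_caching=True,  # the optimization being asked about
)

# Requests that share a common prompt prefix (e.g., a fixed system
# prompt) can reuse cached KV blocks instead of recomputing them.
shared_prefix = "You are a helpful assistant. "
prompts = [shared_prefix + q for q in ("What is MLPerf?", "What is vLLM?")]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
```

Because the KV reuse happens across queries, it seems to touch the same ground as the existing caching rules, which is why clarification would help.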