How to support recomputing the KV cache after a certain decoded token #6886
jiazhan-msft
started this conversation in
Ideas
Replies: 0 comments
My model has a feature that switches the model setup after a certain decoded token. For example, once decoding reaches the n-th token, the model requires the previously computed KV cache to be recomputed. What is a possible path to support this? Thanks!
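As a rough illustration of the requested behavior, the decode loop could look like the toy sketch below. It is not tied to this project's API: `ToyModel`, `scale`, `prefill`, `decode_step`, `switch_at`, and `new_scale` are all hypothetical names, and the "KV cache" is just a list of tuples. The only point is the control flow: when the switch step is reached, the setup changes and the cache is rebuilt from every token seen so far before decoding continues.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # Hypothetical "model setup" knob; a real model would switch something
    # like an adapter, a scaling factor, or an attention configuration.
    scale: int = 1
    kv_cache: list = field(default_factory=list)

    def _kv(self, token):
        # Stand-in for computing one token's key/value under the current setup.
        return (token * self.scale, token + self.scale)

    def prefill(self, tokens):
        # Rebuild the whole cache from scratch under the current setup.
        self.kv_cache = [self._kv(t) for t in tokens]

    def decode_step(self, last_token):
        # Append the KV entry for the most recent token, then pick a next token.
        self.kv_cache.append(self._kv(last_token))
        return (sum(k for k, _ in self.kv_cache) + 1) % 100


def generate(model, prompt, max_new_tokens, switch_at, new_scale):
    tokens = list(prompt)
    # Cache everything except the last token; decode_step adds that one.
    model.prefill(tokens[:-1])
    for step in range(max_new_tokens):
        if step == switch_at:
            # Switch the setup, then recompute the KV cache for all earlier
            # tokens so no entry computed under the old setup remains.
            model.scale = new_scale
            model.prefill(tokens[:-1])
        tokens.append(model.decode_step(tokens[-1]))
    return tokens
```

In this sketch the recompute is simply a second prefill over the prompt plus the tokens generated so far, so its cost grows with the sequence length at the switch point.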