Fix incompleteness of regex and cfg guided generation #544
Conversation
Nice! One thing to be mindful of as you develop this PR: make sure the `eos_token_id` is in the set of allowed tokens, e.g. on `main` this prints:
Thanks @lapp0! Confirmed.
Okay, here's the second enhancement. I had raised in #391 that the current CFG implementation was also incomplete: in a setting where it could either extend the current terminal or start the next terminal, it would always start the next terminal. This precluded important strings, e.g. multi-token variable names. It has now been fixed by proposing both the tokens that would extend the current terminal and the tokens that would terminate it and start the next terminal; the LM can then sample to select which path to take. Both FSMs (tracking progress separately for each terminal) are stored, and only the correct one is transitioned (with any stale ones removed) once we see which token the LM generates. For example, with a grammar containing a multi-character terminal, previously the terminal would be matched after a single character and only next-terminal tokens were allowed, so the innermost string could never extend; now strings such as multi-character names are supported (a toy sketch follows below).
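For illustration, here is a toy sketch of the two paths; the grammar (a lowercase `NAME` terminal followed by `=`) is a hypothetical stand-in, not the grammar from the PR:

```python
# Toy illustration of the extend-vs-terminate choice.  Hypothetical grammar:
# a NAME terminal ([a-z]+) must be followed by "=".
import re
import string

NAME = re.compile(r"[a-z]+")

def allowed_next_chars(partial: str) -> set[str]:
    options = set()
    # Path 1: extend the current NAME terminal by one more character.
    for ch in string.ascii_lowercase:
        if NAME.fullmatch(partial + ch):
            options.add(ch)
    # Path 2: NAME already fully matches, so it may also be closed,
    # letting the next terminal ("=") begin.
    if NAME.fullmatch(partial):
        options.add("=")
    return options

# The old behavior offered only path 2, so after "a" the sole continuation
# was "=" and multi-character names like "abc" were unreachable.  Proposing
# both paths lets the LM choose:
print(sorted(allowed_next_chars("a")))  # all 26 letters plus "="
```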
I've updated documentation and added a new test case, now covering the example that used to fail.
Okay, this is it for now; ready for review.

I was also thinking of memoizing the … Another optimization that should definitely happen eventually (and will have a greater performance impact) is smarter state tracking / caching for the incremental parsing. Currently (https://github.com/benlipkin/outlines/blob/fsm-enhancements/outlines/fsm/fsm.py#L270-L272) we just feed the whole prefix again each time we go back to the parser to propose a regex to enter a new accepting state. This duplicates a lot of computation for both the lexer and the parser, repeated over the entirety of the prefix that has already been committed (up through the last closed token). It can be ameliorated with some clever caching. I know there are some other performance improvements also baked into the …

Let me know any comments on the other two enhancements here. Thanks!
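A sketch of the caching idea under stated assumptions (toy lexer, hypothetical names; the real parser is Lark's incremental parser, which is not shown here): key saved state by the committed prefix so extending a generation only scans the new suffix instead of re-feeding the whole prefix.

```python
import copy

class ToyIncrementalLexer:
    """Toy stand-in for the lexer/parser state built while scanning a prefix."""
    def __init__(self):
        self.tokens = []   # tokens closed so far
        self.partial = ""  # characters of the still-open token

    def feed(self, text: str) -> None:
        for ch in text:
            if ch.isspace():       # whitespace closes the open token
                if self.partial:
                    self.tokens.append(self.partial)
                    self.partial = ""
            else:
                self.partial += ch

_state_cache: dict[str, ToyIncrementalLexer] = {}

def lexer_for(prefix: str) -> ToyIncrementalLexer:
    """Return lexer state for `prefix`, reusing the longest cached prefix."""
    if prefix not in _state_cache:
        best = max((p for p in _state_cache if prefix.startswith(p)),
                   key=len, default="")
        lexer = copy.deepcopy(_state_cache[best]) if best else ToyIncrementalLexer()
        lexer.feed(prefix[len(best):])  # only scan the uncached suffix
        _state_cache[prefix] = lexer
    return copy.deepcopy(_state_cache[prefix])
```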
Great work, LGTM.
IMHO, the other enhancements should be a separate PR. I did some work on parser caching in https://github.com/vllm-project/vllm/pull/2105/files#diff-2d38048e543f22b75ca22a9c5f45292a7f56bb4ad1a3e6c5f606a1a583b64707R270. Effectively, it caches the push-down automata based on the parser's stack. Happy to collaborate to introduce parser performance improvements if you're interested. I'm working on some CFG benchmarks right now, which should help #549.
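A minimal sketch of that idea, with hypothetical names (not the vLLM code): memoize the expensive construction keyed by the parser's stack, since identical stacks admit identical continuations.

```python
# Hypothetical sketch: cache expensive next-terminal computations keyed by
# the parser's stack rather than by the raw text prefix, so different
# prefixes that leave the parser in the same configuration share one entry.
from functools import lru_cache

@lru_cache(maxsize=None)
def next_terminal_regex(stack: tuple[str, ...]) -> str:
    # Stand-in for an expensive PDA/regex construction.
    print(f"computing for stack {stack}")
    return "|".join(stack)

next_terminal_regex(("NAME", "EQUAL"))  # computed
next_terminal_regex(("NAME", "EQUAL"))  # cache hit, no recomputation
```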
Thanks @lapp0! Sounds good re: keeping this PR to completeness-focused enhancements and tackling performance-focused optimizations in a different PR. Yes, I would be interested in collaborating on this (thanks for sharing your vLLM commit). Awesome to see some more benchmarking/profiling happening in #549.
force-pushed from 4127e60 to f1109d5
I'm writing a few … @rlouf, could you review when you get a chance?
…o extend current terminal vs transition to next
force-pushed from f1109d5 to a2a8f7f
Sorry for the time it took me to review the PR; I needed to make sure I understood the changes. This looks good to me, thank you for opening a PR!
I'm opening a PR to include a few small misc FSM enhancements. These should each be minor, but all focus on improving the completeness and/or performance of the FSM classes.

In this first commit, I modify how `final_state` is determined. Previously, for `RegexFSM`, the implementation of `is_final_state` was `return state in self.final_states`. This logic is incorrect, since generation should not terminate on any accepting state; it should only terminate when an EOS has been generated (else `*` is reduced to `?`). Here are the counts of unique outputs on an example generation of 1000 samples from the regex `a*`.
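As a rough stand-in for those counts (not the PR's actual measurements), a toy simulation, assuming a two-symbol vocabulary of `a` and EOS, shows how the old check collapses `a*` into `a?`:

```python
# Toy simulation, not the outlines implementation: the FSM for a* has a
# single state 0 that is both initial and accepting; reading "a" loops.
import random
from collections import Counter

ACCEPTING = {0}

def sample(stop_on_accepting: bool, max_len: int = 50) -> str:
    state, out = 0, []
    while len(out) < max_len:
        if random.random() < 0.5:  # model "samples" EOS
            break
        out.append("a")
        state = 0                  # "a" loops on the accepting state
        if stop_on_accepting and state in ACCEPTING:
            break                  # old rule: any accepting state ends generation
    return "".join(out)

print(Counter(sample(True) for _ in range(1000)))   # only "" and "a"
print(Counter(sample(False) for _ in range(1000)))  # "", "a", "aa", ... geometric
```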
Instead, now for the FSM interface, we set `final_state = FSMState(-1)`; in `next_state`, `if token_id == self.eos_token_id: return self.final_state`; and `is_final_state` can now be implemented as `return state == self.final_state`. Here are 1000 new samples, which are now correctly geometrically distributed.
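Condensed, the new pattern looks roughly like this (simplified signatures; the real classes live in `outlines/fsm/fsm.py` and carry more state):

```python
# Rough sketch of the new interface-level convention (simplified).
from typing import NewType

FSMState = NewType("FSMState", int)

class FSM:
    final_state = FSMState(-1)  # single sink state defined once at the interface
    eos_token_id: int

    def is_final_state(self, state: FSMState) -> bool:
        # Terminate only once EOS itself has been generated, not merely
        # upon reaching an accepting state of the regex/CFG automaton.
        return state == self.final_state

    def next_state(self, state: FSMState, token_id: int) -> FSMState:
        if token_id == self.eos_token_id:
            return self.final_state
        raise NotImplementedError  # subclasses follow their automaton here
```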
While making this modification, I also cleaned up the code in a few more places. Now that `stop_at` is handled by `SequenceGenerator` (thanks to #451), `StopAtTokenFSM` is only ever used to stop at `EOS`; I've made this explicit. Next, with both of the other FSMs using this new implementation, I brought `CFGFSM` on board as well (which also reduces some messiness and brings it closer to the rest of the interface) and moved this implementation to the abstract protocol level. This also allowed us to get rid of some hard-coded `-1` states floating around in the code by defining it once at the interface.