-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Misc] Validate grammar and fail early #11119
base: main
Are you sure you want to change the base?
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
Signed-off-by: Cody Yu <[email protected]>
c5c5aa8
to
4baf2cb
Compare
Hey @comaniac, 1/ Let's add unittests for this. Let's make sure it doesn't diverge from this behavior later on. |
I don't know where to add unit tests for xgrammar backend. It seems not much unit tests have been added for this component. @mgoin do you have any pointer or should we merge this PR first and make up unit tests later? Per offline discussion with @mgoin, this PR also fixes the Lark-like issue. I also found another issue that crashes the engine with my fix:
The fix in this PR is to make sure |
if tokenizer_hash in TokenizerDataCache._cache: | ||
encoded_vocab = None | ||
stop_token_ids = None | ||
backend_str = None | ||
else: | ||
tokenizer_data = TokenizerDataCache.get_tokenizer_data(tokenizer) | ||
encoded_vocab = tokenizer_data.encoded_vocab | ||
stop_token_ids = tokenizer_data.stop_token_ids | ||
backend_str = tokenizer_data.backend_str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoded_vocab
cannot be None anymore because the compiler may not be initialized even the tokenizer data is cached if the grammar is invalid. This change shouldn't hurt performance because the tokenizer data is cached anyways.
Signed-off-by: Cody Yu <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
83a32cb
to
4f003b3
Compare
Signed-off-by: Cody Yu <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
If the grammar is invalid, we now let xGrammar throw RuntimeError when compiling it. However, this happens in logits processor, so the exception is raised from model executor. Since we don't expect model executor to throw any exception now, the exception will crash the engine and kill the worker process.
This PR adds a validation to make sure the grammar is valid when constructing the GrammarConfig to solve this issue.
Note that there is another issue with the xgrammar backend that isn't addressed by this PR #11118cc @mgoin