Replies: 1 comment
This is related to (but distinct from) the more common challenge of parsing HTML or XML with arbitrary tags. These are valid:
But these are not:
-
What is the correct way to ensure that a terminal is remembered and then matched later? I ask because I am trying to restrict LLM output to a valid SMILES string.
See here for a description: https://en.wikipedia.org/wiki/Simplified_Molecular_Input_Line_Entry_System
And here for a grammar that almost does it: https://depth-first.com/articles/2020/04/20/smiles-formal-grammar/
Here are some examples. Note that a digit marks a ring closure in the chemical structure: whenever a digit opens a ring, the same digit must appear again later to close it. A digit can be reused once its ring is closed, so each digit always occurs in open/close pairs. These are valid:
C1CCC1
C1CC2CCC1CCC2
C2CCCC2
C3CCC3CCC2CCCC2
C1CCC1CCC1CCCC1
However, these are invalid:
C1CCC
C1CCCCC1CCC2
C2CCCC3
C3CCCCCC2CCCC
C1CCC1CCC1CCCC
Is there a way to do this with an EBNF grammar? If not, is there a way to use outlines to do this some other way? If it requires new features, what is the right way to explain the feature request?
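To make the constraint concrete, the pairing rule illustrated by the examples above can be sketched as a post-hoc check in plain Python. This is only an illustration under stated assumptions: `ring_digits_balanced` is a hypothetical helper (not part of outlines), it handles single-digit ring labels only, and it ignores bracket atoms such as `[13C]` and two-digit `%nn` labels.

```python
def ring_digits_balanced(smiles: str) -> bool:
    """Return True if every ring-closure digit that is opened is later closed.

    Assumption: every ASCII digit in the string is a single-digit ring label.
    """
    open_rings = set()  # digits that have been opened but not yet closed
    for ch in smiles:
        if ch in "0123456789":
            if ch in open_rings:
                open_rings.remove(ch)  # second occurrence closes the ring
            else:
                open_rings.add(ch)     # first occurrence opens the ring
    return not open_rings              # valid only if nothing is left open


valid = ["C1CCC1", "C1CC2CCC1CCC2", "C2CCCC2",
         "C3CCC3CCC2CCCC2", "C1CCC1CCC1CCCC1"]
invalid = ["C1CCC", "C1CCCCC1CCC2", "C2CCCC3",
           "C3CCCCCC2CCCC", "C1CCC1CCC1CCCC"]

print(all(ring_digits_balanced(s) for s in valid))
print(any(ring_digits_balanced(s) for s in invalid))
```

The only state this check carries is the set of currently open digits. With single-digit labels that set has at most 2^10 possible values, so the question is whether a grammar can track that state during generation rather than only after the fact.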