in re "When generating OpenChatML, the same structure and rules should be followed to ensure compatibility and consistency."
Is this to be different from Postel's law, which would encourage parsers to be more forgiving?
If strictness is the requirement, there are some areas that may need clarification, including escaping, nested tags, character encoding, and a machine-verifiable definition of some sort.
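As a rough illustration of what a machine-verifiable, strict definition could look like, here is a minimal validator sketch. The `<|im_start|>`/`<|im_end|>` framing and the role names are assumptions borrowed from ChatML-style formats, not anything taken from the OpenChatML spec itself:

```python
import re

# Hypothetical sketch of a strict, machine-checkable framing rule.
# The <|im_start|>/<|im_end|> tokens and the role list are assumptions,
# not taken from the OpenChatML spec itself.
ALLOWED_ROLES = {"system", "user", "assistant"}
MESSAGE_RE = re.compile(
    r"<\|im_start\|>(?P<role>[a-z]+)\n(?P<content>.*?)<\|im_end\|>\n",
    re.DOTALL,
)

def validate_strict(text: str) -> list[str]:
    """Return a list of violations; an empty list means the text parses strictly."""
    errors = []
    pos = 0
    for match in MESSAGE_RE.finditer(text):
        if match.start() != pos:
            errors.append(f"unexpected bytes at offset {pos}")
        if match.group("role") not in ALLOWED_ROLES:
            errors.append(f"unknown role {match.group('role')!r}")
        if "<|im_start|>" in match.group("content"):
            errors.append("nested start tag inside message content")
        pos = match.end()
    if pos != len(text):
        errors.append(f"unparsed trailing bytes after offset {pos}")
    return errors
```

A Postel-style parser would instead downgrade most of these checks (unknown roles, missing trailing newline, stray bytes between messages) to warnings and try to recover, which is exactly the ambiguity a strict spec would have to rule in or out.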
I think the spec should be strict, i.e. clear and unambiguous. For the implementation, i.e. when finetuning a model on data formatted accordingly, it would probably make sense to add a percentage of differently formatted examples so the model learns to deal with those, too.
I'm convinced that LLMs understand even terribly misspelled and grammatically incorrect input because they are pretrained on the whole Internet, where there is all kinds of weird data; still, the majority is proper spelling and grammar, so the model knows what's right and at the same time understands what you mean. I'd expect the same to hold for the finetuning data.
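As a sketch of that mixing idea, assuming a ChatML-style rendering (the tokens, the loose variant, and the 10% default are placeholders, not anything from the spec):

```python
import random

# Minimal sketch of the mixing idea above: render most finetuning examples in
# the canonical format, but a fixed fraction in a deliberately sloppier
# variant so the model also sees imperfect formatting during finetuning.
# The <|im_start|>/<|im_end|> framing is an assumption, not the spec itself.

def render_canonical(messages):
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)

def render_loose(messages):
    # One possible "differently formatted" variant: drop the trailing newlines.
    return "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages)

def build_training_texts(conversations, loose_fraction=0.1, seed=0):
    rng = random.Random(seed)
    return [
        (render_loose if rng.random() < loose_fraction else render_canonical)(conv)
        for conv in conversations
    ]
```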
in re "When generating OpenChatML, the same structure and rules should be followed to ensure compatibility and consistency."
Is this to be different from Postel's law, which would encourage parsers to be more forgiving?
If strictness is the requirement, there are some areas that may need clarification including escaping, nested tags, character encoding, and a machine verifiable definition of some sort.
The text was updated successfully, but these errors were encountered: