Handling serialisation overflow #6
I guess there are two main parts to this question: how to handle arbitrary depth / size, and how to keep it fast. Would it be possible (or a good idea) to allocate memory for each custom tag, and make the serialised buffer point to this memory? That way, the size wouldn't matter and it could consistently handle a certain depth.
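Roughly what I have in mind (just a sketch; the names here are made up, not the actual scanner code): the buffer stores a pointer to a separately allocated name instead of the name's bytes, so each entry has a constant size no matter how long the tag name is.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tag record: the custom name lives in its own heap allocation,
 * owned by the scanner; the serialised buffer only ever records the pointer. */
typedef struct {
  uint8_t type;
  char *custom_name;
} Tag;

/* Write one tag into the serialisation buffer: one type byte plus a
 * fixed-size pointer, so the entry size is independent of the name length. */
unsigned serialize_tag(const Tag *tag, char *buffer, unsigned size) {
  buffer[size++] = (char)tag->type;
  memcpy(&buffer[size], &tag->custom_name, sizeof tag->custom_name);
  return size + (unsigned)sizeof tag->custom_name;
}
```

The catch is exactly the lifetime question: those allocations have to stay alive for as long as any serialised state still points at them.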
We could do that, but the question is when to clean up those separately-allocated custom tag names. In the process of editing, there would be a lot of intermediate states where you had a different custom tag name, like the partially-typed names below:
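(Purely illustrative, with a made-up element name: each keystroke while typing the tag produces a state whose open tag has a different custom name.)

```html
<my-w
<my-wi
<my-widg
<my-widget
<my-widget>
```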
Maybe we could only allocate them separately once you had a matching close tag? It's definitely possible, but I'm just not sure it's worth the complexity. Do people ever have very, very long tag names? I think they're always going to be around 20 characters or fewer in practice.
Yeah, that's true. Maybe we should store a "discarded tag count" so that we could always do this?
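Something along these lines, maybe (just a sketch of the idea; the layout and names here are invented, not what scanner.cc currently does):

```c
#include <stdint.h>
#include <string.h>

#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024 /* tree-sitter's fixed cap */

/* Hypothetical tag/scanner types, just enough to sketch the idea. */
typedef struct { uint8_t type; uint8_t name_length; char name[256]; } Tag;
typedef struct { Tag *tags; uint16_t tag_count; } Scanner;

/* Sketch: write as many open tags as fit, then record how many were dropped
 * in a small header at the start, so deserialisation can tell that the
 * restored stack is incomplete rather than just silently shorter. */
unsigned serialize(const Scanner *scanner, char *buffer) {
  uint16_t written = 0;
  unsigned size = sizeof(uint16_t); /* reserve room for the discarded count */
  while (written < scanner->tag_count) {
    const Tag *tag = &scanner->tags[written];
    unsigned needed = 2u + tag->name_length; /* type byte + length byte + name */
    if (size + needed > TREE_SITTER_SERIALIZATION_BUFFER_SIZE) break;
    buffer[size++] = (char)tag->type;
    buffer[size++] = (char)tag->name_length;
    memcpy(&buffer[size], tag->name, tag->name_length);
    size += tag->name_length;
    written++;
  }
  uint16_t discarded = (uint16_t)(scanner->tag_count - written);
  memcpy(buffer, &discarded, sizeof discarded); /* non-zero means tags were dropped */
  return size;
}
```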
I don't even use HTML (currently); I'm just trying to make the parser as general as possible. I agree that long names are very unlikely to be used, but I also think this parser should have the capacity to support arbitrarily nested and tagged HTML, without a hard limit.

If memory allocation is allowed, then another idea is to treat the buffer as a linked list: mark at the end whether it stops there or continues at another location (potentially repeated, though a limit could be enforced). My concern there is whether a serialised buffer is always deserialised, since it could cause a memory leak if not.

On the Tree-sitter side, maybe the scanner could ask for a larger buffer size, up to some limit? That way, sane HTML works as it does currently, but deeper nesting could be accommodated on demand.
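For reference, as far as I can tell the current external-scanner interface has no way to ask for more space; the scanner is handed exactly one fixed-size buffer and reports how much of it it used:

```c
/* From tree_sitter/parser.h: the hard cap on serialised scanner state. */
#define TREE_SITTER_SERIALIZATION_BUFFER_SIZE 1024

/* Generated scanners expose a serialize hook of this shape (the html variant
 * for this grammar); the buffer passed in is always that fixed size, and the
 * return value is how many bytes were actually written. */
unsigned tree_sitter_html_external_scanner_serialize(void *payload, char *buffer);
```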
I've been looking through external scanners, and was trying to work out how the serialisation works in this package. Originally, I thought it would try its best and discard deeper scopes than it could hold, and that seems to be the case.
However, it does not store whether it discarded tags or not. This leads to closing tags being marked as erroneous even when they match. E.g., the following has a middle tag that's too long to be stored properly:
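(Illustrative reconstruction; the long name is abbreviated here and would really need to exceed the space left in the serialisation buffer.)

```html
<foo>
  <an-extremely-long-custom-tag-name-that-overflows-the-buffer>
  </an-extremely-long-custom-tag-name-that-overflows-the-buffer>
</foo>
```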
The current behaviour is to mark its closing tag and `</foo>` as invalid. This seems worse than marking invalid endings as valid. Could it use a bit to indicate whether all the tags were saved or some were discarded? Could that be used to improve the behaviour?
It gets worse when the same tags are nested:
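(Again an illustrative reconstruction, with the long name abbreviated; it has to be long enough that it no longer fits in the serialisation buffer.)

```html
<bar>
  <an-extremely-long-custom-tag-name-that-overflows-the-buffer>
    <bar>
    </bar>
  </an-extremely-long-custom-tag-name-that-overflows-the-buffer>
</bar>
```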
Because `bar` was the only name that made it onto the buffer, it sees the inner closing `</bar>` and thinks that closes everything, so it marks the rest as invalid. I don't even know if it's possible to account for this one, considering some tags can be self-closing (so we can't just count tag ends to get the number we discarded). What about marking the "serialisation" as invalid or something, and requesting a full reparse from some point? (I was going to reference the TextMate parser here, but then saw it doesn't even try to match anything, so no more suggestions.)