Mismatch between WASM and native builds #82
Comments
I think it may be because the wasm binding uses the UTF16 encoding, due to JavaScript’s string semantics. Do you still see a mismatch if you transcode to UTF16 in your Rust test? The reason it matters is that certain “error costs” are calculated using nodes’ byte length. This is something I’ve been a bit unsatisfied with for a while, but I still don’t think it’s worth the memory cost to store each node’s Unicode character count. We could make them behave more similarly by dividing the byte count by 2 when using UTF16. 😸 I’d be curious if you have any suggestions.
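To make the byte-length point concrete, here is a minimal sketch using only the Rust standard library. The helper names are hypothetical and are not part of tree-sitter; the sketch only shows how the same text occupies a different number of bytes under each encoding, which is the quantity the error costs are said to depend on.

```rust
/// Hypothetical helper: byte length of `text` stored as UTF-8
/// (the native encoding of Rust's `&str`).
fn utf8_byte_len(text: &str) -> usize {
    text.len()
}

/// Hypothetical helper: byte length of `text` stored as UTF-16,
/// counting two bytes for every 16-bit code unit.
fn utf16_byte_len(text: &str) -> usize {
    text.encode_utf16().count() * 2
}
```

For an ASCII-only snippet such as `foo!(a b)`, `utf8_byte_len` gives 9 and `utf16_byte_len` gives 18, so any cost derived from byte length would double under UTF16.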
It seems so! There's no mismatch if I change the test like this:

    // let tree = parser.parse(&input, None).unwrap();
    let utf16: Vec<u16> = str::from_utf8(&input).unwrap()
        .encode_utf16().collect();
    let tree = parser.parse_utf16(&utf16, None).unwrap();

Yeah, I think it's not worth it, at least for programming languages. Non-ASCII characters are rare, and would mostly be in comments/strings. I'm not sure about markup languages though. Maybe we could let grammars override the error costs in specific places?
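As a rough illustration of why non-ASCII text is the case where the measures diverge, the sketch below compares three lengths for the same string: UTF-8 bytes, UTF16 bytes divided by 2, and Unicode character count. The function `length_measures` is a hypothetical helper written for this comparison and is not a tree-sitter API.

```rust
/// Hypothetical helper: three ways to measure the length of `text`.
/// Returns (UTF-8 bytes, UTF-16 bytes divided by 2, Unicode scalar values).
fn length_measures(text: &str) -> (usize, usize, usize) {
    let utf8_bytes = text.len();
    // Each UTF-16 code unit occupies two bytes.
    let utf16_bytes = text.encode_utf16().count() * 2;
    let chars = text.chars().count();
    (utf8_bytes, utf16_bytes / 2, chars)
}
```

For the ASCII comment `// note` all three values are 7. For `// café 😸` they are 13, 10 and 9, so halving the UTF16 byte count matches the UTF-8 byte count only while the text stays ASCII.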
I think making them more similar would be good, but I'm not sure about dividing by 2 when it's UTF16. 😅 In this case specifically, the syntax tree for UTF16 is more desirable:
There seems to be a mismatch between WASM and native builds (on macOS).
I built the CLI from the latest tree-sitter master (4c0fa29) and tried this code:
This is the syntax tree reported by the native binding (tree-sitter test passes for this commit):
This is the syntax tree reported by WASM (through tree-sitter web-ui):
(source_file (macro_definition name: (identifier)) (line_comment) (identifier) (MISSING ";") (macro_invocation macro: (identifier) (token_tree (identifier) (identifier))))