-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Syntax Highlighting #16
Conversation
tree-sitter-highlight.el
Outdated
(defun tree-sitter-highlight--after-change (old-tree) | ||
(tree-sitter-highlight--highlight old-tree (window-start) (window-end))) | ||
(tree-sitter-highlight--highlight old-tree (window-start) (window-end)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, have you considered providing tree-sitter implementations for font-lock-fontify-region-function(and related) ? Thus you will be able to share the logic related to calculating what has to be highlighted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we would want to do that we would want to choose jit-lock
instead, font-lock
relies on that (afaik) for knowing when to fontify what.
I've considered it, but didn't implement it yet, since this way is easier. I can add it as a toggle maybe, to see if it would improve things (mainly that currently I can either delay highlighting in case of errors, or eagerly highlight but have to clear (remove all faces from) the whole window first, which causes the part after the error to not be highlighted (which you technically can't do correctly because there's an error after all) and which also causes "flickering" if you type fast enough when you rapidly produce/remove errors in the syntax tree). However I think it's unlikely that jit-lock
will help with that specific problem, because we already know what changed, and if i only clear/highlight what changed, then sometimes things are not highlighted at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I've tried both font-lock
and jit-lock
. font-lock
seems to work better, so I've only kept the code for that. I was also wrong about it being easier to just implement myself, so thanks for the suggestion.
This PR looks great! I've been playing around with it and it works quite nicely. One thing I have noticed is that for some situations multiple faces are being applied to the same region. For example in Python: p = Package(...) The Edit: it seems that this issue is likely due to overlapping queries. This may have to be fixed at the |
Thanks, this is great progress! I changed the parameter order of I also pushed the fixes for
Ideally this file should be distributed together with the compiled language description (e.g. |
tree-sitter-highlight.el
Outdated
"Syntax highlighting with tree sitter." | ||
:init-value nil | ||
:lighter "tree-sitter-highlight" | ||
;; TODO: idk if this file should do this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dependencies between minor modes are tricky: https://emacs.stackexchange.com/questions/36225/global-minor-mode-that-requires-another-during-its-lifetime.
We can either use a single minor mode, or make tree-sitter-mode
aware of its possible dependent tree-sitter-
minor modes, so that turning them on/off work properly.
tree-sitter-highlight.el
Outdated
|
||
(require 'tree-sitter) | ||
|
||
(require 'f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dependency should be declared in Cask
.
However, I think tree-sitter
is fundamental/low-level enough that fewer external dependencies are desirable. If we just want 1-2 functions from f
, we can reimplement them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it to Cask
for now, I'm just using f-join
currently, we can re-implement that before we merge this. It's probably not needed if we bundle all the highlights.scm
somewhere.
I know that multiple faces are added, but that's actually intended. For me when I try doing #[derive(Debug)] the At least for the rust implementation of querying the regions to highlight the sort is stable, i.e. if two things apply for the exact same range, the order is kept as given to use by the tree sitter api, which should (I think) reflect the order they're declared in a If you could paste more of the python code for which the faces are being applied in the wrong order (because it doesn't happen in my python files/your code that i just pasted) that would be great. Maybe it's also caused by some incremental highlighting thing, see what happens after you've just openend that buffer and it was highlighted completely, whether the order is the same etc. Edit:
|
I was obtaining a different result as I was using the elisp implementation. Using the Rust implementation, I obtain the expected results. I believe the difference in behaviour is due to this portion in for (i, (node, capture_index)) in vec.into_iter().rev().enumerate() {
v.set(i, env.vector((capture_index, root_node.map(|_| node)))?)?;
} The captures are reversed before they are passed to I'm quite new to all of this so please let me know if I have misunderstood the code somewhere. |
Oh yeah seems like I've just forgotten that in the elisp version then, I only use that to play around with changes anyway because I don't need to restart emacs to reload the rust code. |
Just wondering, what is the state of the branch atm? I tried to load it in this morning, but I'm unable to get updated code to appear formatted by tree-sitter. For example, activating
And then anytime code is updated or even navigated, the |
The code works decently well for me, I've been using it for the past two weeks in Rust and C without any huge problems. I've never seen that error, it could be my fault, I don't really know how faces work in emacs, but settings the face to |
Using Emacs 26.3, tried Python highlighting and ran into this error. I started with my existing setup, downloaded your fork, ran When I opened up a Python file however, it would not automatically turn on I'm not really sure how to debug this. I'll try testing this on a clean Emacs setup and see if it is Helm related or something else on my end. |
@th0rex I turned on debugging, and this is the trace I get: Debugger entered--Lisp error: (wrong-number-of-arguments (2 . 2) 4)
#f(compiled-function (var val) "Set variable VAR to value VAL in current buffer." #<bytecode 0x2238cf>)(font-lock-fontify-buffer-function (lambda nil (tree-sitter-highlight--jit (point-min) (point-max))) font-lock-fontify-region-function (lambda (start end _verbose) (tree-sitter-highlight--jit start end)))
(setq-local font-lock-fontify-buffer-function (lambda nil (tree-sitter-highlight--jit (point-min) (point-max))) font-lock-fontify-region-function (lambda (start end _verbose) (tree-sitter-highlight--jit start end)))
tree-sitter-highlight--enable()
(prog1 (tree-sitter-highlight--enable) (setq err nil))
(unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))
(let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil)))))
(if tree-sitter-highlight-mode (let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))) (tree-sitter-highlight--disable))
(let ((last-message (current-message))) (setq tree-sitter-highlight-mode (if (eq arg (quote toggle)) (not tree-sitter-highlight-mode) (> (prefix-numeric-value arg) 0))) (tree-sitter-mode) (if tree-sitter-highlight-mode (let ((err t)) (unwind-protect (prog1 (tree-sitter-highlight--enable) (setq err nil)) (if err (progn (setq tree-sitter-highlight-mode nil))))) (tree-sitter-highlight--disable)) (run-hooks (quote tree-sitter-highlight-mode-hook) (if tree-sitter-highlight-mode (quote tree-sitter-highlight-mode-on-hook) (quote tree-sitter-highlight-mode-off-hook))) (if (called-interactively-p (quote any)) (progn nil (if (and (current-message) (not (equal last-message (current-message)))) nil (let ((local " in current buffer")) (message "Tree-Sitter-Highlight mode %sabled%s" (if tree-sitter-highlight-mode "en" "dis") local))))))
tree-sitter-highlight-mode()
enable-ts-hl()
run-hooks(change-major-mode-after-body-hook prog-mode-hook python-mode-hook)
apply(run-hooks (change-major-mode-after-body-hook prog-mode-hook python-mode-hook))
run-mode-hooks(python-mode-hook)
python-mode()
set-auto-mode-0(python-mode nil)
set-auto-mode()
normal-mode(t)
after-find-file(nil t)
find-file-noselect-1(#<buffer min_spaces_composed_str.py> "~/cart/codereview/min_spaces_composed_str.py" nil nil "~/cart/codereview/min_spaces_composed_str.py" (3937650 66306))
find-file-noselect("~/cart/codereview/min_spaces_composed_str.py" nil nil t)
find-file("~/cart/codereview/min_spaces_composed_str.py" t)
funcall-interactively(find-file "~/cart/codereview/min_spaces_composed_str.py" t)
call-interactively(find-file nil nil)
command-execute(find-file) I'm wondering if I missed some installation step with regards to the rust/elisp code. This is my setup in a clean Emacs launch: (require 'f)
(add-to-list 'load-path "/somefolder/repos/emacs-tree-sitter")
(require 'tree-sitter)
(ts-require-language 'python)
(setq tree-sitter-highlight-query-dir "/somefolder/repos/emacs-tree-sitter/grammars")
(require 'tree-sitter-highlight)
(defun enable-ts-hl ()
(tree-sitter-highlight-mode))
(add-hook 'python-mode-hook 'enable-ts-hl) Before loading this, I do the usual Edit: I think I may be using the elisp code rather than the rust code-- how do I enable this switch? |
Seems like The |
@th0rex Thanks for figuring that out! I guess I should eventually upgrade to 27, but that's good to know. I now have the highlighting working, and am really enjoying it. I'll let you know if I find anything else. |
The specific functions are Since the tree-sitter-based highlighting function is stateless (which is better), I think we should replace these two functions with one that uses |
A suggestion for default behavior-- I prefer operators to be highlighted the same as keywords. Correct me if I'm wrong, but I believe this is how highlighters other this PR would treat them as well. I've made the following change to get the behavior I like: (defface tree-sitter-operator-face '((default :inherit font-lock-keyword-face))
"Face used for operator"
:group 'tree-sitter-highlight-faces) |
Just a heads-up, if you are still working on this: I reorganized the directory structure a bit, moving all Lisp code to I also reworked the types in #26. |
Currently I'm not working on this anymore, but I will again in the future (at most ~6 months from now) and I hope to get a better version done then. You were correct regarding the comment about font-lock, in fact sometimes highlighting got out of sync and some regions were left unfontified during the time I was using this but font-lock thought they were fontified, which is likely a result of font-lock and tree-sitter disagreeing on what region needs to be updated. In general I'm still not quite sure how to clear highlights correctly after a change without causing "flickering", which was the reason for trying to use font/jit-lock. Using |
Yes, this sounds like the right approach to me; you should always be able to highlight based on the current syntax tree, regardless of whether there's a syntax error or what. That's what Atom does. There should be no need for any state besides the tree itself.
I'm curious about this. Syntax errors shouldn't cause noticeable problems for syntax highlighting. If they are, you might be hitting a bug in the query system, or a parsing bug. What language are you using? What highlights are you seeing? If you get a chance, could you try to reproduce the problem with the |
I'm fairly certain this isn't a bug but rather my fault, however I wasn't (and still am not) sure how to do this correctly. This was basically the code in question: (defun tree-sitter-highlight--handle-change (change)
"Called for each CHANGE after an edit."
(let* ((start-byte (aref change 0))
(end-byte (aref change 1))
(node (ts-get-descendant-for-byte-range (ts-root-node tree-sitter-tree) start-byte end-byte))
(matches (ts-query-captures tree-sitter-highlight--queries node tree-sitter-highlight--query-cursor nil #'ts-node-text)))
(set-text-properties (ts-node-start-position node) (ts-node-end-position node) nil)
(seq-do #'tree-sitter-highlight--apply-match matches)))
(defun tree-sitter-highlight--after-change (old-tree)
"Called with the OLD-TREE after the buffer has changed."
(let* ((changes (ts-changed-ranges tree-sitter-tree old-tree)))
(seq-do #'tree-sitter-highlight--handle-change changes))) As far as I can remember, this worked well (it was always correct), however was rather slow since on (some) syntax errors in rust (which I tested this most on since I was mostly writing rust) the That is the whole reason why I even tried integrating with jit-lock/font-lock, I hoped they would somehow help improve this, but they also had their own problems. The syntax errors do not (as far as I noticed) cause any errors in the tree that propagate too far or result in broken queries etc, but they caused me (since I don't know of a better way) to re-highlight most of the buffer most of the time, which was slow. I do not know how to correctly determine which part I should clear and run queries on after changes, that do not result in the whole buffer being rehighlighted on some syntax errors. I've also tried looking at how atom does this but couldn't really find anything. |
@th0rex Thanks for the reply. I haven't checked how this is exposed to elisp, but I would think that you could use one of these APIs to confine your query to the visible screen range, so that they query would not take too long. And then, on top of that, I think you could optionally cache the result of highlighting individual lines, and invalidate/splice that cache based on the
I'm not familiar with the emacs APIs, so there might be some details that make this difficult that I'm not taking into account. |
Ok so I've just retried this, now using the "proper" (I think) way to determine what's visible after scrolling took place. Even without caching the performance (in pure lisp now) seems to be a lot better, and it also seems to always correctly highlight now. I don't really remember why I didn't do this the first time but seems like I should have. Thanks @maxbrunsfeld I'll still eventually implement caching however. I've also "rebased". |
(puthash key value table))) | ||
tree-sitter-highlight-default-faces) | ||
table)) | ||
(let ((x (tree-sitter-highlight--make-queries (alist-get major-mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ubolonton not related to this PR but IMO tree-sitter should not be on top of the existing major modes but it should be a replacement for them. So instead of having major-mode -> language mapping IMO it should be "file-regex" -> language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When there are tree-sitter-based major modes, the mapping mechanism would simply be auto-mode-alist
.
However, that's likely too far away in the future. For adoption, I think it's better to piggyback on major modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just use tree-sitter-language
here though, and leave the mapping for tree-sitter-mode
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, maybe we can define a tree-sitter-highlight-queries
var, as the replacement of font-lock-keywords
.
Language-specific minor modes (e.g. tree-sitter-rust
) would then be responsible for loading/populating the queries however they see fit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to raise one more point: major-mode -> language mapping is not N -> 1 but it is N -> N. We had that issue in lsp-mode - for example, web-mode could be html, javascript, typescript, and so on.
Also, why do you think that it is too far in the future? I mean except highlighting and eventually indentation what are the other things that will be missing to have tree-sitter centric major modes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to raise one more point: major-mode -> language mapping is not N -> 1 but it is N -> N. We had that issue in lsp-mode - for example, web-mode could be html, javascript, typescript, and so on.
Yeah, that's definitely an issue. I haven't thought about multi-language files in-depth.
Also, why do you think that it is too far in the future? I mean except highlighting and eventually indentation what are the other things that will be missing to have tree-sitter centric major modes?
I think the main technical barrier is that many Emacs distributions have dynamic module support turned off. Other than that, I was mainly thinking about adoption/mindshare.
This should fix a kind of rare bug, where still some things would not get highlighted at all. I've also documented the code more and done some general cleanup. Additionally, Stefan Monnier notified me per email that I should be using add-hook and remove-hook for window-scroll-functions.
Just a quick update: I've rebased on the latest master again, cleaned up the code, fixed two other bugs (although this makes scrolling slower now, but its fine if the file is byte compiled) and added changes suggested Stefan Monnier. |
This only applies to scrolling and only in some cases, and highlighting is usually quickly correct again after the syntax error is fixed. Thus it's off by default, since it improves performance a lot this way.
Is it possible to use the compiled language binaries (eg. |
The |
Thanks for clarifying, I figured that was the case. |
I've now switched to using font-lock again (as per the suggestion of Stefan Monnier), however there are still lags for large buffers where If anyone wants to continue this work, they're free to use all my code (if it's useful anyway) in any way they'd like. Like I said, the highlighting with font-lock should at least always be correct now, but may be slow for some files while doing some edits. Maybe this is also a problem with how the grammar is specified and not with tree-sitter itself and it would be working perfectly with other files, or I'm still using tree-sitter wrong, I'm not sure. |
I think the solution would be to only highlight ranges that are both: 1. changed and 2. on screen. |
I've tried that in many different variations, the problems were always with things like r#"
// screen becomes visible here (i.e. this is the first line in the window), because we scrolled down
// we new close the raw string literal:
"#; As far as I can see I need to run the query on the whole buffer (or on whatever Also to be clear: only highlighting the visible lines isn't the problem, actually running the query on the whole buffer is. It is kind of slow, maybe it's also the interop between rust and emacs, I don't know. |
@th0rex Thank you for all your hard work experimenting with this! I'm sorry to bother you with one more request. I understand that you are frustrated and don't want to work on this anymore. I just want to get more info to continue on this. Can you send me these:
I think extending the window range a bit with
This is an issue with Emacs dynamic's modules in general, more so with the Rust binding (because it includes a workaround for a safety bug in Emacs's GC, which will supposedly be fixed in Emacs 27). Calling |
Sadly I can't since it's from a project for work.
https://gist.github.com/th0rex/64a48204f5129c0cd293bd90932bd018 That patch didn't work immediately, but the current state of this PR should be similar enough to it.
https://github.com/th0rex/emacs-tree-sitter/blob/only_hl_visible_query_reported/lisp/tree-sitter-highlight.el
If this would indeed fix it, it would be very nice. Thank you for looking into this as well. |
That's surprising to me. That's not the intended behavior of It sounds like you're hitting a bug in Tree-sitter, or the Tree-sitter rust binding maybe? If anyone is interested and has time, it would be great to get a reproducible bug report for this. |
I'm continuing the work here. I'm using it locally, and haven't noticed any slowdown so far. It feels than Here are the updates to some of the previous issues. The main slowness was due to the highlighting code adding the same face multiple times. As a result, the
I got this with Rust code as well. It seems specific to the Rust grammar. I could never reproduce it in C code. Scoping the queries to the invalidated regions (roughly the visible region) works for me though.
I couldn't reproduce this. I did get large chunks of text not being fontified, but the cause turned out to be the intricate interactions between Emacs's
This is the behavior I've observed so far (no full containment needed). However, I did get some other interesting cases involving |
I've implemented a basic version of syntax highlighting and am looking for feedback (whether the overall approach is good).
Highlighting with tree sitter:
Highlighting with font lock:
Currently I'm not really taking advantage of the more context sensitive tree sitter information, and am just using the default font lock faces.
I'm not sure how to best distribute this
highlights.scm
file, currently users need to specify the path tothis_repo/grammars
, from which thehighlights.scm
file for the given language is then read.The only major bug I've noticed, is that while typing, sometimes code isn't immediatetly highlighted. For example, when I typeManually callingtree-sitter-highlight--highlight
fixes that however, so I'm thinking it might have something to do with the comment on theafter-change
function intree-sitter.el
. Mytree-sitter-highlight--handle-change
isn't being called for unions or lets inside functions, but if I type the same let on the top level it is, I'm not quite sure what's causing that as I'm not familiar with tree sitter.Edit: I've fixed this,
changed_ranges
had a bug. Even if this PR isn't merged, that one commit (6bed1e1) should definetly be merged.My current config for using this looks like this: