-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTML (EEx/HEEx/Surface): Refactor syntaxes #33
Conversation
Your knowledge and help is greatly appreciated. |
Thanks a lot for your time and putting everything in a PR! That's awesome! |
Are we happy to merge this one? |
Certainly, but let me check a bit nevertheless. :) |
Yup. Was both getting your confirmation and making sure @deathaxe has not more to add at this stage. 😄 |
I can change that back, if that's your style. Wasn't aware of it, tbh. Would like to include some changes to Surface as well as
I am currently a bit lost at the question of how to correctly name those embedded stuff in various situations. Main question is ASP/PHP use During recent ST4 dev cycle we added several string interpolation features to improve highlighting and auto-completions behavior (ST disables auto-completion and various other stuff in strings). So we end up in something like Another question arises with a look at HEEx and Surface, which support those Finding a consistent solution for all syntaxes might take another roundtrip, IMHO. |
Is something like |
Should be done, except noted aspects, I'd like to read some input from elixir specialists.
|
I just checked with Surface. It considers all those characters to be part of the tag. So no interpolation. My guess is HEEx does the same. Btw, I tried something like |
Cool! Will review everything. Thanks for making it clear with great commit messages and a clean history! There's also definitely some stuff to think about. I didn't know that auto-completion doesn't work when it's embedded inside a string unless the scope is cleared. I never noticed anyway, because I configured ST to always show me suggestions in any scope... |
Sure. I've mainly gone through formal stuff without deep knowedge about the syntax itself. It may still contain some bugs or I may have broken something. Another major aspect with clearing string scope is correct highlighting of otherwise unmatched tokens in embedded source. They would be highlighted as string without clearing. I just saw some tests like |
Yep, they're "accepted". I think the parser just checks if everything is syntactically correct, not so much if it makes semantic sense. |
Thanks for sorting the tests and removing the tag interpolations! |
Appears I missed to tweak some tests, again. It's very likely ST's sregex engine won't support atomic groups anytime soon. A modern syntax should avoid anything which causes Those advanced features may have been needed in tmLanguage syntaxes and therefore still exist after those have been converted to sublime-syntax format, but with all the context magic at hand in sublime-syntax, patterns are often simple enough to not need all those sophisticated regex features. One reason to rewrite those converted syntaxes from scratch. |
The recent commit adds some changes to illustrate atomic groups can easily be replaced by simpler patterns in various situations. |
Hey, I feel a bit bad about it but I squashed most commits and re-based onto master. I consolidated (almost) all the commit messages. Hope that's alright! I made only superficial content changes:
I pushed to |
You don't need to feel bad at all. It's your project. I am here to offer some proposals, only. The other way round I have mixed feelings as I introduced so many changes to the already very good syntax definitions. 😄 |
There are lots of great changes and we appreciate it! You managed to delete a lot of unneeded lines as well. I always love that about refactoring. 😄 I saw there's a possible optimization in - match: '[[:lower:]_]\w*[?!]?|(?!\.)'
scope: variable.function.surface
pop: 1
- match: \.
scope: punctuation.accessor.dot.surface What do you think? Is it better to leave it be? |
In which way is it an optimization? ST compiles those patterns into one optimized regexp anyway and immediately pops if none of the consuming patterns match. Can you prove this negative lookahead has any positive effect on performance or something like that? I always found negative lookaheads to slow things down, so try to avoid them. That's also what a benchmark against 5k lines of the following snippet reveals.
Your solution is a bit slower. The current solution requires 47ms on my box, while yours takes 50ms. |
Cool, thanks for going to the trouble of benchmarking it! It was an assumption I made that we'd save some CPU cycles if we removed the |
Found something: we should probably change |
## Distinguish embedding and interpolation This commit splits existing contexts into two types: 1. `eex-embedded`, which is used by default (prototype) 2. `eex-interpolations`, which is used to interpolate strings The difference between the two is that (1.) does not clear any scopes, while (2.) clears 1 scope (the string scope). Before this commit all tags used outside of strings cleared the main scope of the syntax `text.html.eex`. Test cases are sorted so those testing later tokens in a line follow those which test earlier tokens. ## Embedded punctuation and meta scopes This commit... 1. proposes to scope the whole `<%` and `%>` token as `punctuation.section.embedded`. These are no real tags but rather special markers to specify start and end of embedded elixir code. It can be compared with PHP, JSP, ASP and various other templating syntaxes, which expand those embedded code blocks while processing the template content. That's why `entity.name.tag` seems not suitable. 2. scopes the embedded code `meta.embedded` because the punctuation already use `punctuation.section.embedded`. 3. scopes embedded code in stings `meta.interpolation meta.embedded`, due to a scope naming guideline suggesting the former scope for embedded code in strings. 3. scopes embedded code `source.elixir.embedded.html` to enable auto-completions. ST requires the `source` scope to do handle content as code rather plain text. 4. removes "atomic groups" as those are not supported by ST's sregex syntax engine (see: `<%(?:%=?|[=/|]?)`) ## Exclude prototypes Avoid including the `prototype` context in embedded code sections as <% ... %> won't be valid inside them self. ## Rethink comments This commit modifies eex comments so they appear as normal interpolation. This change is inspired by ST's JSX/TSX syntax, which supports comments in JSX components by `{/* comment */}`. The braces are scoped as interpolation punctuation, while only the content `/* ... */` uses comment scope. It appears more accurate with regards to how interpolation works in template languages and may wanted to be applied in other syntaxes as well. ## Don't stack meta.interpolation and meta.embedded Probably not useful nor needed to mix those two in case the distinction between embedding and interpolation is to be maintained. ## Renamed some context names and appended "-pop" (Aziz)
## Re-use contexts from EEx This commit... 1. re-uses `eex-embedded` context in HTML (HEEx).sublime-syntax 2. copies over required test cases 3. sorts test cases the same way as in previous commits. ## Distinguish elixir interpolations This commit separates normal interpolation from string interpolation to correctly clear string scopes but don't do so for unwanted maybe top-level scopes. ## Add missing meta.tag scope Some color schemes rely on `meta.tag` to scope attribute values correctly. ## Re-use some existing HTML contexts This commit... 1. replaces some explicit patterns by already existing contexts, derived from inherited HTML.sublime-syntax or HTML (Plain).sublime-syntax 2. Removes unnecessary `apply_prototype` as it hasn't any effect when including contexts from the same (this) syntax definition. ## Rename some contexts Many contexts pop, so naming all of them `-pop` is probably somewhat overwhelming. Here's a suggestion which proved quite useful to express popping contexts while keeping names clean. 1. Non-popping contexts use plural names (e.g. "tags") 2. Popping contexts use singular name (e.g. "tag-content") Note: HTML.sublime-syntax unfortunately doesn't follow this naming scheme consequently for historical reasons and won't do so anytime soon in order to not break existing 3rd party packages. Any new syntax should probably do so though. ## Remove unsupported regexp patterns Atomic groups and possessive quantifiers are unsupported by sregex engine and may either have no effect or trigger slower Oniguruma. Please: check if this has negative effects. ## Simplify tag-name contexts Context switches are not needed and some patterns can be shared. ## Avoid duplicate tag name assignments Avoiding capture groups can improve parsing performance. ST is quite good with regards to re-using lookahead results, so they the overall result is at least equal or even slightly faster with regards to performance, while code looks somewhat cleaner. ## Rename variables Some `tag_...` variables already exist in HTML.sublime-syntax and HTML (Plain).sublime-syntax. This commit makes sure to not regress something due to overriding some HTML related variables by accident. Note: All new variables and contexts should be prefixed. ## Prepend heex tags to tag-other Only low priority, but HEEx tags are basically foreign tags with regards to HTML specs. As such those should be placed in `tag-other` context. It ensures any valid html tag is matched correctly under all circumstances. It would even work if foreign tags are lower-case. ## Renamed some context names and appended "-pop" (Aziz)
## Re-use elixir interpolation from HEEx This commit... 1. reduces duplicate context definitions 2. ensures same interpolation/embedded scopes are used across syntaxes ## Prepend surface tags to tag-other Only low priority, but surface tags are basically foreign tags with regards to HTML specs. As such those should be placed in `tag-other` context. It ensures any valid html tag is matched correctly under all circumstances. It would even work if foreign tags are lower-case. ## Embedded surface comments This commit... 1. adds named context for surface comment content as those help debugging syntaxes via ST's scope name popup. 2. removes `clear_scopes: 1` as it removed the main scope. ## Rename variables Some `tag_...` variables already exist in HTML.sublime-syntax and HTML (Plain).sublime-syntax. This commit makes sure to not regress something due to overriding some HTML related variables by accident. Note: All new variables and contexts should be prefixed. ## Add missing meta.tag scopes Some color schemes rely on `meta.tag` to scope attribute values correctly. ## Rename some contexts Many contexts pop, so naming all of them `-pop` is probably somewhat overwhelming. Here's a suggestion which proved quite useful to express popping contexts while keeping names clean. 1. Non-popping contexts use plural names (e.g. "tags") 2. Popping contexts use singular name (e.g. "tag-content") Goal ist to follow the introduced scheme of HEEx Note: HTML.sublime-syntax unfortunately doesn't follow this naming scheme consequently for historical reasons and won't do so anytime soon in order to not break existing 3rd party packages. Any new syntax should probably do so though. ## Reorganize Markdown/Raw tag contexts This commit just moves markdown and raw contexts to group them with the surface-other contexts. The idea is the following structure: 1. extended HTML contexts 2. surface tag contexts 3. surface blocks and embedding contexts 4. eex interpolation ## Simplify tag-name contexts Context switches are not needed and some patterns can be shared. ## Remove unsupported regexp patterns Atomic groups and possessive quantifiers are unsupported by sregex engine and may either have no effect or trigger slower Oniguruma. Please: check if this has negative effects. ## Add named context for surface-blocks ## Simplify conditional block pattern ## Don't clear comment scope in interpolation Follow HEEx implementation. ## Simplify html-custom-tags This commit... 1. removes the group (?:) from `surface_tag_char` variable as it seems unnecessary to wrap a char-set this way. 2. adds a `html_custom_tag_char` custom variable which is a copy of `surface_tag_char` but excludes the period. This new variable allows to replace the groups from the patterns in the `html-custom-tags` context by a simple char-set pattern. 3. As the pattern in `html-custom-tags` matches every valid tag character, it is not required to push into `surface-begin-tag-name` context as it doesn't have to offer anything more. It just pops immediately. 4. `html-custom-tags` basically overrides `tag-other` from HTML.sublime-syntax in a more general way by not requiring a tag name to start with an ascii char. Thus `meta_prepend: true` can be removed from `tag-other`. The pattern from HTML wouldn't match anyway. ## Renamed some context names and appended "-pop" (Aziz)
Ah, sorry, I didn't mention it. I force pushed to our deathaxe/proposals branch.
Yes. So I managed to fix the Another issue I encountered is that attributes like |
FYI it's best not to force push a branch that multiple people are working on. Also prefer If the purpose was to organize the commits, it's best done at the very end when you want to merge it in. And it's safer to create a new branch on top of it before you squash. |
I've always been using Yeah, the way I handled this here wasn't optimal. I should have pushed onto deathaxe's branch to keep the changes visible for some time. |
But you make it impossible to track changes by this kind of abusing git. That's not the purpose on version control! Please don't force push to my branch making it look broken by me! |
Why bothering if embedding is not supported in attribute values? |
To be sure to understand correctly I just checked https://surface-ui.org/template_syntax. The very first example shows embedded elixir code in html tag attributes. So why do you think it is not supported? The following is status quo of this branch: Surface:
HEEx:
EEx:
|
Expressions are borrowed from HEEx syntax tests to illustrate differences.
... working as expected with current branch.
Follow the modifications done to the HTML (...) syntax files.
I know, I know. It wasn't the best thing to do, but I don't think I abused it. I only ever worked on my branch. You can still compare changes by diffing your local branch with my remote branch, or by selecting the top commits in Sublime Merge.
Never pushed let alone force pushed anything to your branch. I only did it on my branch because I thought we'd be finished soon and only a ff-merge would remain to be done. Next time I'll be more careful, create normal commits and discuss squashing beforehand when necessary. :)
Embedding is supported as values but not inside the strings. In my example above, the embedding is inside the string. That wasn't supported or isn't anymore. <!-- Invalid: -->
<div {@some_attr}="value" style="{@value}" />
<!-- Valid: -->
<div {[class: "x"]} style={@style} /> |
@azizk I think you pulled the branch from this PR and force pushed after you changed it. This branch is @deathaxe 's.
You both are pointing to the same remote branch that @deathaxe created. You probably forgot about it since this PR has been running for a while now. You are able to do that because you are a maintainer of the project. |
Of course, but it gets more and more complicated the more changes have been made. It gets even more difficult if the foreign branch was rebased on a nother branch which introduces many changes in the meanwhile. It's better to just merge the master into the feature branch if required. If the intermediate changes are of no longer use they can be squashed to one commit, when merging the PR to master at the end.
Ahh ok, thanks |
Fixes #35 This commit adds support for `<:slot>` tags following the scheme of HTML (Surface).sublime-syntax.
Thanks for making all the changes and adding support for HEEx slots! Do you think it would make sense to keep the meta scopes separate, e.g. having |
Not sure how important such a distinction really is. I just use I gave arguments for I followed the first approach with Ruby in Rails or Java Server Pages because it enabled me to re-use those contexts in embedded CSS and JS syntaxes. If we use |
Just for the record: I am done with this PR and won't add any further commits. |
Thanks for all the effort, @deathaxe! I want to finish this, but I couldn't pay full attention to it due to other things. Hope to close this soon. Next release has been long overdue. :) |
@deathaxe noted, thanks a lot! |
So I finally got around to working on this again and I finished it up. I think I integrated almost all of your suggestions, @deathaxe, except for these:
Thanks again a lot for your extensive help and your active contributions! New package release is coming. 🥳 |
The recent syntax changes look very good and have a nice and clean structure.
Here are some ideas and proposals for HTML (EEx).
What needs consideration is embedding code in strings/attribute values needs slightly different handling than embedded code in other locations of HTML. The reason is the string scope which needs to be cleared so embedded code gets correctly highlighted and auto completion is enabled. On the other hand we don't want to clear any scopes if code is embedded outside of strings as this would very likely clear the main scope of the syntax. That's not desirable.
To achieve that two types of "embedded" contexts are required. There's no real guideline at the moment, but we may probably call normally embedded contexts "embedded" and those in strings "interpolation". We might also use interpolation vs. string-interpolation or something along those lines.
I haven't had a look at the other syntaxes of this packages, but I personally find it a bit strange to call
<%
a tag. With regards to ASP/JSP/PHP I'd suggest to scope all of<%
and%>
punctuation.section.embedded
.Some more comments about this can be found in commit messages.
Disclaimer: You asked for comments ;-)