Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML (EEx/HEEx/Surface): Refactor syntaxes #33

Closed
wants to merge 20 commits into from
Closed

HTML (EEx/HEEx/Surface): Refactor syntaxes #33

wants to merge 20 commits into from

Conversation

deathaxe
Copy link
Contributor

@deathaxe deathaxe commented Sep 23, 2021

The recent syntax changes look very good and have a nice and clean structure.

Here are some ideas and proposals for HTML (EEx).

What needs consideration is embedding code in strings/attribute values needs slightly different handling than embedded code in other locations of HTML. The reason is the string scope which needs to be cleared so embedded code gets correctly highlighted and auto completion is enabled. On the other hand we don't want to clear any scopes if code is embedded outside of strings as this would very likely clear the main scope of the syntax. That's not desirable.

To achieve that two types of "embedded" contexts are required. There's no real guideline at the moment, but we may probably call normally embedded contexts "embedded" and those in strings "interpolation". We might also use interpolation vs. string-interpolation or something along those lines.

I haven't had a look at the other syntaxes of this packages, but I personally find it a bit strange to call <% a tag. With regards to ASP/JSP/PHP I'd suggest to scope all of <% and %> punctuation.section.embedded.

Some more comments about this can be found in commit messages.

Disclaimer: You asked for comments ;-)

@princemaple
Copy link
Collaborator

Your knowledge and help is greatly appreciated.

@azizk
Copy link
Collaborator

azizk commented Sep 24, 2021

Thanks a lot for your time and putting everything in a PR! That's awesome!

@princemaple
Copy link
Collaborator

Are we happy to merge this one?

@azizk
Copy link
Collaborator

azizk commented Sep 24, 2021

Certainly, but let me check a bit nevertheless. :)
I saw that the vertical order of the test lines is not consistent with the way I did it in every other test. We should change that to be consistent I think.

@princemaple
Copy link
Collaborator

Yup. Was both getting your confirmation and making sure @deathaxe has not more to add at this stage. 😄

@deathaxe
Copy link
Contributor Author

We should change that to be consistent I think.

I can change that back, if that's your style. Wasn't aware of it, tbh.

Would like to include some changes to Surface as well as

  1. it can re-use some existing contexts from EEx/HEEx
  2. found some scope inconsistencies with regards to embedded Elixir source code, compared to EEx/HEEx

I am currently a bit lost at the question of how to correctly name those embedded stuff in various situations.

Main question is meta.embedded vs. meta.interpolation. This question is not limited to this PR but more general.

ASP/PHP use meta.embedded source.xyz.embedded.html to express ASP/PHP sources within <?...?> or <%...%> sections.

During recent ST4 dev cycle we added several string interpolation features to improve highlighting and auto-completions behavior (ST disables auto-completion and various other stuff in strings). So we end up in something like meta.string meta.interpolation for source code interpolated into strings. I am uncertain whether that's appropriate for those <%...%> sections. Especially as it causes a mix of punctuation scopes.

Another question arises with a look at HEEx and Surface, which support those {...} interpolations. Do they have the same meaning as <%...%> sections but with different syntax, or is distinction needed scope-wise?

Finding a consistent solution for all syntaxes might take another roundtrip, IMHO.

@deathaxe deathaxe marked this pull request as draft September 24, 2021 15:09
@deathaxe
Copy link
Contributor Author

Is something like <ta{@x}g /> valid in HEEx / Surface? I saw, <{@x}> not to be in Suface, so I guess such interpolations must not appear in tag names at all?

@deathaxe deathaxe marked this pull request as ready for review September 24, 2021 17:19
@deathaxe
Copy link
Contributor Author

Should be done, except noted aspects, I'd like to read some input from elixir specialists.

  1. If html tags can't contain elixir interpolation those tag-other-name contexts can be removed from HEEx/Surface
  2. meta.embedded vs. meta.interpolation might be worth some thoughts. Anyway, punctuation should be consistent with choosen meta scope. So punctuation.section.embedded vs. punctuation.section.interpolation.

@azizk
Copy link
Collaborator

azizk commented Sep 24, 2021

Is something like <ta{@x}g /> valid in HEEx / Surface? I saw, <{@x}> not to be in Suface, so I guess such interpolations must not appear in tag names at all?

I just checked with Surface. It considers all those characters to be part of the tag. So no interpolation. My guess is HEEx does the same. Btw, I tried something like ~F"<a {=@x}g />". It parses to ["<a", " g></a>"]. The trailing g becomes an independent attribute.

@azizk
Copy link
Collaborator

azizk commented Sep 24, 2021

Cool! Will review everything. Thanks for making it clear with great commit messages and a clean history! There's also definitely some stuff to think about.

I didn't know that auto-completion doesn't work when it's embedded inside a string unless the scope is cleared. I never noticed anyway, because I configured ST to always show me suggestions in any scope...

@deathaxe
Copy link
Contributor Author

Sure. I've mainly gone through formal stuff without deep knowedge about the syntax itself. It may still contain some bugs or I may have broken something.

Another major aspect with clearing string scope is correct highlighting of otherwise unmatched tokens in embedded source. They would be highlighted as string without clearing.


I just saw some tests like <{}}>{@x}</{}}>. Are they really accepted as tags? According to whatwg a tag name must always start with an ascii letter, so if no interpolation is supported, they would be invalid, IMHO.

@azizk
Copy link
Collaborator

azizk commented Sep 24, 2021

I just saw some tests like <{}}>{@x}</{}}>. Are they really accepted as tags? According to whatwg a tag name must always start with an ascii letter, so if no interpolation is supported, they would be invalid, IMHO.

Yep, they're "accepted". I think the parser just checks if everything is syntactically correct, not so much if it makes semantic sense.

@azizk
Copy link
Collaborator

azizk commented Sep 25, 2021

Thanks for sorting the tests and removing the tag interpolations!
Question regarding atomic groups: should we really remove them now because I think they're pretty useful and maybe ST regexps will support them in the future? I haven't written a regexp engine yet but atomic groups seems like a trivial feature to me. Besides, all the other syntax files use them too and I remember in some cases they're necessary...

@deathaxe
Copy link
Contributor Author

Appears I missed to tweak some tests, again.

It's very likely ST's sregex engine won't support atomic groups anytime soon. A modern syntax should avoid anything which causes Syntax Tests - Regex Compatibility command to generate any warnings to make sure to avoid Oniguruma to be triggered.

Those advanced features may have been needed in tmLanguage syntaxes and therefore still exist after those have been converted to sublime-syntax format, but with all the context magic at hand in sublime-syntax, patterns are often simple enough to not need all those sophisticated regex features. One reason to rewrite those converted syntaxes from scratch.

@deathaxe
Copy link
Contributor Author

The recent commit adds some changes to illustrate atomic groups can easily be replaced by simpler patterns in various situations.

@azizk
Copy link
Collaborator

azizk commented Sep 26, 2021

Hey, I feel a bit bad about it but I squashed most commits and re-based onto master. I consolidated (almost) all the commit messages. Hope that's alright!

I made only superficial content changes:

  1. Added you as an author.
  2. Added the "-pop" suffix to some contexts. I prefer to name them like that to signal that they change the stack.
  3. I changed some plural names to singular where it made more sense to me.

I pushed to deathaxe/proposals. You can push force if you think it's good. 👍

@deathaxe
Copy link
Contributor Author

deathaxe commented Sep 26, 2021

You don't need to feel bad at all. It's your project. I am here to offer some proposals, only.

The other way round I have mixed feelings as I introduced so many changes to the already very good syntax definitions. 😄

@azizk
Copy link
Collaborator

azizk commented Sep 26, 2021

There are lots of great changes and we appreciate it! You managed to delete a lot of unneeded lines as well. I always love that about refactoring. 😄

I saw there's a possible optimization in surface-tag-name-common-pop and heex-tag-name-common-pop:

- match: '[[:lower:]_]\w*[?!]?|(?!\.)'
  scope: variable.function.surface
  pop: 1
- match: \.
  scope: punctuation.accessor.dot.surface

What do you think? Is it better to leave it be?

@deathaxe
Copy link
Contributor Author

deathaxe commented Sep 26, 2021

In which way is it an optimization? ST compiles those patterns into one optimized regexp anyway and immediately pops if none of the consuming patterns match. Can you prove this negative lookahead has any positive effect on performance or something like that? I always found negative lookaheads to slow things down, so try to avoid them. That's also what a benchmark against 5k lines of the following snippet reveals.

<MyApp.User.tag name={@name}>

Your solution is a bit slower.

The current solution requires 47ms on my box, while yours takes 50ms.

@azizk
Copy link
Collaborator

azizk commented Sep 26, 2021

Cool, thanks for going to the trouble of benchmarking it! It was an assumption I made that we'd save some CPU cycles if we removed the immediate-pop rule. I thought ST would have one fewer regexp/rule to check. I stand corrected. 👍

@azizk
Copy link
Collaborator

azizk commented Sep 30, 2021

Found something: we should probably change punctuation.definition.tag.begin.surface to punctuation.section.embedded.begin.surface (and end respectively). Would be consistent with (H)EEx. Right?

deathaxe and others added 6 commits October 3, 2021 00:05
## Distinguish embedding and interpolation

This commit splits existing contexts into two types:

1. `eex-embedded`, which is used by default (prototype)
2. `eex-interpolations`, which is used to interpolate strings

The difference between the two is that (1.) does not clear any scopes,
while (2.) clears 1 scope (the string scope).

Before this commit all tags used outside of strings cleared the main
scope of the syntax `text.html.eex`.

Test cases are sorted so those testing later tokens in a line follow
those which test earlier tokens.

## Embedded punctuation and meta scopes

This commit...

1. proposes to scope the whole `<%` and `%>` token as
   `punctuation.section.embedded`. These are no real tags but rather
   special markers to specify start and end of embedded elixir code.
   It can be compared with PHP, JSP, ASP and various other templating
   syntaxes, which expand those embedded code blocks while processing
   the template content. That's why `entity.name.tag` seems not
   suitable.

2. scopes the embedded code `meta.embedded` because the punctuation
   already use `punctuation.section.embedded`.

3. scopes embedded code in stings `meta.interpolation meta.embedded`,
   due to a scope naming guideline suggesting the former scope for
   embedded code in strings.

3. scopes embedded code `source.elixir.embedded.html` to enable
   auto-completions. ST requires the `source` scope to do handle content
   as code rather plain text.

4. removes "atomic groups" as those are not supported by ST's sregex
   syntax engine (see: `<%(?:%=?|[=/|]?)`)

## Exclude prototypes

Avoid including the `prototype` context in embedded code sections as
<% ... %> won't be valid inside them self.

## Rethink comments

This commit modifies eex comments so they appear as normal interpolation.

This change is inspired by ST's JSX/TSX syntax, which supports comments
in JSX components by `{/* comment */}`. The braces are scoped as
interpolation punctuation, while only the content `/* ... */` uses
comment scope. It appears more accurate with regards to how
interpolation works in template languages and may wanted to be applied
in other syntaxes as well.

## Don't stack meta.interpolation and meta.embedded

Probably not useful nor needed to mix those two in case the distinction
between embedding and interpolation is to be maintained.

## Renamed some context names and appended "-pop" (Aziz)
## Re-use contexts from EEx

This commit...

1. re-uses `eex-embedded` context in HTML (HEEx).sublime-syntax
2. copies over required test cases
3. sorts test cases the same way as in previous commits.

## Distinguish elixir interpolations

This commit separates normal interpolation from string interpolation to
correctly clear string scopes but don't do so for unwanted maybe
top-level scopes.

## Add missing meta.tag scope

Some color schemes rely on `meta.tag` to scope attribute values correctly.

## Re-use some existing HTML contexts

This commit...

1. replaces some explicit patterns by already existing contexts, derived
   from inherited HTML.sublime-syntax or HTML (Plain).sublime-syntax
2. Removes unnecessary `apply_prototype` as it hasn't any effect when
   including contexts from the same (this) syntax definition.

## Rename some contexts

Many contexts pop, so naming all of them `-pop` is probably somewhat
overwhelming. Here's a suggestion which proved quite useful to express
popping contexts while keeping names clean.

1. Non-popping contexts use plural names (e.g. "tags")
2. Popping contexts use singular name (e.g. "tag-content")

Note: HTML.sublime-syntax unfortunately doesn't follow this naming scheme consequently for historical reasons and won't do so anytime soon
in order to not break existing 3rd party packages. Any new syntax should
probably do so though.

## Remove unsupported regexp patterns

Atomic groups and possessive quantifiers are unsupported by sregex
engine and may either have no effect or trigger slower Oniguruma.

Please: check if this has negative effects.

## Simplify tag-name contexts

Context switches are not needed and some patterns can be shared.

## Avoid duplicate tag name assignments

Avoiding capture groups can improve parsing performance. ST is quite
good with regards to re-using lookahead results, so they the overall result is at least equal or even slightly faster with regards to performance, while code looks somewhat cleaner.

## Rename variables

Some `tag_...` variables already exist in HTML.sublime-syntax and
HTML (Plain).sublime-syntax. This commit makes sure to not regress
something due to overriding some HTML related variables by accident.

Note: All new variables and contexts should be prefixed.

## Prepend heex tags to tag-other

Only low priority, but HEEx tags are basically foreign tags with
regards to HTML specs. As such those should be placed in `tag-other`
context.

It ensures any valid html tag is matched correctly under all
circumstances. It would even work if foreign tags are lower-case.

## Renamed some context names and appended "-pop" (Aziz)
## Re-use elixir interpolation from HEEx

This commit...

1. reduces duplicate context definitions
2. ensures same interpolation/embedded scopes are used across syntaxes

## Prepend surface tags to tag-other

Only low priority, but surface tags are basically foreign tags with
regards to HTML specs. As such those should be placed in `tag-other`
context.

It ensures any valid html tag is matched correctly under all
circumstances. It would even work if foreign tags are lower-case.

## Embedded surface comments

This commit...

1. adds named context for surface comment content as those help
   debugging syntaxes via ST's scope name popup.
2. removes `clear_scopes: 1` as it removed the main scope.

## Rename variables

Some `tag_...` variables already exist in HTML.sublime-syntax and
HTML (Plain).sublime-syntax. This commit makes sure to not regress
something due to overriding some HTML related variables by accident.

Note: All new variables and contexts should be prefixed.

## Add missing meta.tag scopes

Some color schemes rely on `meta.tag` to scope attribute values correctly.

## Rename some contexts

Many contexts pop, so naming all of them `-pop` is probably somewhat
overwhelming. Here's a suggestion which proved quite useful to express
popping contexts while keeping names clean.

1. Non-popping contexts use plural names (e.g. "tags")
2. Popping contexts use singular name (e.g. "tag-content")

Goal ist to follow the introduced scheme of HEEx

Note: HTML.sublime-syntax unfortunately doesn't follow this naming scheme consequently for historical reasons and won't do so anytime soon
in order to not break existing 3rd party packages. Any new syntax should
probably do so though.

## Reorganize Markdown/Raw tag contexts

This commit just moves markdown and raw contexts to group them with the
surface-other contexts.

The idea is the following structure:

1. extended HTML contexts
2. surface tag contexts
3. surface blocks and embedding contexts
4. eex interpolation

## Simplify tag-name contexts

Context switches are not needed and some patterns can be shared.

## Remove unsupported regexp patterns

Atomic groups and possessive quantifiers are unsupported by sregex
engine and may either have no effect or trigger slower Oniguruma.

Please: check if this has negative effects.

## Add named context for surface-blocks

## Simplify conditional block pattern

## Don't clear comment scope in interpolation

Follow HEEx implementation.

## Simplify html-custom-tags

This commit...

1. removes the group (?:) from `surface_tag_char` variable as it
   seems unnecessary to wrap a char-set this way.

2. adds a `html_custom_tag_char` custom variable which is a copy of
   `surface_tag_char` but excludes the period.

   This new variable allows to replace the groups from the patterns in
   the `html-custom-tags` context by a simple char-set pattern.

3. As the pattern in `html-custom-tags` matches every valid tag
   character, it is not required to push into `surface-begin-tag-name`
   context as it doesn't have to offer anything more. It just pops
   immediately.

4. `html-custom-tags` basically overrides `tag-other` from
   HTML.sublime-syntax in a more general way by not requiring a tag
   name to start with an ascii char. Thus `meta_prepend: true` can be
   removed from `tag-other`. The pattern from HTML wouldn't match
   anyway.

## Renamed some context names and appended "-pop" (Aziz)
@azizk
Copy link
Collaborator

azizk commented Oct 26, 2021

My most recent local copy of what I've pushed upstream doesn't contain tag-generic-attribute-value nor does it contain else-pop in it.

Ah, sorry, I didn't mention it. I force pushed to our deathaxe/proposals branch.

Does it apply to both HEEx and Surface?

Yes. So tag-generic-attribute-value-content isn't needed in either syntax file. I checked the source code of Surface and tested HEEx in the console to make sure. I wonder why I thought this was supported. Maybe earlier versions had a bug?

I managed to fix the else-pop issue. Just needed an elixir-embedded-pop context.

Another issue I encountered is that attributes like id, class etc. are handled specially. I just made a simple rule to check whether an attribute name is followed by a ={ to know whether it's an embed value or not. The disadvantage is that the attributes don't get the special scope names.

@princemaple
Copy link
Collaborator

FYI it's best not to force push a branch that multiple people are working on. Also prefer --force-with-lease so you avoid overwriting someone else's work.

If the purpose was to organize the commits, it's best done at the very end when you want to merge it in. And it's safer to create a new branch on top of it before you squash.

@azizk
Copy link
Collaborator

azizk commented Oct 26, 2021

I've always been using --force-with-lease. 👍

Yeah, the way I handled this here wasn't optimal. I should have pushed onto deathaxe's branch to keep the changes visible for some time.

@deathaxe
Copy link
Contributor Author

deathaxe commented Oct 27, 2021

But you make it impossible to track changes by this kind of abusing git. That's not the purpose on version control!

Please don't force push to my branch making it look broken by me!

@deathaxe
Copy link
Contributor Author

I just made a simple rule to check whether an attribute name is followed by a ={ to know whether it's an embed value or not.

Why bothering if embedding is not supported in attribute values?

@deathaxe
Copy link
Contributor Author

deathaxe commented Oct 27, 2021

I checked the source code of Surface and tested HEEx in the console to make sure. I wonder why I thought this was supported.

To be sure to understand correctly I just checked https://surface-ui.org/template_syntax.

The very first example shows embedded elixir code in html tag attributes. So why do you think it is not supported?

grafik

The following is status quo of this branch:

Surface:

grafik

grafik

  • no <% %> embedding in attributes
  • but { ... } is highlighted in attributes and content

HEEx:

grafik

grafik

  • no <% %> embedding in attributes
  • but { ... } is highlighted only in attributes

EEx:

grafik

  • <% %> embedding in attributes
  • { ... } not part of the syntax

deathaxe and others added 4 commits October 27, 2021 18:12
Expressions are borrowed from HEEx syntax tests to illustrate differences.
... working as expected with current branch.
Follow the modifications done to the HTML (...) syntax files.
@azizk
Copy link
Collaborator

azizk commented Oct 28, 2021

But you make it impossible to track changes by this kind of abusing git. That's not the purpose on version control!

I know, I know. It wasn't the best thing to do, but I don't think I abused it. I only ever worked on my branch. You can still compare changes by diffing your local branch with my remote branch, or by selecting the top commits in Sublime Merge.

Please don't force push to my branch making it look broken by me!

Never pushed let alone force pushed anything to your branch. I only did it on my branch because I thought we'd be finished soon and only a ff-merge would remain to be done. Next time I'll be more careful, create normal commits and discuss squashing beforehand when necessary. :)

Why bothering if embedding is not supported in attribute values?

Embedding is supported as values but not inside the strings. In my example above, the embedding is inside the string. That wasn't supported or isn't anymore.

<!-- Invalid: -->
<div {@some_attr}="value" style="{@value}" />
<!-- Valid: -->
<div {[class: "x"]} style={@style} />

@princemaple
Copy link
Collaborator

@azizk I think you pulled the branch from this PR and force pushed after you changed it. This branch is @deathaxe 's.

Hey, I feel a bit bad about it but I squashed most commits and re-based onto master. I consolidated (almost) all the commit messages. Hope that's alright!

You both are pointing to the same remote branch that @deathaxe created. You probably forgot about it since this PR has been running for a while now.

You are able to do that because you are a maintainer of the project.

@deathaxe
Copy link
Contributor Author

You can still compare changes by diffing your local branch with my remote branch

Of course, but it gets more and more complicated the more changes have been made. It gets even more difficult if the foreign branch was rebased on a nother branch which introduces many changes in the meanwhile.

It's better to just merge the master into the feature branch if required.

If the intermediate changes are of no longer use they can be squashed to one commit, when merging the PR to master at the end.

Embedding is supported as values but not inside the strings

Ahh ok, thanks

@deathaxe deathaxe changed the title HTML (EEx) Some proposals and fixes HTML (EEx/HEEx/Surface): Refactor syntaxes Oct 29, 2021
@azizk
Copy link
Collaborator

azizk commented Nov 2, 2021

Thanks for making all the changes and adding support for HEEx slots!

Do you think it would make sense to keep the meta scopes separate, e.g. havingmeta.embedded.surface and meta.embedded.heex instead of only meta.embedded.html everywhere? Would this be useful for future scripts when querying the document?

@deathaxe
Copy link
Contributor Author

deathaxe commented Nov 2, 2021

Not sure how important such a distinction really is. I just use meta.embedded and meta.interpolation to modify background slightly in my custom color scheme.

I gave arguments for meta.embedded.[heex|surface] vs. meta.embedded.html before, but I read from your posts you'd prefer the latter one. Thus I changed those scopes.

I followed the first approach with Ruby in Rails or Java Server Pages because it enabled me to re-use those contexts in embedded CSS and JS syntaxes. If we use meta.embedded.html we need to scope elixir code meta.embedded.[css|js] in script/style content. But that's not implemented at all and I don't know whether it is supported by HEEx/Surface interpreter.

@deathaxe
Copy link
Contributor Author

Just for the record: I am done with this PR and won't add any further commits.

@azizk
Copy link
Collaborator

azizk commented Nov 10, 2021

Thanks for all the effort, @deathaxe!

I want to finish this, but I couldn't pay full attention to it due to other things. Hope to close this soon. Next release has been long overdue. :)

@princemaple
Copy link
Collaborator

@deathaxe noted, thanks a lot!

@azizk
Copy link
Collaborator

azizk commented Dec 17, 2021

So I finally got around to working on this again and I finished it up. I think I integrated almost all of your suggestions, @deathaxe, except for these:

  • The move to *.html scopes from *.surface, *.heex, *.elixir for embed punctuation tokens and others. This didn't make any sense to me (also because the official syntax definitions didn't receive similar changes). Maybe there was a misunderstanding above, but my idea was to only add different meta scopes like meta.embedded.{surface|heex|eex} for the different flavours. I'm anything but an expert on scope names but this is what I've observed and understood.
  • Interpolations are not supported inside <!-- HTML {comments} -->.
  • The head structure of each syntax file. Maybe we can unify it later on, but atm I prefer it this way.
  • Probably some other minor stuff I'm failing to mention.

Thanks again a lot for your extensive help and your active contributions! New package release is coming. 🥳

@azizk azizk closed this Dec 17, 2021
@deathaxe deathaxe deleted the pr/proposals branch December 17, 2021 16:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants