Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce unvisitableExtensions to remove isHTML implementation #1230

Merged
merged 9 commits into from
Sep 12, 2024

Conversation

edwinv
Copy link
Contributor

@edwinv edwinv commented Mar 25, 2024

Fixes #608

When Turbo navigates to a new URL, locationIsVisitable is called to check if the location is visitable. One part of the check is checking if the requested URL returns HTML. We can never determine with 100% confidence if a request will return HTML. A page with a .html extension might still return another content-type. A page with a .foobar extension might return HTML. Using a check on the file extension is therefore imprecise and prone to bugs.

Turbo already handles non-HTML responses. When a response doesn't contain an HTML part, a contentTypeMismatch is raised. This causes the page to reload to the requested URL. So we could remove the isHTML check completely and still have a functioning situation. The major downside is having two requests: one Turbo fetch and one regular browser request.

A user could prevent these double requests by adding data-turbo="false" to these links. But there might be some sensible default by never handling known non-HTML extensions as a Turbo request. This PR introduces the Turbo. unvisitableExtensions setting to enable this.

By default, a range of common non-HTML extensions are included in this setting, like .png and .jpg. If users have many links to other extensions and don't want to sprinkle their HTML with data-turbo="false" attribute, they can add new extensions to the set:

Turbo. unvisitableExtensions.add(".mp3")

Because these changes are added to the lower-level checks, they work both for regulars visits and the newly introduced page refresh morphs.

Mentioning @packagethief @dhh @jorgemanrubia for some visibility.

@packagethief
Copy link
Contributor

packagethief commented Mar 25, 2024

I'm wary of using an exclusion list and assuming that most things are HTML by default. I think the list will just keep growing. But I do like the API you've proposed better than isHTML's existing regex. What would you think of using that API with an HTML extension allowlist? Then applications can more easily configure the extensions they want treated as HTML:

Turbo.htmlExtensions.add(".foobar")

The implementation would be as you've proposed, but would check for inclusion in the HTML extensions, rather than exclusion. We'd ship with the defaults already defined by the regex:

export const htmlExtensions = new Set([ ".html", ".htm", ".xhtml", ".php" ])

export function locationIsVisitable(location, rootLocation) {
  return isPrefixedBy(location, rootLocation) && htmlExtensions.has(getExtension(location))
}

EDIT: I guess that won't work when there's no extension – the implementation would need to account for that.

@edwinv
Copy link
Contributor Author

edwinv commented Mar 25, 2024

On the one hand finding the extension is always a educated guess. In our application we have URL's with a filter, containing a date range presentation. Example: /invoices/filter/period:202401..202403. This is a valid URL and is perfectly handled by Rails routing. Turbo thinks the extension is .202403. This is a dynamic part of the URL and can never be added to an htmlExtensions array on the frontend. Turbo thinks it is no HTML and will always reload the page. Up until recently this wasn't an issue: while navigating the user just got a fresh page. With Turbo streams refreshing/morphing the page, the user now faces a complete refresh while we are just trying to update a small part of it. This is degrading the user experience.

On the other hand, we need to think about fallback situations. If the treatAsNonHtml set is not covering all situations, the page will still load. It takes two requests, but the user won't notice. The major downside is wasting traffic and a bit of a delay for the user. Compare this to an incomplete htmlExtensions. Then the user will have a degraded experience due to full page reloads, while morphing is expected.

So, I'm still in favor of the proposed set for non-HTML extensions. I'm aware it is not covering all possible extensions, but when it does not cover an extension, the user won't notice and the experience stays the same.

I like the htmlExtensions name, so maybe we should rename treatAsNonHtml to a simpler name like nonHtmlExtensions or neverLoadWithTurbo.

@packagethief
Copy link
Contributor

So, I'm still in favor of the proposed set for non-HTML extensions. I'm aware it is not covering all possible extensions, but when it does not cover an extension, the user won't notice and the experience stays the same.

Thanks for the explanation. I hadn't considered dynamic paths like the one you described. I think your reasoning makes sense.

I think we'd need a much larger default exclusion list, however. Just looking at Basecamp and HEY, we have a pretty extensive list of extensions we'll auto-link to: zip, tar, gz, bz2, rar, 7z, dmg, exe, msi, pkg, deb, iso, bmp, mp4, mov, avi, mkv, wmv, heic, heif, mp3, wav, ogg, aac, wma, webm, ogv, mpg, mpeg. While these could return HTML, it's extremely unlikely, and we wouldn't want to start making two requests to resolve them.

@packagethief
Copy link
Contributor

Thinking about it more, I think there are more downsides to an exclusion list than the inclusion regex we have now. It makes more sense to define the extensions that we're confident are HTML than it does to define the ones that aren't. The former is going to be a much smaller set than the latter for all practical purposes.

The example with a pathname like /period:202401..202403 feels like an uncommon case, and one that could still be supported by augmenting the regex we have now. I'd be in favor of a nicer API that didn't involve replacing the regex though, like a way to define additional extension matchers.

@edwinv
Copy link
Contributor Author

edwinv commented Mar 25, 2024

For the inclusion list/regex to work, it should have perfect coverage. My situation is very specific, but there have been plenty of other examples like FQDN like example.com, Rails ids in paths, .deploy, file extensions, and usernames with a dot in it. I don't see how a default list/regex is going to cover all these situations.

If the list/regex is a configuration, a user could extend the list and cover their needs. But even then it is hard to get it covered. Many paths contain dynamic parts that are hard to contain in a list or regex.

But I'm also thinking about the developer experience. A new developer starting with Turbo expects Turbo to work all the time. When a part of the requests/refreshes are suddenly page reloads, it is not clear at first hand what is happening. This was my firsthand experience last few days when users complained about random refreshes and we couldn't find the cause because the . is only in specific situations in the URL.
Having a inclusion list/regex makes things more flexible for the developer, but also opens the door for new bugs. What if the regex matches all extensions by accident, then all of the sudden Turbo starts loading full PDF files or large binaries.

I don't see a downside for a long(er) exclusion list. We probably can find a reasonable list of say 100 most used file extensions on the web that give no HTML response. If we miss something, the end-user won't notice other than the double request. And the developer can safely extend the list by providing a specific extension.

What I was thinking about, is the introduction of the (first) Turbo configuration. Instead of going with the Turbo.someConfig route, we can also introduce a specific <meta> tag for the configuration. This is more in line with how other configurations for Turbo are made.

@brunoprietog
Copy link
Collaborator

Is it too complicated to add a slash at the end? In Rails you can easily do it with the trailing_slash option, at least. It would automatically solve your problem.

@edwinv
Copy link
Contributor Author

edwinv commented Apr 5, 2024

Is it too complicated to add a slash at the end? In Rails you can easily do it with the trailing_slash option, at least. It would automatically solve your problem.

Yes, this has all kinds of side effect in our application. Furthermore I believe a library like Turbo shouldn't dictate if slashes are added or not. Rails might have an easy solution for this, but isn't this library intended to be used by other frameworks too?

@edwinv
Copy link
Contributor Author

edwinv commented May 13, 2024

How are we going to move forward with this PR? I haven't read any arguments that have convinced me the exclusion list is not going to work, other than some minor inconveniences with maintaining the list. In my point of view this downside does not outweigh the downside of unexpected behavior with full page refreshes due to some random . being interpreted as a file extension. I still consider detecting file types by their extension to be a code smell.

@gjtorikian
Copy link

Given the original issue, and other solutions that have been proposed (e.g. #519), what are the requirements for a solution to fix this two+ year old issue? It seems to me that we shouldn’t let perfect be the enemy of good: yes, maintaining an exclusion list sucks, and maybe the list in this PR doesn’t cover every case. But there’s currently no clean way to implement what the PR is fixing.

I'd be in favor of a nicer API that didn't involve replacing the regex though, like a way to define additional extension matchers.

This was the last message on the topic—is your belief the same, @packagethief ? Because it sounds like the API/regex is the issue, not the overall idea. And if that is the issue, then I think that API can be improved in the future; but if you disagree, then maybe this PR can be amended to introduced functionality to make this configuration easier.

@packagethief
Copy link
Contributor

It seems to me that we shouldn’t let perfect be the enemy of good: yes, maintaining an exclusion list sucks

I agree with that. I've come around on the idea of an exclusion list. As long as the default is reasonably comprehensive (there's a longer list in #1230 (comment)) and there's an official way to configure it (which this PR includes), it should be less onerous than having to define your own permitted extensions.

If I may though, treatAsNonHtml feels a little opaque. This is really about whether a particular path should be visitable. With that in mind, I think a name like Turbo.unvisitableExtensions would be more intention-revealing.

@gjtorikian
Copy link

Sorry for the direct ping @edwinv — just wondering if you had time to tidy up the work here!

@edwinv
Copy link
Contributor Author

edwinv commented Aug 7, 2024

@gjtorikian I think the PR is finished, now we have agreed on the exclusion list. I've renamed the variable (thanks for the suggestion @packagethief) and included some more extensions to the list to cover more cases. Ready for a final review!

@edwinv edwinv changed the title Introduce treatAsNonHtml to remove isHTML implementation Introduce unvisitableExtensions to remove isHTML implementation Aug 7, 2024
@edwinv
Copy link
Contributor Author

edwinv commented Aug 7, 2024

CI fails on a test that does not fail locally with me, was not failing on previous commits and is unrelated to my changes. It seems like a random failure, but I can't retry it to verify this.

@gjtorikian
Copy link

Yeah, a quick search shows that the same test failed last month: #1169 (comment)

I reckon it’s flakey.

@gjtorikian
Copy link

With the cleanup made and the flakey test flaking, is this ready to be merged/released?

@packagethief
Copy link
Contributor

With the cleanup made and the flakey test flaking, is this ready to be merged/released?

Yes! Sorry for losing track. Looks like there's a small conflict to resolve.

@brunoprietog
Copy link
Collaborator

The conflict is caused by #1217, which means that this should be moved to Turbo.config.drive.unvisitableExtensions.

@edwinv
Copy link
Contributor Author

edwinv commented Sep 2, 2024

The conflict has been resolved in line with the #1217 changes.

@edwinv
Copy link
Contributor Author

edwinv commented Sep 2, 2024

I've identified an issue with the tests in #1217 due to the complete config being false instead of an object. Fixed it in my last commit. This is unrelated to my changes, but caused exceptions in the tests.

On a side note: it took quite some time to find the issue due to Playwright not failing in JS error that occur in the browser. I'm not very familiar with Playwright, but wouldn't it be better to catch all errors and console logs and raise them in the tests? Otherwise some exceptions might not be noticed because we just ignore them.

@edwinv
Copy link
Contributor Author

edwinv commented Sep 12, 2024

All checks are green, ready to merge I guess? I prefer to be ahead of potential new conflicts due to other changes being merged.

Copy link
Contributor

@packagethief packagethief left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @edwinv. Just a couple of style tweaks and I think we're ready to merge. Thanks for sticking with it 🙏

src/tests/functional/visit_tests.js Outdated Show resolved Hide resolved
src/tests/functional/visit_tests.js Outdated Show resolved Hide resolved
src/tests/functional/visit_tests.js Outdated Show resolved Hide resolved
@packagethief packagethief merged commit b8b6662 into hotwired:main Sep 12, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Turbo will not navigate to URLs that have a . in the pathname
4 participants