URL normalization #15

Open
stevenvachon opened this issue Mar 3, 2021 · 4 comments

Comments

@stevenvachon

stevenvachon commented Mar 3, 2021

These two URLs most likely identify the same resource:

  • http://host?var2&var1
  • http://host?var1&var2

However, this library will cache both separately, which is a waste of resources.
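To make the point concrete, here is a minimal sketch of order-insensitive comparison using the standard WHATWG `URL` API. The helper name `normalizeQueryOrder` is hypothetical and is not part of this library or of the normalization library mentioned in this thread.

```javascript
// Hypothetical helper for illustration only; not this library's API.
// Sorts query parameters by name so that parameter order no longer matters.
function normalizeQueryOrder(input) {
  const url = new URL(input);
  // URLSearchParams.sort() is a stable sort by parameter name,
  // so repeated parameters keep their relative order.
  url.searchParams.sort();
  return url.href;
}

const rawA = 'http://host?var2&var1';
const rawB = 'http://host?var1&var2';

console.log(rawA === rawB); // false: compared as plain strings, they differ
console.log(normalizeQueryOrder(rawA) === normalizeQueryOrder(rawB)); // true
```

Note that this only handles parameter order. It also re-serializes the query string (a bare `var1` comes back as `var1=`), which is one reason a "careful" profile for untrusted servers might skip this step.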

There are many other cases where differently written URLs are equivalent, as well as situations where the server is not trusted and such URLs should be cached separately. I've been writing a library that accurately normalizes URLs and provides "common" (trusted) and "careful" (untrusted) profiles.

Is this a feature you'd consider implementing?

@jpodwys
Owner

jpodwys commented Apr 25, 2021

Thank you for this post and for your patience! I agree that the two URLs you mentioned above should result in the same cache key. We may be able to solve this specific scenario without adding a dependency by making two changes:

  1. Update the keygen function in utils.js to exclude all query params from the url property.
  2. Use JSON.stringify's second argument, replacer, to ensure the resulting string lists object keys in a deterministic order.

I'll be able to check whether this works soon.

@jpodwys
Owner

jpodwys commented Apr 25, 2021

For the second item in my previous comment, it looks like using this function as the replacer argument will achieve what we want in all browsers. So I believe this approach will ensure that a difference in query/header order alone will not result in different cache keys.
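The function referenced above is not reproduced in this thread, so the following is only a sketch of what such a replacer could look like. The name `sortedReplacer` is an assumption for illustration. It relies on the documented behavior that `JSON.stringify` calls the replacer for every value, including the root object, and serializes whatever the replacer returns.

```javascript
// Hypothetical replacer; the function linked in the comment may differ.
// For each plain object, return a copy whose keys were inserted in sorted
// order, so JSON.stringify emits them in that order.
function sortedReplacer(key, value) {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const sorted = {};
    for (const k of Object.keys(value).sort()) {
      sorted[k] = value[k];
    }
    return sorted;
  }
  // Arrays and primitives are returned untouched, so array order is preserved.
  return value;
}

const first = JSON.stringify({ b: 1, a: { d: 1, c: 2 } }, sortedReplacer);
const second = JSON.stringify({ a: { c: 2, d: 1 }, b: 1 }, sortedReplacer);
console.log(first === second); // true
```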

@stevenvachon
Author

stevenvachon commented Apr 25, 2021

Query parameters are part of what constitutes a unique URL, though.

@jpodwys
Owner

jpodwys commented Apr 26, 2021

The keygen function creates an object containing the following properties:

  • request verb
  • URL
  • headers
  • query params

The headers and query params are stored as objects, and then the whole object is stringified.

My proposal here is to remove the query params from the URL and then coerce the keys in the headers and query params objects into a deterministic order. So even though I mentioned removing query params from the URL, that data would still be present in the generated key.
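A rough sketch of that proposal, under stated assumptions: the request shape (`method`, `url`, `headers`), the property names in the key, and the helper `sortKeys` are all hypothetical, and the actual keygen in utils.js may be structured differently.

```javascript
// Hypothetical sketch of the proposal; not the library's actual keygen.

// Replacer that re-creates each plain object with its keys in sorted order.
function sortKeys(key, value) {
  if (value !== null && typeof value === 'object' && !Array.isArray(value)) {
    const sorted = {};
    for (const k of Object.keys(value).sort()) sorted[k] = value[k];
    return sorted;
  }
  return value;
}

// Assumed request shape: { method, url, headers }.
function keygen(req) {
  const parsed = new URL(req.url);
  const params = {};
  for (const [name, value] of parsed.searchParams) {
    // Repeated parameters are collected into an array, in their original order.
    (params[name] = params[name] || []).push(value);
  }
  return JSON.stringify(
    {
      method: req.method,
      url: parsed.origin + parsed.pathname, // query string removed from the URL
      headers: req.headers || {},
      params, // query data is still part of the key
    },
    sortKeys
  );
}

const keyA = keygen({ method: 'GET', url: 'http://host?var2&var1', headers: { b: '1', a: '2' } });
const keyB = keygen({ method: 'GET', url: 'http://host?var1&var2', headers: { a: '2', b: '1' } });
console.log(keyA === keyB); // true
```

Because the params object stays in the key, requests that differ in an actual parameter or value still get distinct keys; only the ordering is ignored.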
