Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for precompressed (for example gzip) content #108

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

gmokki
Copy link

@gmokki gmokki commented Apr 11, 2016

Disabled by default (should it be enabled by default?).
There should be no overhead if disabled.
If opts.precompressed is set to true, tries to serve both .br and .gz files, but custom configuration (encoding to extension mappings) is also supported

If enabled then for every file that is served we look up each configured file extension (even if browser does not request compressed variants) and if any are found we set the Vary header. This is required for caching proxies that the clients my be using to work correctly.

If matching file between Accept-Encoding and enabled precompression formats is found then the compressed file contents are served with Content-Encoding header. If multiple supported matches are found the first configured in the preferences will be used.

@dougwilson dougwilson added the pr label May 24, 2016
@dougwilson
Copy link
Contributor

Thanks, @gmokki! I have a question regarding the added functionality:

If I were to enable precompressed with this code and I have a folder full of *.tar.gz files, how can I prevent the clients from accessing the files as *.tar (as this will cause the client to automatically ungzip potentially giant tarballs)?

@gmokki
Copy link
Author

gmokki commented May 24, 2016

I do not think there can be anything to prevent that (at least with this patch). It works the same way as apache/nginx etc when configured with support for precompressed resources.
I'd say the main defence is that one does not link to .tar files, but directly to .tar.gz files. If you directly refer to the .gz file the patch does not set the content-encoding headers and thus does not cause the client to decompress the content.

@gmokki
Copy link
Author

gmokki commented May 27, 2016

More concrete answer:
Assuming you have files file.tar and file.tar.gz in folder where the uncompressed file.tar.gz contents matchs exactly to to the file.tar.

Before this patch:

  • When requesting file.tar.gz you get the exact (compressed) file.
    • the Accept-Encoding header with or without gzip has no effect
  • When requesting file.tar you get the exact (uncompressed) file.
    • the Accept-Encoding header with or without gzip has no effect

After this patch:

  • When requesting file.tar.gz you get the exact (compressed) file.
    • the Accept-Encoding header with or without gzip has no effect
  • When requesting file.tar
    • without Accept-Encoding: gzip header you get the exact (uncompressed) file
    • with Accept-Encoding: gzip header you get the tar.gz file with Content-Encoding: gzip response
      • the only difference is that the transferred wire data is smaller and the http client transparently decompresses to the file.tar as requested

For applications using the precompressed feature they just need to make sure that the contents of the uncompressed and compressed files match. As long as that holds no difference can be observed by the compilant http client users.

@dougwilson
Copy link
Contributor

Ah, @gmokki, gotcha. Yea, I looked at the nginx source and what they do is just unconditionally add .gz to the file name when the client wants to get the gzip encoding, and there are no other real checks. I think this is probably OK, but I think the PR needs a lot more documentation fleshed out.

@dougwilson dougwilson self-assigned this May 27, 2016
@gmokki
Copy link
Author

gmokki commented May 30, 2016

Can you give a hint of what kind of extra documentation in addition to the short README.md documentation you would like to have.

@dougwilson
Copy link
Contributor

Hi @gmokki, currently the README change in this pull request is only the following:

Enable or disable serving of precompressed content, defaults to false.
If set to true checks for brotli and gzip. An example value to serve
other formats: [{encoding: 'bzip2', extension: '.bz2'}]

There are a lot of questions that raises, including how to even use it. What values can I use for encoding? What about extension? How does this module determine which file to serve? How do I layout the files on the disk? How can we avoid questions like the one I had above?

I'm not asking you these questions, they are just what comes to mind when reading the README. We'll need enough information in the README for someone to use the feature. Consider that you may take it for granted what "precompressed" means and how to lay it out, but many people have never done that before.

@gmokki
Copy link
Author

gmokki commented May 31, 2016

I added the details you requested into the documentation with much longer clarifications. I also added a few generic tips on offline compression.

index.js Outdated
var header = this.req.headers['accept-encoding']
if (header) {
this._precompressionFormats.forEach(function (format) {
if (header.indexOf(format.encoding) >= 0) accepted.push(format.extension)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is broken for web browsers that send "no-gzip, deflate", which was common for a time (Apache, and I assume others, understand no-gzip).

This also does not seem to take into account the quality parameters at all, which is very common from CDNs, as they usually want a specific encoding to store.

@dougwilson
Copy link
Contributor

Thanks! I just realized I still have not actually looked at the implementation, so doing a little bit now.

index.js Outdated
}

function sendPreferredContent(p, stat, contents) {
if (contents.length) self.res.setHeader('Vary', 'Accept-Encoding')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will overwrite any other Vary already set. Let's append this to the field name if already set instead of overwriting.

@hacksparrow
Copy link
Member

This is a good feature to have. @gmokki can you can address the issues pointed out by @dougwilson?

@gmokki
Copy link
Author

gmokki commented Jun 1, 2016

Yes, I will definitely try to address all pointed issues in next few days. Thank you for the through review and comments.

@dougwilson
Copy link
Contributor

Thanks, @gmokki! It sounds like this won't make it for the Express.js 4.14 release cut tonight, then. It'll probably be a few months before the next minor, and this module is locked to Express.js release schedules, so you'll have plenty of time to address!

@gmokki
Copy link
Author

gmokki commented Jun 1, 2016

I just fixed all the issues that @dougwilson pointed out.

I have already added a encoding quality parsing function to tomcat (https://github.com/apache/tomcat/blob/trunk/java/org/apache/catalina/servlets/DefaultServlet.java#L1096) - I'll try to add it now to here too.
But if that feature does not make it in time would it be possible to merge this without the encoding quality parsing? None of the common browsers send quality values. Actually so far all web servers that I have patched with brotli support just assumed that gzip;q=0 meant that gzip was supported.

@dougwilson
Copy link
Contributor

Thanks, @gmokki! It looks like there is still the outstanding stat issue (unless you fixed it in one of the commits?). The quality issue is a real issue, at least from CDNs. I maintain the compression module, which does similar negotiation, and I definitely know people have had real issues with this, and since this serves static files, which are pretty much the main thing to put behind a CDN, then it is an issue here as well, and I would rather get this done right than trying to rush this & make mistakes.

@hacksparrow
Copy link
Member

+1 for not rushing it.

@gmokki
Copy link
Author

gmokki commented Jun 1, 2016

I just added code for the accept-encoding quality handling.

I just added comment on why the stats are needed always when precompressed option is enabled. If that were not the case the precompression option could default to on.

I think I have now really addressed all comments. Do provide more feedback on bugs/missing features and I'll try to address them too.

@dougwilson
Copy link
Contributor

Thanks, @gmokki! In order to get the 4.14 release ready for tonight and our TC meeting, I cannot look any more at this PR for now. I'll take another look tomorrow after 4.14 for sure. There is still the comment regarding the .br file framing vs the stream if you can take a look into that as well!

@dougwilson
Copy link
Contributor

Also, the CI tests are still taking their sweet time to run (they still haven't gotten to your pushes from earlier today), so we have to at least wait on those to consider merging.

Also, real quick, I do have a couple more thoughts on the header stuff: have you considered using the modules we have already owned by the Express TC to do some of that heavy lifting? Did they not work out (and would it be possible for us to modify them so they can be used here)? Most of them can be found in https://github.com/jshttp (our organizations are https://github.com/expressjs , https://github.com/pillarjs and https://github.com/jshttp).

@gmokki
Copy link
Author

gmokki commented Jun 1, 2016

I'm not an authority on the brotli file formats nor their official extension.
For the extension I can say that all other web servers I know have the brotli static file serving support (ngingx, tomcat, jetty) and apache examples all refer to extension being .br.
The file format is outside of the scope of this module. It is the users responsibility to make sure that file ending with .gz/.br can be decompressed by the browser.

I originally tried to avoid extra dependencies. But I could use https://github.com/jshttp/vary to modify the vary header and https://github.com/jshttp/negotiator to parse the preferred accept-encoding header. I'll try to modify the code to use them tomorrow. Now that the code has tests for both cases taking them into use should be simple.

@dougwilson
Copy link
Contributor

For the extension I can say that all other web servers I know have the brotli static file serving support (ngingx, tomcat, jetty) and apache examples all refer to extension being .br.

That doesn't make it the correct thing to do. For example, I could have just accepted this and been in the same boat. Just because others do something doesn't automatically make it correct. In order to accept this PR as-is, with .br being check by default, it has to make sense, and as such, I need you to perform the investigation I asked about the framing, otherwise it will have to wait until I have time to do that depth of digging, which I will do in the order I have received PRs to be fair to all other people who have open PRs. This PR is currently #82 in the line...

@gmokki
Copy link
Author

gmokki commented Jun 2, 2016

@dougwilson: I think you are correct that adding explicit brotli support is premature. The file format is not yet finalized. I just pushed a new version that removes brotli from the default set of formats and updates the documentation to match.

I also tried using the https://github.com/jshttp/accepts for choosing the best encoding format. Unfortunately that does not yet work because the module does not honor the server preference order for values that have equal quality level in the client request.

@dougwilson
Copy link
Contributor

I also tried using the https://github.com/jshttp/accepts for choosing the best encoding format. Unfortunately that does not yet work because the module does not honor the server preference order for values that have equal quality level in the client request.

Is it possible to get that added? I think I remember seeing that wasn't in there when dealing with compression issues, so it would be nice if we can get that there to keep us from continuing to punt on the issue :)

@gmokki gmokki changed the title Add support for precompressed (gzip and brotli) content Add support for precompressed (for example gzip) content Jun 27, 2016
@gmokki
Copy link
Author

gmokki commented Sep 16, 2016

My negotiator pull request jshttp/negotiator#49 is still not merged, but I updated this pull request to use it. I also did some worst case benchmarking: when enabling the precompressed check with one extra stat the performance goes down at most by 20% when serving 1 byte files to localhost with single infinite depth http pipeline connection. I doubt that it is observable in actual use cases.

gmokki pushed a commit to NitorCreations/digitransit-ui that referenced this pull request Nov 23, 2016
…the browsers.

- optimize png images with pngquant (quality 95) and advpng
- optimize svg images with svgo
- create precompressed variants of js, css, svg files: gz using zopfli and br using brotli

Currently using forked versions of expressjs/send/negotiator while waiting for the PRs to go through:
- jshttp/negotiator#49
- pillarjs/send#108
gmokki pushed a commit to NitorCreations/digitransit-ui that referenced this pull request Nov 23, 2016
…the browsers.

- optimize png images with pngquant (quality 95) and advpng
- optimize svg images with svgo
- create precompressed variants of js, css, svg files: gz using zopfli and br using brotli

Currently using forked versions of expressjs/send/negotiator while waiting for the PRs to go through:
- jshttp/negotiator#49
- pillarjs/send#108
gmokki pushed a commit to NitorCreations/digitransit-ui that referenced this pull request Nov 25, 2016
…the browsers.

- create precompressed variants of js, css, svg files: gz using zopfli and br using brotli

Currently using forked versions of expressjs/send/negotiator while waiting for the PRs to go through:
- jshttp/negotiator#49
- pillarjs/send#108
gmokki pushed a commit to NitorCreations/digitransit-ui that referenced this pull request Nov 25, 2016
…the browsers.

- create precompressed variants of js, css, svg files: gz using zopfli and br using brotli

Currently using forked versions of expressjs/send/negotiator while waiting for the PRs to go through:
- jshttp/negotiator#49
- pillarjs/send#108
@gmokki
Copy link
Author

gmokki commented Apr 10, 2017

Rebased to fix merge conflicts. Unfortunately the negotatior PR is still pending.

@DylanPiercey
Copy link

Is there anything that could be done to aid in landing this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants