Extra characters being inserted before and after HTTP response when X-Sendfile is used and there are many concurrent requests #218

Open
jhpyle opened this issue Aug 27, 2017 · 19 comments


@jhpyle

jhpyle commented Aug 27, 2017

When I upgrade from mod_wsgi 4.3.0 (i.e., Debian jessie with Apache 2.4.10) to mod_wsgi 4.5.11 (i.e., Debian stretch or Ubuntu zesty with Apache 2.4.25), I start to see the following problem:

When many requests are made concurrently, some static files that are served through X-Sendfile are 1) delayed by more than 5 seconds and 2) corrupted with ten bytes before the HTTP response and ten bytes after it. This ten-byte sequence consists of one byte with ASCII code 3 followed by nine bytes with ASCII code 0 (null bytes). The sequence before the response appears immediately before the HTTP header, so the browser is unable to read the status code or content type of the HTTP response. The content of the response is there and is correct (that is, the contents of the file referenced by X-Sendfile are present, the Content-Type header is set correctly, etc.). The problem is that the response is sandwiched between these ten-byte sequences.
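To check a captured response for this pattern, a small helper like the following could be run on the decrypted bytes a client reads from the connection. It is a sketch that assumes "ASCII character 3" means the byte 0x03 and "ASCII character 0" means the null byte 0x00; the names `MARKER`, `find_corruption`, and `strip_corruption` are hypothetical.

```python
# Assumption: the ten-byte sequence is 0x03 followed by nine 0x00 bytes.
MARKER = b"\x03" + b"\x00" * 9


def find_corruption(raw: bytes) -> dict:
    """Report whether a raw HTTP response is wrapped in the ten-byte marker."""
    return {
        "prefix": raw.startswith(MARKER),
        "suffix": raw.endswith(MARKER),
        # A clean response begins directly with the status line.
        "status_line_ok": raw.startswith(b"HTTP/"),
    }


def strip_corruption(raw: bytes) -> bytes:
    """Remove the marker from both ends, if present (for diagnosis only)."""
    if raw.startswith(MARKER):
        raw = raw[len(MARKER):]
    if raw.endswith(MARKER):
        raw = raw[: -len(MARKER)]
    return raw


if __name__ == "__main__":
    clean = b"HTTP/1.1 200 OK\r\nContent-Type: text/css\r\n\r\nbody{}"
    corrupted = MARKER + clean + MARKER
    print(find_corruption(clean))
    print(find_corruption(corrupted))
    print(strip_corruption(corrupted) == clean)
```

The helper only inspects the two ends of the byte string, which matches where the report says the sequence appears.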

I am seeing this happen when a browser initially accesses my web application. Since the browser has not cached anything yet, approximately 20 CSS and JavaScript files are all requested at the same time. One or two of my CSS or JavaScript files will get delayed and corrupted. The other files are delivered properly within 1,300 milliseconds. After a page refresh, the corrupted files will load without a problem (during that page load, only a few concurrent requests are made because the rest of the static files are cached by the browser). Sometimes, no files get corrupted. Usually one is corrupted, but sometimes two. There is some randomness to it.

The five second delay happens during the "receiving" portion of the request, according to my browser's network timing meters.

The browser is talking directly to Apache over HTTPS. Apache is running on a machine with 1 core.

My Apache configuration looks like this:

        WSGIDaemonProcess docassemble.webserver user=www-data group=www-data threads=5
        WSGIScriptAlias / /usr/share/docassemble/webapp/docassemble.wsgi
        <Directory /usr/share/docassemble/webapp>
            WSGIProcessGroup docassemble.webserver
            WSGIApplicationGroup %{GLOBAL}
            AllowOverride none
            Require all granted
        </Directory>

I am using Flask. If I turn off the X-Sendfile feature in my Flask application, this problem goes away. The problem seems to be related to the fact that X-Sendfile responses leave the application with an empty body, which mod_xsendfile then fills in with the file contents.
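For reference, the response shape that Flask produces when its `USE_X_SENDFILE` setting is enabled can be approximated by a plain WSGI app: the headers name a file and the body is empty, leaving mod_xsendfile (with `XSendFile On`) to supply the content. This is a minimal sketch, not the application's actual code; `STATIC_DIR` and the mapping from URL path to file name are hypothetical.

```python
import mimetypes
import os

# Hypothetical location of the static files; adjust for the real application.
STATIC_DIR = "/srv/static"


def application(environ, start_response):
    """Plain WSGI app that hands file delivery off to mod_xsendfile."""
    # Keep only the last path component so "../" cannot escape STATIC_DIR.
    name = os.path.basename(environ.get("PATH_INFO", ""))
    if not name:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    start_response("200 OK", [
        ("Content-Type", content_type),
        # mod_xsendfile intercepts this header and sends the named file as the body.
        ("X-Sendfile", os.path.join(STATIC_DIR, name)),
    ])
    # The body returned through mod_wsgi is empty.
    return [b""]
```

Serving this module through the same `WSGIDaemonProcess` setup would isolate the "empty body plus X-Sendfile header" case from the rest of the Flask application.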

I am using Flask in a virtualenv, so the Python packages (Flask, werkzeug, etc.) are the same both on the Debian jessie platform where I get no error and the Debian stretch platform where I see the error. I am using Python 2.7.

I am guessing the problem is not with the xsendfile Apache module because the version of this module did not change between Debian jessie, where I had no error, and Debian stretch, where the error appeared.

The error also happens on Debian sid, which has even more up-to-date versions of Apache, etc.

It could be that this is an Apache issue and not a mod_wsgi issue, but I thought it made sense to check with you first. I noticed that wsgi_thread.c was doing memset(content, '\0', sizeof(content)) which made me think that maybe this was the source of those extraneous null bytes. And I looked on the internet for other people reporting this ten-byte corruption issue, and I couldn't find any other reports, which makes me think it is less likely to be a global issue with Apache.

I can help you reproduce this if you think it might be a mod_wsgi issue. I can be reached at [email protected].

Thanks very much,

Jonathan Pyle

@GrahamDumpleton
Owner

When you upgraded, did you recompile mod_xsendfile, or update it to the latest corresponding version? Usually an Apache module compiled for an older Apache version should work on a newer version of the same major/minor version, but there are some instances where I have seen this not be the case because Linux distros backport patches to Apache that break its API forward compatibility. If this has happened, it is important that the Apache module be recompiled for the newer version of Apache.

Also, what do you have EnableSendfile set to?

And are the files on a local filesystem or an NFS server? The sendfile() call can misbehave on some systems and is usually turned Off by default for that reason.

Finally, that memset line you reference is only initialising a stack-based array, and it is hard to see how that would be related.

@jhpyle
Author

jhpyle commented Aug 28, 2017

By "upgrading" I meant that I went from using a Debian "jessie" virtual machine on Amazon Web Services to using a Debian "stretch" virtual machine on Amazon Web Services.

Debian "stretch" uses this version of mod_wsgi, which is in a package maintained by Bernd Zeimetz. Based on the changelogs it looks like the Apache module was compiled on 12/29/2016 against Apache version 2.4.25. This is the same version of Apache currently used by Debian "stretch." However, it looks like the Apache installation has been modified since then, including the backporting of some security fixes from 2.4.26.

I will try it again with a fresh recompile of mod_wsgi (and mod_xsendfile) to see if that changes things.

EnableSendfile is not set, so the default setting of Off is used.

In my application, the files referenced in the X-Sendfile headers are all on the local filesystem.

Thanks very much for your help!

@jhpyle
Author

jhpyle commented Aug 28, 2017

I recompiled the wsgi and xsendfile modules against the Apache sources, and the 5-second delay and corruption with ten-byte sequences still occurs.

I'm not sure what the source of the 5 seconds is. I tried changing the sleep(5) calls in mod_wsgi.c and tried changing shutdown-timeout, but the delay remained at 5 seconds. I changed LogLevel to info, but nothing in the logs suggests that anything has gone wrong.

I wouldn't think that mod_wsgi had anything to do with this, but for the fact that the problem only happens when there are a lot of requests coming in at the same time. Maybe the problem is that the xsendfile module hasn't been updated in seven years...

@GrahamDumpleton
Owner

Have you changed the value of Timeout directive in Apache configuration?

@jhpyle
Author

jhpyle commented Aug 28, 2017

I have not changed Timeout, so it should still be at its default of 60 seconds, although I just tried changing KeepAliveTimeout to 10 seconds and now the delay on the HTTP response is a little more than 10 seconds.

The KeepAlive mechanism might be the issue here. (See discussion of this issue regarding nginx.)
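One way to exercise the keep-alive path directly, rather than through a browser, is to send several requests over a single persistent connection with Python's `http.client`. This is a sketch under assumptions: the marker byte values are taken from the original report, and the idea that stray trailing bytes from one response would make the next status line unparseable is a guess about how the corruption would surface on a reused connection, not something established in this thread. The function name `fetch_on_one_connection` is hypothetical.

```python
import http.client

# Assumed byte values of the ten-byte sequence described in the report.
MARKER = b"\x03" + b"\x00" * 9


def fetch_on_one_connection(conn, paths):
    """GET each path over one persistent connection and note anything odd."""
    results = []
    for path in paths:
        try:
            conn.request("GET", path)  # HTTP/1.1: the socket stays open
            resp = conn.getresponse()
            body = resp.read()  # drain the body before reusing the socket
            note = "marker-in-body" if MARKER in body else "ok"
            results.append((path, resp.status, note))
        except (http.client.HTTPException, OSError) as exc:
            # Stray bytes ahead of a status line would likely land here,
            # e.g. as http.client.BadStatusLine.
            results.append((path, None, type(exc).__name__))
            conn.close()  # the next request() opens a fresh connection
    return results
```

Against the HTTPS server, the connection would be created with something like `http.client.HTTPSConnection(host, timeout=30)`, where `host` is a placeholder for the real server name.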

@GrahamDumpleton
Owner

The obvious thing to try, then, is to disable keep-alive altogether.

Which MPM are you using? If using the event MPM and the new way it handles keep alive connections, then maybe mod_xsendfile is incompatible with it.

@jhpyle
Author

jhpyle commented Aug 29, 2017

Yes, disabling KeepAlive makes the problem go away (but at a huge cost to performance). Another way to make the problem go away is to use HTTP instead of HTTPS.

In both Debian jessie (Apache 2.4.10), which had no problems, and Debian stretch (Apache 2.4.25), the event MPM is used. I don't see any major changes to keepalive functionality mentioned in the changelog between 2.4.10 and 2.4.25, but I am not an expert in this area.

I did an experiment to try to rule out mod_wsgi. I wrote a Perl "file server" script that prints an X-Sendfile header in order to serve files from my Flask static file directory. I also added a 250ms delay in the script. I then wrote an HTML file that includes the long list of static JavaScript and CSS files that my web application calls, but retrieves them through the Perl script. I enabled the cgid module and edited the Apache configuration to activate the Perl script in /usr/lib/cgi-bin. When I go to this HTML page in my web browser, the browser makes lots of simultaneous requests to the web server, just as it does when it is communicating with my web app through mod_wsgi. Interestingly, though, none of the responses gets corrupted or delayed. One difference between cgid and mod_wsgi is that cgid spawns a separate process for each call to the Perl script, whereas mod_wsgi (as I have it configured) uses one process with five threads.
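A Python equivalent of that CGI experiment might look like the following. The original script was written in Perl and its source is not shown in this thread, so the `file` query parameter, `STATIC_DIR`, and the function name `cgi_response` are assumptions; the 250 ms delay matches the description above. Under mod_cgid each request runs in its own process, which is the contrast with the five-thread mod_wsgi daemon process noted in the comment.

```python
#!/usr/bin/env python3
"""Hypothetical Python stand-in for the Perl CGI "file server" described above."""
import mimetypes
import os
import sys
import time
from urllib.parse import parse_qs

STATIC_DIR = "/srv/static"  # hypothetical location of the Flask static files
DELAY_SECONDS = 0.25        # the 250 ms delay mentioned in the report


def cgi_response(query_string: str) -> str:
    """Build the CGI header block; mod_xsendfile supplies the body."""
    params = parse_qs(query_string)
    # "file" is a hypothetical parameter name; basename() blocks "../" paths.
    name = os.path.basename(params.get("file", [""])[0])
    if not name:
        return "Status: 404 Not Found\nContent-Type: text/plain\n\n"
    content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    return (
        f"Content-Type: {content_type}\n"
        f"X-Sendfile: {os.path.join(STATIC_DIR, name)}\n"
        "\n"
    )


if __name__ == "__main__":
    time.sleep(DELAY_SECONDS)
    sys.stdout.write(cgi_response(os.environ.get("QUERY_STRING", "")))
```

The HTML test page would then reference assets as, for example, `/cgi-bin/fileserver.py?file=app.css`, where the script name is also hypothetical.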

@GrahamDumpleton
Owner

Is there any chance you could pull down mod_wsgi source code for version 4.3.0 and see if it still has the problem? If it doesn't, then try 4.3.1.

@GrahamDumpleton
Owner

GrahamDumpleton commented Aug 29, 2017

Also, try setting response-socket-timeout option on WSGIDaemonProcess directive to different integer values (seconds) and see if it changes how long things block up for.

@jhpyle
Author

jhpyle commented Aug 29, 2017

Thanks so much for your help with this.

When I compiled and installed 4.3.0 (using the standard ./configure, make, sudo make install) the problem went away.

When I did the same with 4.3.1, the problem reappeared.

I tried setting response-socket-timeout on my WSGIDaemonProcess line but Apache wouldn't start; it gave the error Invalid option to WSGI daemon process definition. I tried setting socket-timeout to 10, but that did not have an effect on the timing of the 5 second delay.

@GrahamDumpleton
Owner

Okay, seems I only added response-socket-timeout in 4.5.13. Before that it would adopt the value of socket-timeout.

Anyway, at least I know what code has likely introduced the issue. I now need to work out why, or, more likely, work out how to identify the X-Sendfile case and not send the response through my modified response output buffering pipeline.

@GrahamDumpleton
Owner

Can you set:

EnableMMAP Off

in the Apache configuration and tell me if that avoids the problem?

@jhpyle
Author

jhpyle commented Aug 30, 2017

Ok, I tried adding that line within the VirtualHost configuration. It did not avoid the problem.

@GrahamDumpleton
Owner

Can you tell me if the mod_xsendfile source code is the same as what is at:

It mentions Apache 2.2, and looks like it may not have been changed for a very long time.

@jhpyle
Author

jhpyle commented Aug 30, 2017

Yes, the source code being used by Debian for the xsendfile module is exactly the same as the source code on that site.

@GrahamDumpleton
Owner

The static web page you said you created earlier to try and emulate this with a backend in Perl: can you provide me a version of that static HTML, even if it uses dummy static asset URLs, and a Flask app with the most minimal code required to handle X-Sendfile? If you already have a self-contained minimal example that triggers it, even better.

I want to try and create a setup to emulate the problem. I suspect I am going to have to build it into a Docker image, though, as I more than likely will not see it on MacOS X.

@jhpyle
Author

jhpyle commented Aug 30, 2017

Ok, a minimal example is here:

https://github.com/jhpyle/testxsendfile

Note that I have only been able to trigger the problem over HTTPS. (I used to be able to get HTTPS to work on a personal computer with a self-signed certificate, but I can't figure that out anymore so these days I just use Let's Encrypt on virtual machines in the cloud.)

I have this running at https://test54.docassemble.org.

It is easier to trigger the problem when the network connection is slower. Over slow WiFi, I get the problem every time, but on a desktop I have to press Ctrl-Shift-R in Firefox several times before I see the problem reflected in the Console. You might be able to simulate this "slow connection" factor by changing the time.sleep() line in the Flask app.
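A rough client-side harness for the cold-cache scenario fires about 20 requests at once, times each one, and flags the two symptoms from the report (a delay of 5 seconds or more, and the ten-byte marker at either end). It is a sketch: `fetch` is a hypothetical callable that must return the raw, decrypted response bytes for a URL, and since disabling KeepAlive made the problem disappear, it probably needs to reuse connections the way a browser does to have any chance of reproducing the issue. The names `probe`, `burst`, and `SLOW_SECONDS` are also hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MARKER = b"\x03" + b"\x00" * 9  # assumed byte values of the ten-byte sequence
SLOW_SECONDS = 5.0              # the delay observed in the report


def probe(fetch, url):
    """Time one request and flag the two symptoms described in the issue."""
    start = time.monotonic()
    raw = fetch(url)  # hypothetical callable returning raw response bytes
    elapsed = time.monotonic() - start
    return {
        "url": url,
        "elapsed": elapsed,
        "slow": elapsed >= SLOW_SECONDS,
        "corrupted": raw.startswith(MARKER) or raw.endswith(MARKER),
    }


def burst(fetch, urls, workers=20):
    """Request every URL at roughly the same time, like a cold browser cache."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with urls.
        return list(pool.map(lambda u: probe(fetch, u), urls))
```

Running `burst` repeatedly and counting how often any result is slow or corrupted gives a rough measure of how reliably a given configuration triggers the problem.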

@GrahamDumpleton
Owner

Just to let you know, I have had a super busy week, as I need to get some stuff done before some trips, so I haven't had a chance to look at it yet. Getting mod_xsendfile to compile on MacOS X is also a pain, as MacOS X is broken and doesn't supply the apr-config/apu-config scripts any more, so apxs breaks if you try to use it to compile modules.

@jhpyle
Author

jhpyle commented Sep 5, 2017

Thanks very much for looking into the issue. By the way, I took down the https://test54.docassemble.org site last night (to save money on my Amazon Web Services bill), but I can recreate it if that would be helpful to you. Thanks!
