Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use buffer pool to avoid large buffer allocation for connections and responses #87

Open
wants to merge 7 commits into
base: branch-2.6
Choose a base branch
from

Conversation

mfp
Copy link
Contributor

@mfp mfp commented Aug 16, 2015

A trivial buffer pool is used to get rid of these buffer allocations:

  • per connection/receiver buffers (Ocsigen_http_com)
  • internal buffer in Ocsigen_senders.File_content
  • internal buffer in deflatemod

Moreover, some care is taken to avoid allocation for the last chunk of a file when the remaining data doesn't match the buffer size.

The changes are minimal, but reviews are appreciated. In particular:

  • I haven't tested deflatemod yet
  • I made a mistake in my first attempt at addressing the last-chunk issue (forgot that the buffer size could be rounded up to the next power of two), so any extra eyeballs to look for errors would help
  • I had to release the response stream resources (i.e. buffers) in Ocsigen_server; it's literally 3 lines, but it's quite big conceptually

Overall, these changes deliver up to ~25-30% more speed in some requests.

Performance

Effect of buffer pools

(buffer pool disabled with OCSIGEN_DISABLE_BUFFER_POOL -> enabled)
o=150 is the GC setting used in half of the runs

404
        12000 -> 12800
 o=150  12900 -> 13500


normal response (20 bytes)
        ~ 8100 -> 8900
  o=150 ~ 8400 -> 8900


16kb file ->
          6721 -> 7800
   o=150  7100 -> 7900

Effect of last-chunk optimization when serving files

PRE                                       POST
----------------------                    ----------------------
15kb file ->                              15kb file ->
          6200 -> 7400                              6400 -> 7950
   o=150  7000 -> 7800                       o=150  7200 -> 8200

7kb file ->                               7kb file ->
          7500 -> 8550                              7450 -> 9150
   o=150  7650 -> 8700                       o=150  7850 -> 9250


1kb file ->                               1kb file ->
          8100 -> 8750                              8100 -> 8750
   o=150  8400 -> 8800                       o=150  8550 -> 9000

----------------------                    ----------------------

Overall effect of Lwt_chan fix #49 + buffer pools + last-chunk optimization

(only for files): compare figures at the top + those in POST to these for 2.6:

2.6
--------------

404                                    16kb file ->
         11800                                   5500
 o=150   12500                            o=150  6100

20-byte response                       15kb file ->  
          7450                                   5200
   o=150  7900                            o=150  5900

                                       7kb file ->   
                                                 6000
                                          o=150  6700


                                       1kb file ->   
                                                 7650
                                          o=150  7950

@mfp
Copy link
Contributor Author

mfp commented Aug 16, 2015

AFAICS it builds & installs fine w/ jenkins (certainly builds for me). Same (phony?) uninstall error as in #84.

@vasilisp
Copy link
Contributor

Thanks for doing this. I haven't read the last-chunk part carefully, but the rest of the code looks OK to me. The overhead (hashtable, stacks, release closures) should be insignificant for buffers of size 1K or more.

I can install and uninstall (the buffer-pool branch) just fine locally, it is hard to say why rm fails inside the docker container.

@mfp
Copy link
Contributor Author

mfp commented Aug 17, 2015

On Mon, Aug 17, 2015 at 05:38:28AM -0700, Vasilis Papavasileiou wrote:

Thanks for doing this. I haven't read the last-chunk part carefully, but the
rest of the code looks OK to me.

Thank you for taking a look :)

The overhead (hashtable, stacks, release closures) should be insignificant
for buffers of size 1K or more.

Indeed. In this case there's a speed increase across the board for any
response size (even 404s and 20-byte responses become a bit faster) --
ocsigenserver cannot know a priori that a connection will not make full use of
the connection buffers, so it's got to use large ones.

For hashtables of such size (a few dozen elements at most) I've measured over
4e7 lookups/s (we only need a handful of them per request, so it's over 3
orders of magnitude away :). The closures are essentially free (with Lwt,
we're building much heavier ones all the time anyway), and the only thing I
didn't know how to quantify easily was the caml_modify inherent in
Stack.push, but it clearly involves less work than the mark&sweeping we'd
have if we were allocating 4KB buffers left and right.

I can install and uninstall (the buffer-pool branch) just fine locally, it
is hard to say why rm fails inside the docker container.

I'm told it's an issue in the build script used for PRs.

Mauricio Fernández

@vasilisp vasilisp mentioned this pull request Nov 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants